In my previous article, “Much ado about something”, I discussed the growing popularity of ChatGPT and its potential issues and limitations. This week there have been some significant changes to the application that address those issues and open up new possibilities.
ChatGPT has now been expanded into a multimodal language model: GPT-4 accepts image inputs as well as text. By interacting with users through multiple modalities, it can provide a more personalised experience that helps improve user engagement and satisfaction.
This version can also process longer-form documents, with an input/output capacity of 25,000 words – a significant increase from the roughly 3,000 words of previous versions.
The image processing alone is a great leap forward. With ChatGPT-4, image processing is more than just identifying what’s in the image; it can also interpret and explain it. For example, take a picture of your fridge and ChatGPT-4 might give you a list of dishes you can make with the contents.
OpenAI says that ChatGPT-4 can identify humour in single or multiple images (e.g. a comic strip) and explain why it’s funny. Additionally, there may be potential for understanding a sketch and turning it into something more tangible, such as a website. There are many new applications.
To refine this capability, OpenAI has partnered with Be My Eyes, where the ability to identify everyday items will help visually impaired people gain greater independence. Very cool.
The ChatGPT-4 model has another new feature: plugins. These are extensions that allow ChatGPT to interact with products and services connected to the internet, letting it stay up to date beyond the 2021 cutoff of its training data. The first plugins have been created by Expedia, FiscalNote, Instacart, KAYAK, Klarna, Milo, OpenTable, Shopify, Slack, Speak, Wolfram, and Zapier.
Zapier, for example, enables users to quickly and easily connect multiple applications in one interface. In the future, this may be further streamlined by allowing users to access multiple APIs through a single chat interface, potentially eliminating the need for user interfaces altogether.
OpenAI says “We’re also hosting two plugins ourselves, a web browser and code interpreter. We’ve also open-sourced the code for a knowledge base retrieval plugin, to be self-hosted by any developer with information with which they’d like to augment ChatGPT.”
The ‘code interpreter’ is a sandboxed code-execution environment. At the moment it only runs Python, but it gives ChatGPT-4 the ability to write code that it can then execute and debug. The sandbox also includes some disk space for file uploads and downloads.
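To give a flavour of what the interpreter does, here is a hypothetical, self-contained snippet of the kind ChatGPT-4 might write and run in its sandbox. The data is made up for illustration, standing in for a column from a spreadsheet a user might upload:

```python
# Hypothetical example of the kind of code the interpreter might write
# and execute: summarise a small dataset using only the standard library.
import statistics

# Illustrative data standing in for an uploaded spreadsheet column.
monthly_sales = [1200, 1350, 980, 1610, 1490, 1730]

summary = {
    "total": sum(monthly_sales),
    "mean": round(statistics.mean(monthly_sales), 2),
    "median": statistics.median(monthly_sales),
    "stdev": round(statistics.stdev(monthly_sales), 2),
}
print(summary)
```

The point is less the code itself than the loop around it: the model writes a snippet like this, runs it, reads the output (or the traceback), and revises until it works.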
The ‘retrieval plugin’ allows organisations to upload their own company data – documents, email, code, anything. Using a vector database such as Pinecone, that data then becomes searchable through chat. Companies such as KAYAK and Expedia are already using the plugin approach to drive travel-booking applications.
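Pinecone itself is a hosted service, but the idea underneath is simple: documents are stored as embedding vectors and a query is answered by ranking documents by vector similarity. A minimal sketch using only the standard library – the toy ‘embeddings’ and document names here are invented for illustration; in practice the vectors come from an embedding model:

```python
import math

def cosine(a, b):
    # Cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy document "embeddings" — in a real deployment these are produced by
# an embedding model and stored in a vector database such as Pinecone.
docs = {
    "refund policy": [0.9, 0.1, 0.0],
    "travel booking": [0.1, 0.8, 0.3],
    "office hours": [0.0, 0.2, 0.9],
}

def retrieve(query_vec, k=1):
    # Return the k document names most similar to the query vector.
    ranked = sorted(docs, key=lambda d: cosine(query_vec, docs[d]), reverse=True)
    return ranked[:k]

print(retrieve([0.2, 0.9, 0.2]))  # → ['travel booking']
```

The retrieved text is then pasted into the model’s context, which is how chat over private data works without retraining the model.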
Separating Fact from Fiction
GPT-3 would sometimes hallucinate random information.
GPT-4 is 40% more likely to give a factual response and 82% less likely to respond to requests for disallowed content than GPT-3.5. However, OpenAI cautions that the model can still produce biased results and questionable information. (Source: OpenAI)
OpenAI recently announced that its latest GPT-4 model has achieved a marked improvement in performance, passing a simulated bar exam with a score in the top 10% of test takers, after scoring in the bottom 10% six months ago. This is also the first time OpenAI has been able to accurately predict a model’s performance ahead of time. (Source: OpenAI)
OpenAI researchers discovered that GPT-4 was able to successfully solve hindsight-neglect decision problems, outperforming its predecessor GPT-3.5, which scored almost 0. This was a major breakthrough, demonstrating the model’s improved ability to make decisions without the benefit of hindsight. (Source: OpenAI)
GPT-4 can also search for current information and provide its sources, with a built-in browser that lets you click through links to verify claims and find additional information.
In addition to ChatGPT and Google’s Bard, Apple, Amazon and Baidu have all expressed interest in creating their own large language models, and smaller organisations are exploring the possibility of developing their own.
Stanford University worked on a sanctioned Facebook project using Facebook’s model, LLaMA. Facebook wanted to know why language models generate toxic and false text.
The computer scientists at Stanford then released Alpaca: an open-source seven-billion-parameter model developed by fine-tuning LLaMA.
This model was reportedly constructed for less than USD $600, where the typical price tag for training a model of this kind runs into the millions of US dollars. The open-source code has gained popularity among developers, who have had success running it on Raspberry Pi computers and even Pixel 6 smartphones.
That $600 estimate excludes the cost of the base model it was fine-tuned from, so factor that in. Still, the implication is clear: using existing models as a starting point, it should be possible for other players to create their own models at relatively low cost.
This kind of cost reduction was not expected this early in the game; it equates to roughly eight years of work reduced to five weeks.
Stanford recently took the Alpaca demo offline, on March 23rd, due to safety and economic considerations, illustrating that even when development costs come down, the cost of operating a model remains.
Another implication is local storage. In a few years we may be able to run localised AI chat on our smartphones – not ChatGPT via the web, which we can already do, but a fully embedded, personalised chat assistant on the phone itself.
We may also see models that spawn other models, working individually or together.
Not so ‘Open AI’ anymore
The AI wars have intensified with the emergence of Google’s Bard, PaLM and LaMDA, and with OpenAI investors such as Microsoft pushing for products to be released into the wild. Despite this, the full implications and details of these technologies are still being explored and understood by their creators.
How they did this is not so ‘Open AI’ anymore. The release paper contains a tonne of information about what GPT-4 can do, but far less about how it was built. The company’s founding commitment to being open for humanity has been overshadowed by a large influx of capital, reducing the amount of information available on how results were achieved. Competition and security were the reasons given.
The AI wars are heating up.
The field of battle has moved to office productivity.
Google moved first, incorporating AI into its work applications on March 14th. The victory was short-lived: Microsoft followed with its own AI-powered updates to its office applications only days later.
The search engine battle is continuing. It’s clear that Microsoft is going all in. While GPT-4 is only available to subscribers, anyone can access it using Bing.
Microsoft is spending big to disrupt Google’s business.
OpenAI’s ChatGPT has been advancing rapidly, but recently hit a setback when a glitch allowed some users to view the titles of other users’ conversations. Screenshots of these chat histories were shared on Reddit and Twitter, causing an uproar among users concerned about their privacy on the platform. CEO Sam Altman expressed regret over the incident and said the “significant” error has now been fixed, though many users remain concerned about privacy.
More competition typically benefits the consumer, but in this case it may not. Companies are rapidly integrating new technologies into their solutions without adequate safety measures. GPT-4 was not supposed to be released this soon. New versions may behave in ways that simply cannot be predicted.
The researchers have made it clear that they do not support the release of GPT-4 or Microsoft’s plans to deploy it. This suggests that the launch was pushed forward under pressure from Microsoft’s executives. An audio recording of John Montgomery, a Microsoft VP, indicates as much: he said there is “very, very high” pressure from CTO Kevin Scott and CEO Satya Nadella to get the latest OpenAI models into consumers’ hands quickly.
Microsoft laid off its entire ethics and society team within the artificial intelligence organisation as part of recent layoffs that affected 10,000 employees across the company.
Microsoft is currently lacking a dedicated team to ensure that its AI principles are properly incorporated into product design, despite the company’s efforts to make AI tools widely available.
Microsoft still maintains an active Office of Responsible AI, which is tasked with creating rules and principles to govern the company’s AI initiatives. The company says its overall investment in responsibility work is increasing despite the recent layoffs.
Does this mean anything to you?
Two MIT economics graduate students, Shakked Noy and Whitney Zhang, conducted a study of 444 college-educated white-collar professionals from fields such as marketing, data analysis, grant writing and HR. Half of the participants were asked to use ChatGPT in their daily tasks, while the other half were not. Each participant was assigned tasks of 20–30 minutes.
The output was evaluated by professionals in the relevant fields based on speed, quality, and ChatGPT’s ability to replace, augment, or confuse the work.
The ChatGPT group was found to complete tasks 37% faster (17 minutes vs. 27 minutes) with quality comparable to traditional methods. As workers repeated their tasks, the ChatGPT group’s quality also improved significantly faster, suggesting it’s a useful tool for speeding up work without sacrificing quality. (Source: MIT 2023, not yet peer reviewed.)
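The 37% figure follows directly from the reported times; a quick sanity check:

```python
# Reported average task times from the MIT study:
# 17 minutes with ChatGPT vs. 27 minutes without.
with_chatgpt, without = 17, 27

# Relative time saved: (27 - 17) / 27 ≈ 0.37
speedup = (without - with_chatgpt) / without
print(f"{speedup:.0%}")  # → 37%
```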
Through further research with the participants, it was found that the key areas for productivity gains were brainstorming, rough drafts, comparisons, summaries and final editing. However, there is still potential for inaccuracy, due to issues such as the model surfacing outdated data or misrepresenting data.
OpenAI, OpenResearch, and the University of Pennsylvania co-authored a paper that examined the impact of AI on work tasks without differentiating between positive or negative outcomes.
The researchers quantified exposure as the potential of a GPT-powered system to reduce the time it takes for a human to perform a specific work task by at least 50%.
Although not a prediction, the research inferred from the data that approximately 80 per cent of the US workforce could be impacted by GPTs, in the sense of having at least 10 per cent of their work tasks affected.
Around 19 per cent of workers, meanwhile, could see at least 50 per cent of their tasks impacted.
Which jobs are most ‘exposed’?
For the study, human experts and the AI separately determined the level of ‘exposure’ for different occupations. The humans labelled 15 occupations as ‘fully exposed’; the language model classified 86 jobs as ‘fully exposed’. In both cases this indicates that a language model could be used in those occupations.
- The occupations that the humans found to be 100 per cent exposed were: Mathematicians, Tax Preparers, Financial Quantitative Analysts, and Web and Digital Interface Designers.
- The language model listed the following occupations as 100 per cent exposed: Mathematicians; Accountants and Auditors; News Analysts, Reporters, and Journalists; Legal Secretaries and Administrative Assistants; Clinical Data Managers; and Climate Change Policy Analysts.
- The language model also found these jobs to be more than 90 per cent exposed: Correspondence Clerks, Blockchain Engineers, Court Reporters and Simultaneous Captioners, and Proofreaders and Copy Markers.
- Other high-percentage occupations listed by the humans were: Survey Researchers (84.4%), Writers and Authors (82.5%), Interpreters and Translators (82.4%), Public Relations Specialists (80.6%), and Animal Scientists (77.8%).
Limitations of the study
OpenAI researcher Pamela Mishkin highlighted some limitations of the study on Twitter:
- The approach relied on labels that can be interpreted differently depending on the researcher – it is subjective. Labellers would need to understand those occupations well to judge whether they are truly affected.
- GPT-4 is highly responsive to the way prompts are phrased, with even slight changes in wording producing different results. As such, human and LLM prompts must be crafted differently.
- It’s uncertain whether the various duties and responsibilities associated with these occupations can be divided into more specific tasks, potentially excluding certain types of skills or activities essential for doing the job effectively.
So is this AGI? Well, not quite, though according to the researchers the GPT-4 model is a step closer to Artificial General Intelligence (AGI): an AI with the ability to understand and reason about a wide range of tasks and concepts.
“We demonstrate that, beyond its mastery of language, GPT-4 can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more, without needing any special prompting.” (Source: Microsoft Research’s ‘Sparks of AGI’ paper on GPT-4.)
“Moreover, in all of these tasks, GPT-4’s performance is strikingly close to human-level performance, and often vastly surpasses prior models such as ChatGPT. We believe that GPT-4’s intelligence signals a true paradigm shift in the field of computer science and beyond. Given the breadth and depth of GPT-4’s capabilities, we believe that it could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence system.” (Source: Microsoft Research’s ‘Sparks of AGI’ paper on GPT-4.)
There’s still a lot of unknowns though….
It’s not that great?
Like everything it’s an unfolding story.
As with search engines before GPT, what you put in determines what you get out. Some people are really good at searching; some prompts are better than others. As the models improve, maybe this won’t be as much of a problem, though it is very clear when using image generators such as DALL·E 2, Midjourney or Stable Diffusion. In other cases you can embrace the randomness and get swept away.
The trick may be:
- Give a specific objective for the output.
- Give a specific format for the answer.
- Give a list of things to avoid.
This may work with people as well.
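One illustrative way to turn that three-part trick into a reusable template – the function and field names here are my own, not part of any API:

```python
def build_prompt(objective, answer_format, avoid):
    """Assemble a prompt from a specific objective, a required answer
    format, and a list of things the model should avoid."""
    avoid_lines = "\n".join(f"- {item}" for item in avoid)
    return (
        f"Objective: {objective}\n"
        f"Answer format: {answer_format}\n"
        f"Avoid:\n{avoid_lines}"
    )

print(build_prompt(
    objective="Summarise the attached meeting notes",
    answer_format="Five bullet points, plain English",
    avoid=["jargon", "speculation", "names of attendees"],
))
```

Nothing magic here: it simply forces you to state the objective, the format and the exclusions explicitly, which is the point of the trick.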