Now that GPT-4 is here, what sets it apart from OpenAI's GPT-3.5?


OpenAI’s GPT-4 is more reliable and creative than its predecessor

  • OpenAI said GPT-4 is more reliable and creative, and can handle more nuanced instructions than GPT-3.5.
  • The latest language model scores 40% higher than GPT-3.5 on OpenAI’s internal adversarial factuality evaluations.

For most of 2022, San Francisco artificial intelligence company OpenAI was working towards releasing GPT-4, a new AI model that was stunningly good at writing essays, solving complex coding problems, and more. GPT-4 was almost ready when, to the surprise of many, OpenAI decided to shelve its launch and instead update an unreleased chatbot that used a souped-up version of GPT-3, the company’s previous language model released in 2020.

That chatbot, running on GPT-3.5, was launched as ChatGPT. The generative AI chatbot quickly became a global phenomenon and has been embraced, with mixed results, by people from all walks of life. Although users have complained that ChatGPT is prone to giving biased or incorrect answers, its launch indirectly set off a feeding frenzy of investors trying to get in on the next wave of the AI boom.

Less than four months after ChatGPT was first unveiled, OpenAI finally announced GPT-4, the latest version of its primary large language model, which it has been working on for the last 12 months or more. “A year ago, we trained GPT-3.5 as a first ‘test run’ of the system. We found and fixed some bugs and improved our theoretical foundations,” OpenAI said in a blog post on Tuesday.

As a result, OpenAI’s GPT-4 training run was “unprecedentedly stable,” making it the company’s first large model whose training performance could be accurately predicted ahead of time. Interestingly, according to a blog post by Microsoft’s head of consumer marketing Yusuf Mehdi, anyone who used the new Bing preview in the last six weeks has already had an early look at the power of the latest model.

GPT-4 vs. GPT-3.5

OpenAI said GPT-4 is a large multimodal model (accepting image and text inputs, giving text outputs) that, while less capable than humans in many real-world scenarios, exhibits human-level performance on various professional and academic benchmarks. GPT-4 even passed a simulated bar exam with a score around the top 10% of test takers; in contrast, GPT-3.5’s score was around the bottom 10%. 

“We’ve spent six months iteratively aligning GPT-4 using lessons from our adversarial testing program and ChatGPT, resulting in our best-ever results (though far from perfect) on factuality, steerability, and refusing to go outside of guardrails,” OpenAI said. The company also reckons that the distinction between GPT-3.5 and GPT-4 can be subtle in a casual conversation. 

“The difference comes out when the complexity of the task reaches a sufficient threshold—GPT-4 is more reliable, creative, and able to handle much more nuanced instructions than GPT-3.5,” it added.

However, OpenAI also warned that despite its capabilities, GPT-4 has limitations similar to those of earlier GPT models. “Most importantly, it still is not fully reliable,” the blog post reads. GPT-4 still “hallucinates” facts and makes reasoning errors. “While still a real issue, GPT-4 significantly reduces hallucinations relative to previous models. GPT-4 scores 40% higher than our latest GPT-3.5 on our internal adversarial factuality evaluations,” OpenAI said.

GPT-4 lacks knowledge of events that occurred after most of its training data cuts off (September 2021) and does not learn from its experience. The model can sometimes make simple reasoning errors that do not comport with its competence across many domains, or be overly gullible in accepting obviously false statements from a user.

“And sometimes it can fail at hard problems the same way humans do, such as introducing security vulnerabilities into code it produces,” the company said. “GPT-4 can also be confidently wrong in its predictions, not taking care to double-check work when it’s likely to make a mistake,” OpenAI added.

So far, GPT-4 access is limited to ChatGPT Plus subscribers and is subject to a usage cap. The AI company said it would adjust the exact usage cap depending on demand and system performance in practice. “Depending on the traffic patterns we see, we may introduce a new subscription level for higher-volume GPT-4 usage; we also hope to offer some free GPT-4 queries so those without a subscription can try it too,” the company concluded.