OpenAI’s leading models keep making things up — here's why


OpenAI’s newly released o3 and o4-mini are some of the smartest AI models to ever be released, but they seem to be suffering from one major problem.

Both models are hallucinating. That in itself isn't out of the ordinary, since most AI models still do it, but these two new releases appear to hallucinate more than several of OpenAI's older models.

Historically, while new models have continued to hallucinate, the rate has fallen with each release. The potentially larger issue here is that OpenAI doesn't know why it has now gone up.

What are hallucinations?

If you’ve used an AI model, you’ve most likely seen it hallucinate. This is when the model produces incorrect or misleading results. That could mean citing inaccurate statistics, misinterpreting an image prompt, or simply failing to follow the instructions it was given.
This can be a small, unimportant issue. For example, if a chatbot is asked to write a poem using only words beginning with "b" and includes the word "tree," that is a hallucination, albeit a rather low-stakes one.

However, if a chatbot were asked for a list of foods that are safe for someone with a gluten intolerance and it suggested bread rolls, that would be a hallucination with real risk attached.

What does this mean for the o3 and o4-mini models?


In its technical report for the two models, OpenAI explains that both underperformed on PersonQA, a benchmark that measures an AI model's hallucination rate.

“This is expected, as smaller models have less world knowledge and tend to hallucinate more. However, we also observed some performance differences comparing o1 and o3,” the report states.

“Specifically, o3 tends to make more claims overall, leading to more accurate claims as well as more inaccurate/hallucinated claims. More research is needed to understand the cause of this result.”

OpenAI’s report found that o3 hallucinated in response to 33% of questions. That is roughly double the hallucination rate of OpenAI’s previous reasoning models.

Both models are still fairly new and, now that they are in public hands, their hallucination rates could improve considerably as testing continues. However, because both are built for more complex tasks, this could be problematic going forward.

As mentioned above, hallucinations can be a harmless quirk in low-stakes prompts. However, reasoning models (AI designed to take on more complex tasks) typically handle more important information.

If this is a pattern that continues with future reasoning models from OpenAI, it could make for a difficult sales pitch, especially for larger companies looking to spend hefty amounts of money to use o3 and o4-mini.

Alex Hughes
AI Editor

Alex is the AI editor at Tom's Guide. Dialed into all things artificial intelligence in the world right now, he knows the best chatbots, the weirdest AI image generators, and the ins and outs of one of tech’s biggest topics.

Before joining the Tom’s Guide team, Alex worked for the brands TechRadar and BBC Science Focus.

In his time as a journalist, he has covered the latest in AI and robotics, broadband deals, the potential for alien life, the science of being slapped, and just about everything in between.

Alex aims to make the complicated uncomplicated, cutting out the complexities to focus on what is exciting.

When he’s not trying to wrap his head around the latest AI whitepaper, Alex pretends to be a capable runner, cook, and climber.
