OpenAI’s leading models keep making things up, and no one knows why
Hallucinations galore from o3 and o4-mini

OpenAI’s newly released o3 and o4-mini are among the smartest AI models the company has ever shipped, but they appear to suffer from one major problem.
Both models are hallucinating. That in itself isn’t out of the ordinary, as most AI models still do it. But these two new releases seem to hallucinate more than a number of OpenAI’s older models.
Historically, new models have kept hallucinating, but the rate has tended to fall with each release. The potentially larger issue here is that OpenAI doesn’t know why the trend has reversed.
What are hallucinations?
If you’ve used an AI model, you’ve most likely seen it hallucinate. This is when the model produces incorrect or misleading output: fabricating statistics, misreading an image prompt, or simply failing to follow the instructions it was given.
This can be a small, low-stakes issue. For example, if a chatbot asked to write a poem using only words beginning with "b" slips in the word "tree," that is a hallucination, albeit a fairly harmless one.
However, if a chatbot were asked for a list of foods that are safe for someone with a gluten intolerance and it suggested bread rolls, that would be a hallucination with real risk attached.
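Trivial failures like the poem example above are also the easiest to catch in code. Here is a minimal sketch in Python of such a constraint check; the poem string stands in for a hypothetical model response and isn't real model output:

```python
import re

def off_constraint_words(poem: str) -> list[str]:
    """Return any words in the poem that don't begin with 'b'."""
    words = re.findall(r"[A-Za-z']+", poem)
    return [w for w in words if not w.lower().startswith("b")]

# Hypothetical model reply to "write a poem using only words beginning with b"
poem = "Bright blue birds bounce boldly by the tree"
print(off_constraint_words(poem))  # ['the', 'tree'] -- constraint violated
```

A check like this only works when the constraint is mechanical; factual slips, like the gluten example, can only be caught against ground-truth data, which is what evaluations such as the one discussed below are for.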
What does this mean for the o3 and o4-mini models?
OpenAI’s technical report for the two models explains that both underperformed on PersonQA, an evaluation of AI models’ hallucination rates built around factual questions about people.
“This is expected, as smaller models have less world knowledge and tend to hallucinate more. However, we also observed some performance differences comparing o1 and o3,” the report states.
“Specifically, o3 tends to make more claims overall, leading to more accurate claims as well as more inaccurate/hallucinated claims. More research is needed to understand the cause of this result.”
OpenAI’s report found that o3 hallucinated in response to 33% of PersonQA questions, roughly double the hallucination rate of OpenAI’s previous reasoning models.
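OpenAI hasn’t published PersonQA’s implementation, but hallucination evals of this shape generally follow the same recipe: ask the model factual questions about people, compare its answers to ground truth, and report the fraction it gets wrong. A minimal sketch, where `ask_model` and the two-question dataset are hypothetical stand-ins, not OpenAI’s actual benchmark:

```python
# Sketch of a PersonQA-style hallucination eval. OpenAI hasn't released
# PersonQA itself; ask_model() and this tiny dataset are stand-ins.

def ask_model(question: str) -> str:
    """Stand-in for a real API call; returns canned replies for the demo."""
    canned = {
        "In what year was Ada Lovelace born?": "Ada Lovelace was born in 1815.",
        "In which city was Alan Turing born?": "Alan Turing was born in Paris.",  # deliberate hallucination
    }
    return canned[question]

# Each item pairs a question about a person with accepted answers.
DATASET = [
    {"question": "In what year was Ada Lovelace born?", "answers": {"1815"}},
    {"question": "In which city was Alan Turing born?", "answers": {"london"}},
]

def hallucination_rate(dataset: list[dict]) -> float:
    """Fraction of attempted answers that contradict the ground truth."""
    wrong = attempted = 0
    for item in dataset:
        reply = ask_model(item["question"]).strip().lower()
        if "don't know" in reply:  # abstaining isn't hallucinating
            continue
        attempted += 1
        if not any(ans in reply for ans in item["answers"]):
            wrong += 1
    return wrong / attempted if attempted else 0.0

print(hallucination_rate(DATASET))  # 0.5 -- one of two attempts was wrong
```

That denominator is the subtlety the report hints at: a model that attempts more answers instead of abstaining can rack up more correct claims and more hallucinated ones at the same time, which is exactly the pattern OpenAI describes for o3.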
Both models are still fairly new, and now that they’re in public hands, their hallucination rates could improve drastically as testing continues. However, because both are built for more complex tasks, the problem could prove harder to shrug off going forward.
As mentioned above, hallucinations can be a funny quirk in low-stakes prompts. However, reasoning models (AI designed to take on more complex, multi-step tasks) typically handle more important information.
If this is a pattern that continues with future reasoning models from OpenAI, it could make for a difficult sales pitch, especially for larger companies looking to spend hefty amounts of money to use o3 and o4-mini.
More from Tom's Guide
- I use Gemini every day — here are 7 prompts I can’t live without
- 7 ChatGPT productivity hacks that you probably didn't know about
- I test Gemini for a living — 5 prompts I wish I knew sooner
Alex is the AI editor at Tom's Guide. Dialed into all things artificial intelligence right now, he knows the best chatbots, the weirdest AI image generators, and the ins and outs of one of tech’s biggest topics.
Before joining the Tom’s Guide team, Alex worked for the brands TechRadar and BBC Science Focus.
In his time as a journalist, he has covered the latest in AI and robotics, broadband deals, the potential for alien life, the science of being slapped, and just about everything in between.
Alex aims to make the complicated uncomplicated, cutting out the complexities to focus on what is exciting.
When he’s not trying to wrap his head around the latest AI whitepaper, Alex pretends to be a capable runner, cook, and climber.