OpenAI’s leading models keep making things up — here's why


OpenAI’s newly released o3 and o4-mini are some of the smartest AI models to ever be released, but they seem to be suffering from one major problem.

Both models are hallucinating. That in itself isn't out of the ordinary, since most AI models still do it, but these two new releases appear to hallucinate more than several of OpenAI's older models.

Historically, while new models have continued to hallucinate, the rate has fallen with each release. The potentially larger issue here is that OpenAI doesn't know why it has now gone up.

What are hallucinations?

If you’ve used an AI model, you’ve most likely seen it hallucinate. This is when the model produces incorrect or misleading results. That could mean citing inaccurate statistics, misinterpreting an image prompt, or simply failing to follow the instructions it was given.
This can be a small, unimportant issue. For example, if a chatbot is asked to write a poem using only words beginning with "b" and includes the word "tree," that is a hallucination, albeit a rather low-stakes one.

However, if a chatbot were asked for a list of foods that are safe for someone with a gluten intolerance and it suggested bread rolls, that would be a hallucination with real risk attached.

What does this mean for the o3 and o4-mini models?


In its technical report for the two models, OpenAI explains that both underperformed on PersonQA, a benchmark that measures an AI model's hallucination rate.

“This is expected, as smaller models have less world knowledge and tend to hallucinate more. However, we also observed some performance differences comparing o1 and o3,” the report states.

“Specifically, o3 tends to make more claims overall, leading to more accurate claims as well as more inaccurate/hallucinated claims. More research is needed to understand the cause of this result.”

OpenAI’s report found that o3 hallucinated in response to 33% of questions. That is roughly double the hallucination rate of OpenAI’s previous reasoning models.

Both models are still fairly new and, now that they are in public hands, their hallucination rates could improve considerably as testing continues. However, because both are built for more complex tasks, this could be problematic going forward.

As mentioned above, hallucinations can be a harmless quirk in low-stakes prompts. However, reasoning models (AI designed to take on more complex tasks) typically handle more important information.

If this is a pattern that continues with future reasoning models from OpenAI, it could make for a difficult sales pitch, especially for larger companies looking to spend hefty amounts of money to use o3 and o4-mini.

Alex Hughes
AI Editor

Alex is the AI editor at Tom's Guide. Dialed into all things artificial intelligence in the world right now, he knows the best chatbots, the weirdest AI image generators, and the ins and outs of one of tech’s biggest topics.

Before joining the Tom’s Guide team, Alex worked for the brands TechRadar and BBC Science Focus.

In his time as a journalist, he has covered the latest in AI and robotics, broadband deals, the potential for alien life, the science of being slapped, and just about everything in between.

Alex aims to make the complicated uncomplicated, cutting out the complexities to focus on what is exciting.

When he’s not trying to wrap his head around the latest AI whitepaper, Alex pretends to be a capable runner, cook, and climber.
