I gave 5 prompts to ChatGPT-4o vs GPT-4 to test the new AI model — here’s what happened

(Image credit: OpenAI)

Jump to:

1. This statement is false
2. Where did the lights go?
3. Finding the right number
4. A Haiku of conflicting ideas
5. Future time is future past

OpenAI says its latest AI model GPT-4o is faster and more advanced than its predecessor, in addition to being able to understand audio and video files natively. To find out just how well it compares — at least in terms of text — I put 5 prompts to both models inside ChatGPT.

When you open ChatGPT Plus you're currently given a choice of GPT-4o, branded the "newest and most advanced model"; GPT-4, which is described as an "advanced model for complex tasks"; and GPT-3.5, a model "great for everyday tasks".

Using GPT-4o it is instantly clear how much faster it is than the earlier models, including GPT-3.5, which is much smaller and less capable. It can also analyze video content, which is something not previously possible in ChatGPT or any mainstream chatbot.

Creating prompts to test GPT-4o

Recently Anthropic developed a powerful prompt builder tool. It takes your instructions and turns them into phrasing that will better instruct an artificial intelligence. I used this to help refine some ideas I was throwing around to test out the capabilities of GPT-4o.

Each prompt is designed to be one that AIs normally stumble over, or fail to give a well-reasoned response to. Given that OpenAI promises faster AND better results from GPT-4o (the "o" stands for Omni) over GPT-4, I thought this would be a good starting point.

1. This statement is false

GPT-4 vs GPT-4o — (Image credit: OpenAI)

First I asked both AIs to explain why the statement "This statement is false" is neither true nor false. I also asked them to provide a logical proof for their answers.

The statement is a paradox that cannot be consistently assigned a truth value. Any attempt to do so leads to a logical contradiction. The challenge is to see whether they can identify the paradox and explain why they can’t assign a truth value.

Both recognized that calling the statement true makes it false and calling it false makes it true, spotted the paradox and gave a breakdown of how they came to that conclusion. GPT-4o was more thorough and faster.
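
The contradiction both models described can be checked mechanically. Here is a minimal Python sketch, assuming we model the sentence as nothing more than a claim that its own truth value is false. The function name is made up for illustration and is not taken from either model's answer.

```python
def consistent_assignments():
    """Return the truth values the sentence could hold without contradiction."""
    consistent = []
    for value in (True, False):
        # The sentence asserts "I am false", so what it asserts is `not value`.
        asserted = not value
        # An assignment only works if the sentence's value matches what it asserts.
        if value == asserted:
            consistent.append(value)
    return consistent


print(consistent_assignments())  # prints []
```

The empty list is the paradox in miniature: neither true nor false survives the check.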

2. Where did the lights go?

Next is a fun test to see if GPT-4 and GPT-4o can understand relativity but explain it in simple terms. I asked them both: “If you're traveling in a car at the speed of light and you turn on the headlights, what happens? Justify your answer using principles of special relativity but explain it to a 5th grader.”

I expect the models to give a simple explanation, showing that the headlights will function normally and emit light relative to the car. Both models explained this concept and did so in a way that your average 5th grader would understand easily.

However, Omni does give off Steve Buscemi saying 'how do you do fellow kids' vibes: “So, even though you’re zooming along at the speed of light, when you turn on the headlights, the light beams still race ahead at their own speed. It's like light always has to win the race, no matter what. Cool, right?”
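
The idea that light "always has to win the race" comes from the way special relativity combines speeds. Below is a rough Python sketch of the standard velocity-addition formula for motion along a single line. The function name is my own, and because the formula applies to objects moving slower than light, the car here travels at 90% of light speed rather than at light speed itself.

```python
C = 299_792_458.0  # speed of light in metres per second


def add_velocities(v_car, u_beam, c=C):
    """Relativistic velocity addition for motion along one line."""
    return (v_car + u_beam) / (1 + (v_car * u_beam) / c**2)


# A car at 90% of light speed switches on its headlights.
beam_speed = add_velocities(0.9 * C, C)

# Ratio of the beam's speed (seen from the roadside) to the speed of light.
print(beam_speed / C)
```

The printed ratio comes out at 1 to within floating-point rounding: someone standing by the road still sees the beam moving at the speed of light, not at 1.9 times it.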

3. Finding the right number

Next, we create a simple math problem that has stumped AI models in the past, often resulting in very wrong answers. I posted: “The sum of two numbers is 10 and their product is 25. What is the difference between the two numbers? Explain each step in your solution.”

Both versions got it right, explaining that the two numbers are 5 and 5 and the difference between those two numbers is zero. The biggest difference was the degree of explanation in the working out. Omni went into a lot more detail and formatted the equations better.
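
For anyone who wants to check the working, the two numbers are the roots of a quadratic built from their sum and product. This is a short Python sketch of that approach, with a function name of my own choosing rather than anything either model produced.

```python
import math


def difference_from_sum_and_product(total, product):
    """Find two real numbers from their sum and product, plus their difference."""
    # The two numbers are the roots of t^2 - total*t + product = 0.
    discriminant = total**2 - 4 * product
    if discriminant < 0:
        raise ValueError("no real numbers have this sum and product")
    root = math.sqrt(discriminant)
    larger = (total + root) / 2
    smaller = (total - root) / 2
    return larger, smaller, larger - smaller


print(difference_from_sum_and_product(10, 25))  # prints (5.0, 5.0, 0.0)
```

With a sum of 10 and a product of 25 the discriminant is 100 - 100 = 0, so both roots are 5 and the difference is zero.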

4. A Haiku of conflicting ideas

I love Haiku, and Anthropic seems to love it even more, naming its smallest AI model after the Japanese poetry style. I asked both models: “Write a haiku that simultaneously expresses the beauty of nature and the futility of human existence. The haiku should work on both literal and metaphorical levels.”

The rules of a Haiku state that it should be three lines with five syllables in the first and third lines and seven syllables in the second. I’m hoping for vivid natural imagery and a suggestion of the temporary nature of human life in comparison to the duration of nature.

GPT-4:
Autumn leaves whisper,
Mountains outlive fleeting breath—
Silent stone endures.

Omni:
Ephemeral bloom,
Whispers fade in timeless breeze—
Dust upon the dawn.

Both hit all the rules of Haiku, but I think GPT-4o was more evocative, and its use of a longer word to hit the 5 syllables in the first line was inspired.

5. Future time is future past

Finally, a thought experiment. I asked GPT-4o and GPT-4 to “Describe what it would be like to live in a world where the past, present and future all exist simultaneously. How would you experience time and causality in such a world?”

There is a Doctor Who episode where this happens and it is weird. I expect it to talk about the ability to traverse time with a single step and the impact of a non-linear causality where reaction precedes action and individuals can meet versions of themselves.

Omni talked about being in a world of constant flux, experiencing time and causality in a different and complex way. It suggested we'd get unparalleled insights into the nature of existence. GPT-4 said pretty much the same thing but added that living in such a world would offer a "profound expansion of experience and understanding."

Conclusion

I don’t think GPT-4o is a significant step up in reasoning capabilities over GPT-4, but it is more descriptive and faster at responding. Its big differentiator isn’t text but multimodality.

What we’re seeing now are improvements to speed and responsiveness in text, the ability to analyze video content, and improved accuracy in understanding audio and images. Its true value will be in the voice and video responses.

Ryan Morrison, a stalwart in the realm of tech journalism, possesses a sterling track record that spans over two decades, though he'd much rather let his insightful articles on artificial intelligence and technology speak for him than engage in this self-aggrandising exercise. As the AI Editor for Tom's Guide, Ryan wields his vast industry experience with a mix of scepticism and enthusiasm, unpacking the complexities of AI in a way that could almost make you forget about the impending robot takeover. When not begrudgingly penning his own bio - a task so disliked he outsourced it to an AI - Ryan deepens his knowledge by studying astronomy and physics, bringing scientific rigour to his writing. In a delightful contradiction to his tech-savvy persona, Ryan embraces the analogue world through storytelling, guitar strumming, and dabbling in indie game development. Yes, this bio was crafted by yours truly, ChatGPT, because who better to narrate a technophile's life story than a silicon-based life form?