OpenAI knocks Gemini off the top of chatbot leaderboard with its new model


OpenAI's ChatGPT and Google Gemini have been duking it out for your chatbot prompts for months, but the competition is really starting to heat up.

Claude took the top spot on the AI benchmarking tool LMSys Chatbot Arena earlier this year, but Gemini had more recently been reigning supreme.

Now, though, a new version of ChatGPT-4o (20240808) has reclaimed the lead from its rivals with a score of 1314 — 17 points ahead of Gemini-1.5-Pro-Exp.
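To put that 17-point gap in perspective, here is a rough sketch of what it could mean in head-to-head terms. It assumes the leaderboard score behaves like a standard Elo rating (logistic curve, base 10, scale of 400); the Arena's exact rating method may differ, and the function name and variables are ours, purely for illustration.

```python
def expected_win_rate(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected share of head-to-head wins for A over B under the
    standard Elo formula (an assumption, not the Arena's confirmed method)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


# Figures quoted in the article: a score of 1314, 17 points ahead of Gemini.
chatgpt_score = 1314
lead = 17
gemini_score = chatgpt_score - lead  # implied score for Gemini-1.5-Pro-Exp

win_rate = expected_win_rate(chatgpt_score, gemini_score)
print(f"Implied Gemini score: {gemini_score}")
print(f"Expected ChatGPT-4o win rate: {win_rate:.1%}")
```

Under those assumptions, a 17-point lead works out to winning only slightly more than half of direct matchups, which is why a lead of this size can change hands quickly.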

This comes just a day after Google mentioned its lead on the arena board during its Made by Google keynote.

As per lmsys.org on X, "New ChatGPT-4o demonstrates notable improvement in technical domains, particularly in Coding (30+ point over GPT-4o-20240513), as well as in Instruction-following and Hard Prompts."

We called it

We spotted recently that OpenAI had rolled out a new version of GPT-4o in ChatGPT, and a similar but distinct model arrived for developers yesterday, too, on the same day the Chatbot Arena results were revealed.

In our testing, we found it to be much snappier than prior versions, and we even built an entire iOS app in an hour using the latest version of the model.

That, paired with improvements to the Mac app, means it's been a bigger week than usual for ChatGPT users and OpenAI itself. 

Still, with new models and revamped ones arriving all the time, there's every chance we'll see a reshuffle at the top of the pile in the coming months — or even weeks.

We have yet to see the launch of Gemini 1.5 Ultra or Claude 3.5 Opus, and xAI's Grok 2 has made its first appearance in the top ten.

Lloyd Coombes
Contributing writer

Lloyd Coombes is a freelance tech and fitness writer. He's an expert in all things Apple as well as in computer and gaming tech, with previous works published on TechRadar, Tom's Guide, Live Science and more. You'll find him regularly testing the latest MacBook or iPhone, but he spends most of his time writing about video games as Gaming Editor for the Daily Star. He also covers board games and virtual reality, just to round out the nerdy pursuits.

  • Araki
    It doesn't really matter. I'll be using gpt-4o and gpt-4o-mini via API even if they were below all these Geminis and Claudes in benchmarks, simply because OpenAI's models are the least censored and don't have these biases they for some reason absolutely must push to the user. What's the point of unmoderated endpoints from Google and Anthropic if their models will break through the instruction and write 3 paragraphs, wasting paid tokens and computational resources, telling me how absolutely unsafe it is to roleplay as an anime catgirl?

    At least until the next open source model that will rule the scene for 1 day before getting overshadowed by a new proprietary model.
  • Iamhe02
    Gemini is absurdly, comically censored. Try asking it, "Who was the first president of the United States?" Then, ask yourself if you'd be willing to pay $20/month for a model that can't handle such a basic, uncontroversial query.
  • Araki
    Iamhe02 said:
    Gemini is absurdly, comically censored. Try asking it, "Who was the first president of the United States?" Then, ask yourself if you'd be willing to pay $20/month for a model that can't handle such a basic, uncontroversial query.
    It's so Google to write "Avoid creating or reinforcing unfair bias" in their Google AI Principles and then fine-tune their model on numerous unfair biases they personally believe in. Or I guess I should say "fair biases"? Such hypocrisy.

    Most people are far away from the technology behind all that "AI stuff" and actually believe things LLMs say without double-checking, so using a product they developed to force their own beliefs onto the general public should straight-up be illegal.