OpenAI knocks Gemini off the top of chatbot leaderboard with its new model

(Image: ChatGPT app on iPhone. Image credit: Shutterstock)

OpenAI's ChatGPT and Google Gemini have been duking it out for your chatbot prompts for months, but the competition is really starting to heat up.

Claude briefly took the top spot on AI benchmarking tool LMSys Chatbot Arena earlier this year, but more recently Gemini had been reigning supreme.

Now, though, a new version of ChatGPT-4o (20240808) has reclaimed the lead from its rivals with a score of 1314 — 17 points ahead of Gemini-1.5-Pro-Exp.

This comes just a day after Google mentioned its lead on the Arena leaderboard during its Made by Google keynote.

As per lmsys.org on X, "New ChatGPT-4o demonstrates notable improvement in technical domains, particularly in Coding (30+ point over GPT-4o-20240513), as well as in Instruction-following and Hard Prompts."

We called it

We spotted recently that OpenAI had rolled out a new version of GPT-4o in ChatGPT, and a different but similar model arrived for developers yesterday, too — the same day the Chatbot Arena results were revealed.

In our testing, we found it to be much snappier than prior versions, even building an entire iOS app in an hour using the latest version of the model.

That, paired with improvements to the Mac app, means it's been a bigger week than usual for ChatGPT users and OpenAI itself. 

Still, with new models and revamped ones arriving all the time, there's every chance we'll see a reshuffle at the top of the pile in the coming months — or even weeks.

We have yet to see the launch of Google's Gemini 1.5 Ultra or Anthropic's Claude 3.5 Opus, and xAI's Grok-2 has already made its first appearance in the top ten.

Lloyd Coombes

A freelance writer from Essex, UK, Lloyd Coombes began writing for Tom's Guide in 2024, having previously worked on TechRadar, iMore, Live Science and more. A specialist in consumer tech, Lloyd has been particularly knowledgeable about Apple products ever since he got his first iPod Mini. Aside from writing about the latest gadgets for Future, he's also a blogger and the Editor in Chief of GGRecon.com. On the rare occasion he’s not writing, you’ll find him spending time with his son, or working hard at the gym. You can find him on Twitter @lloydcoombes.

  • Araki
    It doesn't really matter, I'll be using gpt-4o and gpt-4o-mini via API even if they were below all these Geminis and Claudes in benchmarks. Simply because OpenAI's models are the least censored and don't have these biases they for some reason absolutely must push on the user. What's the point of the unmoderated endpoints of Google and Anthropic if their models will break through the instruction and write 3 paragraphs, wasting paid tokens and computational resources, telling me how absolutely unsafe it is to roleplay as an anime catgirl?

    At least until the next open source model that will rule the scene for 1 day before getting overshadowed by a new proprietary model.
    Reply
  • Iamhe02
    Gemini is absurdly, comically censored. Try asking it, "Who was the first president of the United States?" Then, ask yourself if you'd be willing to pay $20/month for a model that can't handle such a basic, uncontroversial query.
    Reply
  • Araki
    Iamhe02 said:
    Gemini is absurdly, comically censored. Try asking it, "Who was the first president of the United States?" Then, ask yourself if you'd be willing to pay $20/month for a model that can't handle such a basic, uncontroversial query.
    It's so Google to write "Avoid creating or reinforcing unfair bias" in their Google AI Principles and then finetune their model on numerous unfair biases they personally believe in. Or I guess I should say "fair biases"? Such hypocrisy.

    Most people are far removed from the technology behind all that "AI stuff" and actually believe things LLMs say without double-checking, so using a product they developed to force their own beliefs onto the general public should straight-up be illegal.
    Reply