Google drops new Gemini model and it goes straight to the top of the LLM leaderboard

Google Gemini
(Image credit: Google)

Google is constantly updating Gemini, releasing new versions of its AI model family every few weeks. The latest is so good it went straight to the top of the Imarena Chatbot Arena leaderboard — toppling the latest version of OpenAI's GPT-4o.

Update: Gemini under fire after telling user to die.

Previously known as the LMSys arena, it is a platform that lets AI labs pit their best models against one another in a blind head-to-head. The users vote but don't know which model is which until after they've voted.

The new model from Google DeepMind has the catchy name Gemini-Exp-1114 and has matched the latest version of GPT-4o and exceeded the capabilities of the o1-preview reasoning model from OpenAI.

The top 5 models in the arena are all versions of OpenAI or Google models. The first model on the leaderboard not made by either of those companies is xAI's Grok 2.

The success of this new model comes as Google finally releases a Gemini app for iPhone, which beat the ChatGPT app in our Gemini vs. ChatGPT 7-round face-off.

How well does the new model work?

The latest Gemini model seems to perform particularly well at math and vision tasks, which makes sense as they are areas in which all Gemini models excel.

Gemini-Exp-1114 isn't currently available in the Gemini app or website. You can only access it by signing up for a free Google AI Studio account (the platform aimed at developers wanting to try new ideas).

I'm also not sure whether this is a version of Gemini 1.5 or whether its an early insight into Gemini 2, expected next month. If it is the latter then the improvement over the previous generation might not be as extreme as some expected.

However, it is doing well in technical and creative areas according to benchmarks. This would tie in to the idea its going to be useful for reasoning and managing agents. It first in math, solving hard problems, creative writing and vision.

Unlike other benchmarks the Chatbot Arena is based on human perceptions of performance and output quality, rather than rigid testing against data.

Whether this is just a new version of Gemini 1.5 Pro or an early insight into the capabilities of Gemini 2, its going to be an interesting few months in AI land.

More from Tom's Guide

Category
Arrow
Arrow
Back to MacBook Air
Brand
Arrow
Processor
Arrow
RAM
Arrow
Storage Size
Arrow
Screen Size
Arrow
Colour
Arrow
Storage Type
Arrow
Condition
Arrow
Price
Arrow
Any Price
Ryan Morrison
AI Editor

Ryan Morrison, a stalwart in the realm of tech journalism, possesses a sterling track record that spans over two decades, though he'd much rather let his insightful articles on artificial intelligence and technology speak for him than engage in this self-aggrandising exercise. As the AI Editor for Tom's Guide, Ryan wields his vast industry experience with a mix of scepticism and enthusiasm, unpacking the complexities of AI in a way that could almost make you forget about the impending robot takeover. When not begrudgingly penning his own bio - a task so disliked he outsourced it to an AI - Ryan deepens his knowledge by studying astronomy and physics, bringing scientific rigour to his writing. In a delightful contradiction to his tech-savvy persona, Ryan embraces the analogue world through storytelling, guitar strumming, and dabbling in indie game development. Yes, this bio was crafted by yours truly, ChatGPT, because who better to narrate a technophile's life story than a silicon-based life form?

Read more
Gemini logo shown on a phone's screen.
Gemini 2.0 'Experimental Advanced' is now available to paying subscribers
Google Gemini
Google Gemini — everything you need to know
Gemini 2
Google unveils Gemini 2.0 Flash Thinking — its answer to OpenAI's o1
Gemini 2
Google Gemini 2.0 is now free for users — here’s how to access it now
ChatGPT vs Gemini
I put Gemini vs ChatGPT to the test with 7 prompts — here's the winner
Gemini Live
Gemini Live major upgrade just revealed by Google
Latest in Google Gemini
A stock photo of a person on their phone looking at a spreadsheet while several graphs are displayed on the laptop in front of them.
Google Sheets just got an AI upgrade that analyzes your data and visualizes it
Gemini logo shown on a phone's screen
Google Gemini can now analyze and summarize documents for free — here's how
Gemini Live
Gemini Live major upgrade just revealed by Google
Gemini 2
Google Gemini 2.0 is now free for users — here’s how to access it now
Gemini 2
My browser tabs were getting out of hand so I let Gemini 2.0 takeover — here's how it went
Claude vs Gemini
I tested Gemini vs Claude with 7 prompts to find the best AI chatbot — here's the winner
Latest in News
Google Chromecast
Google has a fix for broken Chromecasts as long as you didn't factory reset
NYTimes Connections
NYT Connections today hints and answers — Friday, March 14 (#642)
Intel CPU
Intel's Panther Lake appears in public for the first time — what we know about the new chip
OnePlus Pad 2 with keyboard
OnePlus Pad 2 Pro specs leak — this tablet is a beast
Josh Hartnett in Trap
Netflix top 10 movies — here’s the 3 worth watching right now
Gemini logo on smartphone
Google is giving away Gemini's best paid features for free — here's the tools you can try now