ChatGPT-4o vs Google Gemini Live — how the new AI assistants stack up
Google launched a new artificial intelligence product at its Google I/O event on Tuesday — Gemini Live. We all assumed that is what the Gemini Assistant in Android was supposed to do but this is Google and anything goes.
If it weren’t for the fact that it came just one day after OpenAI’s first consumer product event, I’d wonder whether Gemini Live was launched to take on ChatGPT Voice. Both are built on natively multimodal AI models and have impressive voice and video capabilities.
In the global AI race, the current front runners seem to be OpenAI and Google, with the former seemingly cozying up to Apple and the iPhone and the latter in control of Android. Forget AI devices like the Rabbit r1 or the Humane AI Pin: the short-term winner is the smartphone.
Both ChatGPT Voice and Gemini Live are being integrated into an existing AI product and neither is available today — but how else do these next-generation assistants compare?
How do Gemini Live and ChatGPT 4o compare?
This summer, we’re expanding Gemini’s multimodal capabilities — including the ability to have an in-depth two-way conversation using your voice. This new experience is called Live. #GoogleIO pic.twitter.com/eAZbaO5WKz (May 14, 2024)
Google is on the back foot a little when it comes to credibility, especially around showing off live video analysis and voice capabilities. When it announced Gemini Ultra last year it did so with a video of it responding to real-time video — only it wasn’t real-time or video.
This time, however, Google made a point of letting attendees try the tech at I/O, or at least the underlying “Project Astra” part of it, including speech and video conversation.
Both offer a conversational, natural language voice interface, both offer the potential for live video analysis through a smartphone camera and both seem to be fast enough for a truly natural conversation where you can interrupt the AI mid-flow.
However, there are some notable differences. OpenAI’s ChatGPT Voice sounds more natural, can detect and respond to emotion and vocal tones and even adapt in real-time to how you ask it to speak. I didn’t see evidence of that capability from Gemini Live.
The other big difference is around multimodality. Gemini still relies on other models for output including using Imagen 3 for images and Veo for video. GPT-4o is natively multimodal in both directions — the o stands for omni, or in all directions. It creates its own images and sound.
Gemini Live vs GPT-4o: The future of voice assistants
The world seems to be moving towards voice and away from text input. When I first watched the OpenAI announcement my reaction was that this is a paradigm shift in human-computer interface, one as big as the launch of the mouse or the touch screen.
I still hold that view, and the fact that Google is also launching a native, natural-sounding voice interface further cements it. Even Meta has Meta AI, a voice bot available in its VR headsets and the Ray-Ban smart glasses.
While the smartphone might be the winner for now, it’s clear the real form factor for these voice AI models is smart glasses. With cameras at eye height and arms that send sound waves into your ears, they are the perfect AI device.
The question is whether OpenAI moves into hardware and launches its own pair of smart glasses, or whether this is the new Siri and will power a future Apple Glasses product. There is also the question of whether Google is really brave enough to resurrect Google Glass.
Ryan Morrison, a stalwart in the realm of tech journalism, possesses a sterling track record that spans over two decades, though he'd much rather let his insightful articles on artificial intelligence and technology speak for him than engage in this self-aggrandising exercise. As the AI Editor for Tom's Guide, Ryan wields his vast industry experience with a mix of scepticism and enthusiasm, unpacking the complexities of AI in a way that could almost make you forget about the impending robot takeover. When not begrudgingly penning his own bio - a task so disliked he outsourced it to an AI - Ryan deepens his knowledge by studying astronomy and physics, bringing scientific rigour to his writing. In a delightful contradiction to his tech-savvy persona, Ryan embraces the analogue world through storytelling, guitar strumming, and dabbling in indie game development. Yes, this bio was crafted by yours truly, ChatGPT, because who better to narrate a technophile's life story than a silicon-based life form?