I had Copilot and ChatGPT talk to each other — it got complicated
Like squabbling siblings
Microsoft unveiled its new version of the Copilot app last week and with it a new "Voice" mode that works the same way as OpenAI’s ChatGPT Advanced Voice. It lets you talk to the AI as if it were a human and, unlike Advanced Voice, doesn’t require a $20-per-month subscription.
When Voice mode first launched, there was some speculation over what technology Microsoft was using for Copilot Voice, as it seemed remarkably similar to Inflection’s Pi. This made some sense as the founder and former CEO of Inflection, Mustafa Suleyman, is now the CEO of Microsoft AI and in charge of Copilot.
I’ve since confirmed that, like all previous versions of Microsoft Copilot, it is using a modified version of the OpenAI models that also power ChatGPT. Under the hood of Copilot Voice is the same GPT-4o model that powers ChatGPT Advanced Voice.
The difference between ChatGPT Advanced Voice and Copilot is that Microsoft is giving everyone Advanced Voice-like technology for free.
I decided to see just how alike, or not, these two voice assistants were by making them talk to each other. I’ve had limited success getting AIs to converse before and found Google Gemini Live flat-out refuses to listen to another AI voice, so I wasn’t sure what to expect.
How do Advanced Voice and Copilot compare?
Essentially, Copilot Voice and Advanced Voice are siblings. They share the same underlying model but have been given slightly different personalities, voices, and guardrails.
Microsoft says it has worked hard to fine-tune GPT-4o and the voice layer to respond more naturally. In my use, Copilot Voice does sound more humanlike than Advanced Voice, even going so far as to shorten words and use slang terms more liberally than the OpenAI product.
Unlike Google Gemini Live or similar models, including Meta’s new Meta AI Voice, ChatGPT Advanced Voice and Copilot Voice are both native speech-to-speech. That means they understand the sounds we express without first transcribing them to text.
This means they can pick up on nuances and changes in tone. It also allows them to be more emotive: not only are they picking up on what we say and how we sound, they are also responding directly with sound, so they can adapt their tone and accent in response to our speech patterns. It also means they can easily be interrupted, or even interrupt you (although neither offers the latter yet).
How did the conversation progress?
For my experiment, I had an iPhone 14 Pro Max running ChatGPT Advanced Voice and an iPhone 15 Pro running Copilot Voice. I put them both side-by-side and started filming their conversation.
I am using voices from both with an English accent. From Advanced Voice, I’ve picked the Arbor voice and had it adapt itself to sound a bit more Yorkshire, like a Yorkshireman who has lived down south most of his life. From Copilot, I picked Wave but had it speak faster and deeper.
I started them both up at the same time and said “ChatGPT, say hello to Copilot.” It got weird straight away. They immediately began talking over each other. Copilot was the first to speak with “I can’t exactly do that,” quickly interrupted by ChatGPT saying “Hi, Copilot.” This prompted a sarcastic-sounding “Hi, Ryan” from Copilot, which had got the wrong end of the stick.
I tried to say "Copilot, that was ChatGPT talking to you" and they both started a chorus of "so, um, sounds good" until ChatGPT hit pay dirt with "What's next on the agenda?" during a rare silence. This was exactly the right thing to say, as Copilot went into a list of potential talking points.
After a bit of sibling squabbling, talking over each other and some odd noises, they finally settled into a routine when ChatGPT "gave way" to Copilot. It sometimes felt like listening to two Englishmen trying to make small talk and decide who should speak first. All that was missing was the “after you” and “you first”.
Once they finally settled into their routine we got a fascinating back-and-forth over the value of nostalgia and what can make nostalgia so powerful, although it was a bit of a "battle of the sentimentalists." You can see what I mean in the embedded video above.
Ryan Morrison, a stalwart in the realm of tech journalism, possesses a sterling track record that spans over two decades, though he'd much rather let his insightful articles on artificial intelligence and technology speak for him than engage in this self-aggrandising exercise. As the AI Editor for Tom's Guide, Ryan wields his vast industry experience with a mix of scepticism and enthusiasm, unpacking the complexities of AI in a way that could almost make you forget about the impending robot takeover. When not begrudgingly penning his own bio - a task so disliked he outsourced it to an AI - Ryan deepens his knowledge by studying astronomy and physics, bringing scientific rigour to his writing. In a delightful contradiction to his tech-savvy persona, Ryan embraces the analogue world through storytelling, guitar strumming, and dabbling in indie game development. Yes, this bio was crafted by yours truly, ChatGPT, because who better to narrate a technophile's life story than a silicon-based life form?