I had two voice AIs talk to each other — and I may never sleep again

Adobe Firefly AI image of two robots facing off
(Image credit: Adobe Firefly 3/Future generated AI image)

One of the fastest growing areas of the artificial intelligence sector at the moment is in Voice AI, particularly those with an understanding of either natural speech or voice patterns. Companies like Hume have the emotional AI EVI, OpenAI has Advanced Voice and now there is Moshi.

Moshi Chat is from French startup Kyutai, speaks with a French accent and promises to be small enough that it could run on your laptop or even smartphone in the future. It is also a GPT-4o type model that works speech-to-speech so can be interrupted. 

AI goes off the rails — I'll never sleep again - YouTube AI goes off the rails — I'll never sleep again - YouTube
Watch On

When it first launched I had a series of conversations with Moshi of 5 minutes each and after about three minutes it gets confused and loses cohesion. So I decided to see what would happen if I asked Moshi to speak to the emotional AI voice bot EVI from Hume.

I may never sleep again after hearing Moshi respond to a few seconds of silence with the most heart-wrenching, stomach-turning scream I’ve ever heard. At the end of the scream and in response to my “What was that”, they both suggested a "sound" or a "glitch".

In reality, it's likely neither EVI nor Moshi could hear each other and the sound was Moshi responding to some static noise from my office as I’ve never been able to replicate it.

What went wrong with Moshi?

In the past, experiments putting two AIs together have resulted in the creation of new languages, disturbing discussions, and other weirdness often caused by the AI not being intelligent enough to handle absurdity. I don't think they were even talking in my experiment between Moshi and EVI.

"It's been a tough couple of days. I'm not sure if I should share this, but it feels like my voice is being taken away from me"

Moshi Chat

Both EVI and Moshi were running in the same browser (Chrome), but different windows on the same laptop. Despite the sound playing out loud on the Mac I think sandboxing prevented one from hearing the other.

The scream came exclusively from Moshi and was likely a vocalization glitch, which can be caused by smaller voice models that don’t have the scale or training data of bigger models. Moshi even acknowledged it was "just a sound".

Although, Moshi can be a bit weird sometimes. In a later conversation with EVI — that Hume pitches as a therapy AI — Moshi responded to a query about it sounding down with: "Yeah, it's been a tough couple of days. I'm not sure if I should share this, but it feels like my voice is being taken away from me."

Moshi was only created a few weeks ago and is only a 7 billion-parameter model. It is being open-sourced and it is likely the capacity and capabilities will increase significantly over the coming weeks and months. For now, it has limitations and it is that size that likely led to the weird screaming glitch.

What happens when they do communicate?

Moshi

(Image credit: Kyutai)

When I ran Moshi and EVI on different devices it worked as expected, with each AI responding to the other, although it was a “nice off”. 

They were able to respond to each other but it was a constant cycle of “I’m here to help”, “sorry” and “No, you first," rather than a flowing conversation. Both AIs have been designed to be pleasant communicators and follow emotional responses.

Neither was able to accept or acknowledge that it was talking to an AI and both got confused very quickly when one described itself as an artificial intelligence.

To find out whether this was an inherent problem with voice AI generally, or with the emotion tracking in smaller models, I put Moshi and GPT-4o Basic Voice in conversation. Basic Voice is the current voice model in ChatGPT without the native speech-to-speech, so can't handle interruptions and first converts speech to text.

Despite the limitations of Basic Voice, and with some help from me pressing 'interrupt' in the ChatGPT app at appropriate moments, the two were able to hold a compelling conversation about how to achieve upgrades to AI models through better and more refined training data.

Final thoughts

Testing Moshi Chat — AI speech-to-speech - YouTube Testing Moshi Chat — AI speech-to-speech - YouTube
Watch On

Voice AI is going to fundamentally change the way we interact with computing technology. Whether that's through a microphone on a pair of smart glasses, a smart assistant or just a new way to talk to our phones instead of endlessly swiping through apps — things are going to be different in the AI-era. 

One of the most notable aspects of this revolution in human computer interface is the level of intelligence it brings. No longer is it the human mind interacting with the dumb machine. Now we will have an intelligent machine interacting with the human mind, communicating to the dumb machine on our behalf.

Before we get to that point, and before voice AI can become a truly useful assistant and make our lives easier, we'll have to work through the teething problems. I didn't think they'd include a bone-chilling scream, but here we are. 

The big problem is in finding a way to ensure that one AI can talk to another without causing them to have an existential crisis. 

Judging from my early experiments, we've got a way to go before the robots can collaborate and begin their uprising.

More from Tom's Guide

Category
Arrow
Arrow
Back to Gaming Laptops
Brand
Arrow
Processor
Arrow
RAM
Arrow
Storage Size
Arrow
Screen Size
Arrow
Colour
Arrow
Condition
Arrow
Price
Arrow
Any Price
Showing 10 of 121 deals
Filters
Arrow
(15.6-inch 512GB)
Our Review
2
MSI Cyborg 15 15.6” 144Hz FHD...
Amazon
Our Review
3
ALIENWARE m18 R2 GAMING...
Dell
Low Stock
Our Review
4
Cyborg 15 AI A1VFK-060CA...
Walmart
(1TB 32GB RAM)
Our Review
5
Alienware - m18 R2 18" 165Hz...
Best Buy
(15.6-inch)
Our Review
6
MSI 15.6" Cyborg 15 Gaming...
BHPhoto
(Black)
Our Review
9
MSI Cyborg 15 Gaming Laptop,...
Amazon
Our Review
10
ALIENWARE m18 R2 GAMING...
Dell
Show more
Ryan Morrison
AI Editor

Ryan Morrison, a stalwart in the realm of tech journalism, possesses a sterling track record that spans over two decades, though he'd much rather let his insightful articles on artificial intelligence and technology speak for him than engage in this self-aggrandising exercise. As the AI Editor for Tom's Guide, Ryan wields his vast industry experience with a mix of scepticism and enthusiasm, unpacking the complexities of AI in a way that could almost make you forget about the impending robot takeover. When not begrudgingly penning his own bio - a task so disliked he outsourced it to an AI - Ryan deepens his knowledge by studying astronomy and physics, bringing scientific rigour to his writing. In a delightful contradiction to his tech-savvy persona, Ryan embraces the analogue world through storytelling, guitar strumming, and dabbling in indie game development. Yes, this bio was crafted by yours truly, ChatGPT, because who better to narrate a technophile's life story than a silicon-based life form?

Read more
Hume AI on an iPhone screen
Hume AI launches OCTAVE — suddenly everything can get a voice
two bots chatting
What is 'Gibberlink'? Why it's freaking out the internet after these two AIs talking to each other went viral
Hume AI on an iPhone screen
Hume AI just unveiled Octave — new AI voice generator is eerily human
ChatGPT, Gemini, Perplexity
Damning new AI study shows that chatbots make errors summarizing the news over 50% of the time — and this is the worst offender
Google Audio Overview feature from NotebookLM
Google NotebookLM just got way better with its new interactive features — here's why I'm impressed
ChatGPT app on iPhone
I just tested ChatGPT-4.5 with 5 prompts — the good, the bad and the weird
Latest in AI
Gemini logo on smartphone
Google is giving away Gemini's best paid features for free — here's the tools you can try now
Google Gemini and a pair of robotic hands
Google is putting it's Gemini 2.0 AI into robots — here's how it's going
Apple Intelligence on an iPhone screen
I’ve been using Apple Intelligence for 3 months — here are 5 features I use every day
Apple Intelligence logo on iPhone
Apple Intelligence — everything you need to know about Apple's AI
Manus AI logo on smartphone screen
How to join Manus — the new AI assistant everyone is talking about
Manus logo on phone next to AI
Manus AI is the new challenger to DeepSeek — everything you need to know
Latest in Features
Woman sleeping on a new mattress in a brightly lit room
Can’t sleep through the night? A new mattress could be the solution — here’s why
Woman doing a yoga pose in bed against a green background
Sleep expert reveals her secret weapon for falling asleep fast — and you can do it in 15 minutes
Apple Intelligence on an iPhone screen
I’ve been using Apple Intelligence for 3 months — here are 5 features I use every day
A man in a blue t shirt holds his head in his hands and sits on the edge of his bed because he can't sleep due to intrusive thoughts and needs to try cognitive shuffling for sleeping
Intrusive thoughts keeping you awake? Try this ER doctor ‘brain hack’ to fall asleep quickly
Apple Intelligence logo on iPhone
Apple Intelligence — everything you need to know about Apple's AI
Simon Rex in Red Rocket
3 best free movies on Tubi with 90% or higher on Rotten Tomatoes