I just tried Runway’s new AI voiceover tool — and it’s way more natural sounding than I expected
Human-like AI voices are now a reality
Runway, one of the leading artificial intelligence video generation services, has added a new text-to-speech feature to its platform. It lets users create voiceovers for their projects, choosing from several realistic-sounding but synthetic voices.
The company was founded in 2018 and, early in 2023, released the first publicly available, commercially licensed video-to-video model. Known as Gen-1, it was accessible through Discord and recreated video clips using artificial intelligence.
Gen-2 brought the ability to turn images and text into video, along with a new web platform. The latest addition is a text-to-voice tool that offers a choice of voices.
I tried it out and was genuinely impressed by how natural and varied the voices were. The realism was surprising, and it is exactly the type of advancement actors were concerned about during the recent SAG-AFTRA strike.
Creating a voiceover with Runway
The audio tool isn't particularly easy to find. It sits under the video menu, labeled Generate Audio. I imagine future versions of Runway's editor will make it easier to generate a voiceover from within a project, but for now it is a standalone tool.
Several audio tools are available, including removing silence from an existing clip, cleaning up background noise and, of course, generating speech from text.
To test how well it works, I created a short video using Gen-2 and images I had recently generated with Midjourney version 6. I had ChatGPT write a brief script featuring two characters, then used the voiceover tool to turn the script into audio.
How easy is it to use?
Very easy, if a little clunky. Each clip you generate appears on the right-hand side of the screen, while the text input and the selection of voices are on the left. It can't clone your own voice or offer a broad library of voices the way ElevenLabs does, but the quality is on par.
For this project, I had two characters: a soldier and an officer in the human Martian army, fighting against humans from Earth sent to end the Martian struggle for independence.
I entered the lines I wanted each character to speak, generated them in that character's voice, and each result appeared as a playable, downloadable sample on the right. You could also generate all of a character's lines at once and cut them up in an editor later.
What is the sound like?
The sound was better than I expected. AI voice tools often struggle with emphasis and emotion, and while Runway's wasn't perfect, it placed pauses in the right spots and sounded considerably more natural than I had anticipated, especially when paired with video or sound effects.
If this is how far the technology has come in a few months, I'd be very worried if I were a voice actor working in radio or games.
Ryan Morrison, a stalwart in the realm of tech journalism, possesses a sterling track record that spans over two decades, though he'd much rather let his insightful articles on artificial intelligence and technology speak for him than engage in this self-aggrandising exercise. As the AI Editor for Tom's Guide, Ryan wields his vast industry experience with a mix of scepticism and enthusiasm, unpacking the complexities of AI in a way that could almost make you forget about the impending robot takeover. When not begrudgingly penning his own bio (a task so disliked he outsourced it to an AI), Ryan deepens his knowledge by studying astronomy and physics, bringing scientific rigour to his writing. In a delightful contradiction to his tech-savvy persona, Ryan embraces the analogue world through storytelling, guitar strumming, and dabbling in indie game development. Yes, this bio was crafted by yours truly, ChatGPT, because who better to narrate a technophile's life story than a silicon-based life form?