Meet Whisper Web — a new and free way to transcribe audio
Transcribe audio for free and on-device with this new tool
Those searching for the perfect transcription tool to speed up their workflow may want to take note of a new AI tool called Whisper Web. Essentially an in-browser transcription service, it promises accurate on-device processing that could save us all a lot of time.
So far, so unremarkable. But here's where the AI smarts come in. Because the tool is built on a machine-learning speech model (OpenAI's Whisper), it supports multilingual transcription and translation across 100 different languages. And it's not just limited to your own recorded voice notes. You can input a URL or upload a file to have Whisper Web create a transcription in a matter of seconds.
Launched last week, the tool has been added to the open-source AI platform Hugging Face and is available to use now. So, naturally, I tried it.
It's finally possible: real-time in-browser speech recognition with OpenAI Whisper! 🤯 The model runs fully on-device using Transformers.js and ONNX Runtime Web, and supports multilingual transcription across 100 different languages! 🔥 Check out the demo (+ source code)! 👇 pic.twitter.com/W9CSM9zPwB
June 7, 2024
Does it work?
Over the years, I've used my fair share of transcription apps in order to help record interviews, draft emails or just keep track of notes and ideas to myself. To date, one of my favorite offerings is the Recorder app that's exclusive to Google's Pixel phones. Of course, the drawback there is you need to have a Pixel phone to hand.
Whisper Web was able to take a 25-second audio clip from my laptop mic (complete with background noise) and generate a word-perfect transcription in about ten seconds. The resulting text was broken down into snippets and available to export in either TXT or JSON format.
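For the curious, here's a rough idea of what those two export formats could look like under the hood. This is only a sketch in TypeScript, not Whisper Web's actual code: the `Snippet` shape, its field names and the `toTxt` and `toJson` helpers are all assumptions made for illustration.

```typescript
// Hypothetical shape for one transcript snippet; the field names are assumptions,
// not taken from Whisper Web's source.
interface Snippet {
  start: number; // seconds from the beginning of the clip
  end: number;   // seconds from the beginning of the clip
  text: string;
}

// Plain-text export: trim each snippet and join them into one paragraph.
function toTxt(snippets: Snippet[]): string {
  return snippets.map((s) => s.text.trim()).join(" ");
}

// JSON export: keep the timestamps alongside the text, pretty-printed.
function toJson(snippets: Snippet[]): string {
  return JSON.stringify(snippets, null, 2);
}

// Made-up sample data, purely for demonstration.
const example: Snippet[] = [
  { start: 0, end: 4.2, text: " I skipped breakfast this morning," },
  { start: 4.2, end: 7.5, text: " so I think I'll have an early lunch." },
];

console.log(toTxt(example));
console.log(toJson(example));
```

The TXT version is the one to paste into an email or a draft, while the JSON version keeps the timings, which is handy if you want to jump back to a particular moment in the recording.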
I was thoroughly impressed with this newfound tool and think it could be a really helpful resource. That is, providing you're speaking in English...
In order to test the multilingual capabilities, I switched to French and recorded a brief, 17-second passage saying that I'd skipped breakfast and so would get an early lunch; probably a cheeseburger.
I'm not a French speaker, but even so I don't think my pronunciation is that far off. So it seems a bit awkward that "Alors je pense que je vais déjeuner tôt" (so I think I'll have an early lunch) got transcribed as "J'ai ton femme et j'aime reste déjeuner totes" (I have your wife and I like to have lunch totes).
You'd need someone with much better language skills than me to really take Whisper Web to task. But given that 99.5% of all my transcription needs happen in English, I'm suitably impressed with this handy free tool and could feasibly start incorporating it into my daily workflow. In all likelihood, that starts later today, when I need to transcribe all the AI-heavy news likely to come out of Cupertino at Apple's WWDC keynote event.
Jeff is UK Editor-in-Chief for Tom’s Guide looking after the day-to-day output of the site’s British contingent. Rising early and heading straight for the coffee machine, Jeff loves nothing more than dialling into the zeitgeist of the day’s tech news.
A tech journalist for over a decade, he’s travelled the world testing any gadget he can get his hands on. Jeff has a keen interest in fitness and wearables as well as the latest tablets and laptops. A lapsed gamer, he fondly remembers the days when problems were solved by taking out the cartridge and blowing away the dust.