Meet Mochi-1 — the latest free and open-source AI video model
Genmo releases an open-source text-to-video model
The generative AI wars are building to a crescendo as more and more companies release their own models. Generative video seems to be the biggest current battleground and Genmo is taking a different approach.
The company is releasing its Mochi-1 model as a 'research preview', but the new video generation model ships under an Apache 2.0 license, which makes it open source and free to take apart, modify and rebuild.
That also means Mochi-1 is free to use, and you can try it for yourself on Genmo's site. Because it is open source, it should also appear on the usual generative AI platforms in future, and could one day run on a good gaming PC.
It is launching into a very competitive market, with different services offering a range of capabilities: templates from Haiper, realism from Kling and Hailuo, and fun effects from Pika Labs and Dream Machine. Genmo says its focus is bringing state-of-the-art video generation to open source.
Genmo releases free AI video model
So, why use Genmo's model over any others on offer right now? It all comes down to motion. We spoke to Genmo's CEO Paras Jain, who explained that motion is a key metric when benchmarking models.
"I think fundamentally for a very long time, the only uninteresting video is one which doesn't move. And I felt like a lot of AI video kind of suffered this 'Live Photo effect'", he explains. "I think our historical models had this, that was how the technology had to evolve. But videos about motion, were the most important thing we invested in, above all else."
This initial release is a surprisingly small 10 billion parameter diffusion transformer that uses a new asymmetric architecture to pack more punch into a small package.
Jain said they exclusively trained Mochi-1 on video, rather than the more traditional mixed video, image and text approach. This gave it a better understanding of physics.
The team then worked on ensuring the model could properly understand what people wanted it to make. He told us: "We've invested really, really heavily in prompt adherence as well, just following what you say."
Genmo hopes Mochi-1 can offer 'best-in-class' open-source video generation, but at present, videos are limited to 480p as part of the new research preview launching today.
As Jain mentions, a big focus has been placed on prompt adherence and recognition, too. Genmo benchmarks this with a vision language model as a judge, following the approach OpenAI used for DALL-E 3.
Will you be testing Mochi-1? Let us know. It's certainly entering a crowded landscape, but its open-source nature could see it extend further than some of its rivals.
It isn't even the only open-source AI video model to launch this week. AI company Rhymes dropped Allegro, "a small and efficient open-source text-to-video model". It is also available under an Apache license, although it runs at 15 frames per second and 720p, rather than Mochi-1's 24 frames per second at 480p.
Neither model will run on your laptop yet, but as Jain told us, the beauty of open source is that one day someone will fine-tune it to run on lower-powered hardware, and we'll be making videos offline.
A freelance writer from Essex, UK, Lloyd Coombes began writing for Tom's Guide in 2024 having worked on TechRadar, iMore, Live Science and more. A specialist in consumer tech, Lloyd is particularly knowledgeable on Apple products ever since he got his first iPod Mini. Aside from writing about the latest gadgets for Future, he's also a blogger and the Editor in Chief of GGRecon.com. On the rare occasion he’s not writing, you’ll find him spending time with his son, or working hard at the gym. You can find him on Twitter @lloydcoombes.