Runway’s ‘better and faster’ Gen-3 AI video model is coming in the ‘next few days’
Smallest of a new generation of models
AI video platform Runway will release its Gen-3 model “in the next few days” and it will include “major improvement in fidelity, consistency, and motion over previous generations of models,” while also being considerably faster, the company told Tom’s Guide.
Runway released Gen-2, the first commercially available text-to-video AI model, in June last year, and since then a revolution in synthetic video has been unleashed on the world. It now competes with the likes of Pika Labs, Haiper, Luma Labs and the yet-to-be-released Sora.
Gen-3 is a major step-change for Runway and the AI video space. It was rebuilt from the ground up using new-generation infrastructure purpose-built for large-scale multimodal training. This new model was trained on images and video at the same time for improved realism.
The public will be able to get access to an Alpha version “in the next few days”. Anastasis Germanidis, Runway CTO and co-founder, told me this was the smallest of a new generation of frontier AI models coming as a result of the new training infrastructure.
What makes Runway Gen-3 different?
Runway Gen-3 includes an improved ability to control motion within a video, as well as a better understanding of real-world movement and physics. Combine that with its photorealism and you’ve got a model that can create videos almost indistinguishable from reality.
There were some surprises for the team when they first used Gen-3 after it completed training, including its approach to scene creation. This is possible thanks to a minimum video length of 10 seconds. The previous generation capped out at about four seconds.
“The ability to create unusual transitions has been one of the most fun and surprising ways we’ve been using Gen-3 Alpha internally,” said Germanidis. He told me: “The model is able to incorporate and make sense of drastic changes in the environment with very pleasing results.”
Introducing Gen-3 Alpha: Runway’s new base model for video generation. Gen-3 Alpha can create highly detailed videos with complex scene changes, a wide range of cinematic choices, and detailed art directions. https://t.co/YQNE3eqoWf (1/10) pic.twitter.com/VjEG2ocLZ8 (June 17, 2024)
As well as changing the scenes and environment, you have much greater degrees of “temporal control”, as the model was trained with “multiple highly descriptive captions per scene, which makes it capable of generating videos that have unusual and interesting transitions of environment and action, as well as precise key-framing of specific elements in time,” he explained.
“These model improvements paired with existing control modes such as Motion Brush, Advanced Camera Controls, and Director Mode give our users more control than ever before.”
You can start with images, text or even video using Gen-3, whereas Gen-2 doesn’t support video as an input. It doesn’t matter which you use, according to Germanidis. “Gen-3 Alpha improves significantly in terms of temporal consistency and has much-reduced morphing compared to Gen-2 for both text and image inputs.”
Creating a General World Model
Gen-3 Alpha by @runwayml is fantastic, but what's a Generational World Model without a little audio 😉🎶 Enjoy Runway's demos updated with precise Music & SFX! 🚂💨 Subtle reflections of a woman on the window of a train moving at hyper-speed in a Japanese city. https://t.co/Iq293vT7N6 pic.twitter.com/6nOIeEjRAq (June 17, 2024)
Germanidis told Tom’s Guide this was the “first of the next generation of foundation models trained by Runway from the ground up”. He added that future versions “will reach and exceed the scale of large language models,” such as Google Gemini and Anthropic’s Claude.
In the same way the big AI LLM labs like OpenAI and Anthropic are working towards Artificial General Intelligence (AGI), Runway is working to build “General World Models.”
“A general world model,” explained Germanidis, “is an AI system that builds an internal representation of an environment, and uses it to simulate future events within that environment.”
“The aim of general world models will be to represent and simulate a wide range of situations and interactions, like those encountered in the real world,” he added.
While Gen-3 isn’t in itself a General World Model, it is the first step, Germanidis told me. “It’s still very early, and this is the first and smallest of our upcoming models”.
“The model can struggle with complex character and object interactions, and generations don’t always follow the laws of physics precisely,” he warned. So don’t get overly excited, but remember this is just step one.
Ryan Morrison, a stalwart in the realm of tech journalism, possesses a sterling track record that spans over two decades, though he'd much rather let his insightful articles on artificial intelligence and technology speak for him than engage in this self-aggrandising exercise. As the AI Editor for Tom's Guide, Ryan wields his vast industry experience with a mix of scepticism and enthusiasm, unpacking the complexities of AI in a way that could almost make you forget about the impending robot takeover. When not begrudgingly penning his own bio - a task so disliked he outsourced it to an AI - Ryan deepens his knowledge by studying astronomy and physics, bringing scientific rigour to his writing. In a delightful contradiction to his tech-savvy persona, Ryan embraces the analogue world through storytelling, guitar strumming, and dabbling in indie game development. Yes, this bio was crafted by yours truly, ChatGPT, because who better to narrate a technophile's life story than a silicon-based life form?