Stable Diffusion creator adds video to its generative AI model — here's what it can do
Video AI is coming
StabilityAI, the company behind the Stable Diffusion artificial intelligence image generator, has added video to its playbook.
The new model is built on top of its existing image tool and allows users to turn any image into a video at the press of a button. Currently, it's only a research preview and not available for commercial use, but StabilityAI says this early release is well suited to hobbyists and educational purposes.
The terms and conditions ban creators from using it to produce content that passes itself off as a representation of people or events — no deep fakes here.
What can it do?
Like the early versions of Runway’s video generation tools, Stable Video Diffusion (SVD) is image-to-video, so you need a starting image to kick things off. Runway also has a text-to-video function as will Meta’s new Emu Video when it's released. SVD was trained on a dataset of millions of videos and then fine-tuned for accuracy on a smaller selection of labeled clips. The source of the training data is likely a public research library of videos, which also explains the non-commercial license.
The demonstration videos suggest it is capable of producing near-photorealistic, though not perfect, short video clips at high-definition resolution. The research paper says it can generate clips of up to 25 frames at 576 x 1024 resolution.
Is it as good as it sounds?
This version also has several limitations. It can only produce four-second clips in its initial incarnation, though that matches Runway's limit.
According to StabilityAI, this new model is unable to generate video clips from a text prompt; it only works when given an image as a starting point. Its bigger issues come from how you might want to use it: for example, it might produce very slow camera pans or no motion at all.
However, it could be adapted in the future to offer 360-degree views of an object within a video, allowing for full panning. The company is also working on text-to-video versions that would allow users to create a video from a simple line of text.
The goal is likely to license the model to companies for inclusion in other products such as video editors, advertising tools, and even education for teachers to create more interactive lessons.
Ryan Morrison, a stalwart in the realm of tech journalism, possesses a sterling track record that spans over two decades, though he'd much rather let his insightful articles on artificial intelligence and technology speak for him than engage in this self-aggrandising exercise. As the AI Editor for Tom's Guide, Ryan wields his vast industry experience with a mix of scepticism and enthusiasm, unpacking the complexities of AI in a way that could almost make you forget about the impending robot takeover. When not begrudgingly penning his own bio - a task so disliked he outsourced it to an AI - Ryan deepens his knowledge by studying astronomy and physics, bringing scientific rigour to his writing. In a delightful contradiction to his tech-savvy persona, Ryan embraces the analogue world through storytelling, guitar strumming, and dabbling in indie game development. Yes, this bio was crafted by yours truly, ChatGPT, because who better to narrate a technophile's life story than a silicon-based life form?