StabilityAI reveals Stable Diffusion 3 — it does for AI images what Sora is doing for video
One giant leap for AI
Stable Diffusion 3, the next generation of the popular open source AI image generation model, has been unveiled by StabilityAI and it is an impressive leap forward.
Details of the new model were revealed alongside a series of images and prompts showing it is capable of following complex instructions and creating hyper-realistic images.
This early preview of the model is only available to a select group of testers while StabilityAI gathers feedback to improve performance and safety before a public release.
StabilityAI also used the Spawning "Do Not Train" registry to ensure that images from artists who did not want their work used to train AI were excluded. Over 1.5 billion images were filtered from the dataset before training.
What is Stable Diffusion 3?
Announcing Stable Diffusion 3, our most capable text-to-image model, utilizing a diffusion transformer architecture for greatly improved performance in multi-subject prompts, image quality, and spelling abilities. Today, we are opening the waitlist for early preview. This phase… pic.twitter.com/FRn4ofC57s (February 22, 2024)
Unlike DALL-E, MidJourney or Google's Imagen, Stable Diffusion is an open model that can be integrated into other platforms or even run locally if you have enough compute power.
SD3 will include a suite of models ranging from 800 million to eight billion parameters, allowing for different levels of quality and for operation on a wide range of hardware devices.
Like OpenAI's Sora, Stable Diffusion 3 combines diffusion model technology with the transformer architecture, which could explain the improved instruction-following capabilities.
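To put that parameter range in hardware terms, here is a rough back-of-the-envelope sketch in Python. It assumes each parameter is stored as a 16-bit number (2 bytes), which is a common choice but not something StabilityAI has confirmed for SD3, and it counts only the model weights, not the extra memory needed while generating an image. The function name `weight_memory_gb` is made up for this illustration.

```python
def weight_memory_gb(num_params, bytes_per_param=2):
    """Approximate memory for model weights alone, in gigabytes.

    Assumption: 2 bytes per parameter (16-bit precision). Activations,
    text encoders and other runtime overhead are not included.
    """
    return num_params * bytes_per_param / 1e9


# The smallest and largest sizes mentioned for the SD3 suite.
for label, params in [("800 million", 800_000_000), ("8 billion", 8_000_000_000)]:
    print(f"{label} parameters: about {weight_memory_gb(params):.1f} GB of weights")
```

Under those assumptions the smallest model needs about 1.6 GB for its weights and the largest about 16 GB, which is why the smaller versions are more plausible candidates for consumer hardware.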
It also uses flow matching, a mathematical technique used to train diffusion models that involves measuring the difference between real-world images and generated images at different stages of the process.
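For the curious, here is a minimal sketch of what a flow matching training objective can look like, written in plain Python. It assumes the common straight-line formulation, in which a sample is blended between random noise and a real image and the model is asked to predict the direction from one to the other. StabilityAI has not yet published the SD3 paper, so this illustrates the general technique rather than the company's method. The function names and the `predict_velocity` callback are hypothetical.

```python
import random


def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Direction of the straight path; it is the same at every t."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(predict_velocity, x1, rng):
    """Single-sample loss: mean squared error between the model's
    predicted velocity and the target velocity at a random time t."""
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]  # random noise, same size as the data
    t = rng.random()                        # random stage of the process, in [0, 1)
    xt = interpolate(x0, x1, t)             # partly noisy version of the data
    v_target = target_velocity(x0, x1)
    v_pred = predict_velocity(xt, t)        # stand-in for the neural network
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_target)) / len(x1)


# Toy "image" of three numbers and a placeholder model that always predicts zero.
image = [0.2, -0.5, 0.9]
untrained = lambda xt, t: [0.0 for _ in xt]
print(flow_matching_loss(untrained, image, random.Random(0)))
```

In real training, `predict_velocity` would be a neural network, the data would be images (or compressed versions of them), and the loss would be averaged over many samples and time steps before updating the model.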
What can Stable Diffusion 3 do?
Few people outside of the development team have had direct access to Stable Diffusion 3 yet and the research paper has yet to be published, so what we know of its abilities comes from what the team has said and the output it has shared.
From what I can see of the images so far, it is a significant step change in generative images. It, alongside OpenAI's Sora, is an indication of a major upgrade in the way generative AI works and how well it works.
It appears to create consistent, extended and legible text on images, solve the problems around human anatomy including fingers, and capture color well.
Emad Mostaque, founder of StabilityAI, said the company has 100x fewer resources for training AI models than the likes of OpenAI but is still achieving impressive work. He suggested that, like Sora, SD3 will be able to accept a range of inputs including video and image.
Details of SD3 come a few days after StabilityAI also unveiled Stable Cascade, a new technique for generating images that Mostaque says will work with SD3 in future.
Ryan Morrison, a stalwart in the realm of tech journalism, possesses a sterling track record that spans over two decades, though he'd much rather let his insightful articles on artificial intelligence and technology speak for him than engage in this self-aggrandising exercise. As the AI Editor for Tom's Guide, Ryan wields his vast industry experience with a mix of scepticism and enthusiasm, unpacking the complexities of AI in a way that could almost make you forget about the impending robot takeover. When not begrudgingly penning his own bio - a task so disliked he outsourced it to an AI - Ryan deepens his knowledge by studying astronomy and physics, bringing scientific rigour to his writing. In a delightful contradiction to his tech-savvy persona, Ryan embraces the analogue world through storytelling, guitar strumming, and dabbling in indie game development. Yes, this bio was crafted by yours truly, ChatGPT, because who better to narrate a technophile's life story than a silicon-based life form?