I just tried StabilityAI’s new approach to image generation — meet Stable Cascade
Generates images faster
Artificial intelligence company StabilityAI has unveiled its next-generation AI image model, Stable Cascade, which can generate photorealistic pictures from text or images and do so much faster than previous-generation models.
Stable Cascade is different to previous diffusion models such as Stable Diffusion. It works by building on three distinct models, creating a cascade of images, improving the output as it passes through each and creating space for easier fine-tuning.
When testing the model, you can watch the image form in front of you from your prompt, as pixels and shapes converge and then sharpen to full resolution.
What can Stable Cascade do?
One of the biggest selling points of this new model over previous Stable Diffusion models is the ability to create accurate and realistic text on the images, although in my limited testing it was hit and miss, much like other AI image text tools.
This is something MidJourney achieved with version 6 earlier this year and OpenAI achieved with DALL-E 3 last year. Google can also create image text with Imagen 2 but they all have similar consistency issues.
The most important feature seems to be flexibility in training and fine-tuning, making it perfect for companies wanting to adapt the model to their own style or train it on licenced and restricted image libraries.
It is built on a new architecture called Würstchen. The design aims to be cost-effective while still delivering competitive performance at scale, and it is what allows for the cascade effect.
Train the model yourself
StabilityAI focuses on open source, releasing models and weights to the public under a non-commercial licence for retraining, offline use and customization.
The company, which also participated in the development of Stable Diffusion and its related models, says the new model is “exceptionally easy to train and fine-tune on consumer hardware.”
It added: “Additionally, we are releasing training and inference code that can be found on the Stability GitHub page to allow further customization of the model and its outputs. The model is available for inference in the diffusers library.”
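For readers who want to see what inference through diffusers might look like, here is a minimal sketch rather than StabilityAI's official recipe. It assumes a recent diffusers release that ships the StableCascadePriorPipeline and StableCascadeDecoderPipeline classes, the "stabilityai/stable-cascade-prior" and "stabilityai/stable-cascade" repositories on Hugging Face, and a CUDA GPU. The CascadeSettings class, the generate_image helper and the step counts are my own illustrative choices, not part of the library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CascadeSettings:
    """Hypothetical container for the knobs used below (not a diffusers class)."""
    prior_repo: str = "stabilityai/stable-cascade-prior"  # assumed Hub repo id
    decoder_repo: str = "stabilityai/stable-cascade"      # assumed Hub repo id
    prior_steps: int = 20        # illustrative value, tune to taste
    decoder_steps: int = 10      # illustrative value, tune to taste
    prior_guidance: float = 4.0  # illustrative value, tune to taste
    width: int = 1024
    height: int = 1024


def generate_image(prompt: str, settings: CascadeSettings = CascadeSettings()):
    """Run the prior stage, then the decoder stage, and return one PIL image."""
    # Imported lazily so this file still loads without torch/diffusers installed.
    import torch
    from diffusers import StableCascadeDecoderPipeline, StableCascadePriorPipeline

    # Stage one: the prior turns the prompt into compact image embeddings.
    prior = StableCascadePriorPipeline.from_pretrained(
        settings.prior_repo, torch_dtype=torch.bfloat16
    ).to("cuda")
    prior_out = prior(
        prompt=prompt,
        width=settings.width,
        height=settings.height,
        guidance_scale=settings.prior_guidance,
        num_inference_steps=settings.prior_steps,
    )

    # Stage two: the decoder expands those embeddings into a full image.
    decoder = StableCascadeDecoderPipeline.from_pretrained(
        settings.decoder_repo, torch_dtype=torch.float16
    ).to("cuda")
    images = decoder(
        image_embeddings=prior_out.image_embeddings.to(torch.float16),
        prompt=prompt,
        guidance_scale=0.0,
        num_inference_steps=settings.decoder_steps,
        output_type="pil",
    ).images
    return images[0]
```

Calling generate_image("a neon sign that reads 'Stable Cascade'").save("cascade.png") would download the weights on first run and write a single image to disk. Expect that first run to take a while, and to need a GPU with plenty of memory.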
How does Stable Cascade compare?
I haven’t had much time playing with Stable Cascade but the images I have generated using a Hugging Face space, as well as those I’ve seen shared, are of impressive quality considering the speed of generation.
While you don’t often have to wait long for access to images generated by MidJourney or DALL-E, it is noticeably longer than it took to create images with Stable Cascade. It feels closer to the real-time generation of SDXL Turbo, also from StabilityAI, but with higher resolution.
The text generation, during my limited experiments, was about as good as DALL-E or MidJourney, although Stable Cascade made more mistakes.
What is important to note is that this is a model designed for fine-tuning and further training. It's on third-party platforms, or after Stable Cascade's eventual deployment to StabilityAI's Clipdrop image generation platform, that it will come into its own.
How can I try Stable Cascade today?
Stable Cascade is available to try through a Hugging Face space, although access is dependent on how busy it is at the time. I found I rarely had to wait more than a few seconds for access to a GPU to run the model.
You can also download a version of Stable Cascade for non-commercial use to install on your laptop but you'll need a hefty GPU and plenty of RAM. There is a one-click installer for Windows and Mac in the Pinokio app.
It is likely third-party sites like Leonardo or Night Cafe will introduce versions of Stable Cascade in the future.
Ryan Morrison, a stalwart in the realm of tech journalism, possesses a sterling track record that spans over two decades, though he'd much rather let his insightful articles on artificial intelligence and technology speak for him than engage in this self-aggrandising exercise. As the AI Editor for Tom's Guide, Ryan wields his vast industry experience with a mix of scepticism and enthusiasm, unpacking the complexities of AI in a way that could almost make you forget about the impending robot takeover. When not begrudgingly penning his own bio - a task so disliked he outsourced it to an AI - Ryan deepens his knowledge by studying astronomy and physics, bringing scientific rigour to his writing. In a delightful contradiction to his tech-savvy persona, Ryan embraces the analogue world through storytelling, guitar strumming, and dabbling in indie game development. Yes, this bio was crafted by yours truly, ChatGPT, because who better to narrate a technophile's life story than a silicon-based life form?