I just tried StabilityAI’s new approach to image generation — meet Stable Cascade

Artificial intelligence company StabilityAI has unveiled its next-generation AI image model, Stable Cascade, which can generate photorealistic pictures from text or images much faster than previous-generation models.

Stable Cascade is different to previous diffusion models such as Stable Diffusion. It chains three distinct models into a cascade, improving the image as it passes through each stage and making the whole system easier to fine-tune.

When testing the model, you can watch the image form in front of you from your prompt, as pixels and shapes converge until the picture sharpens to full resolution.

What can Stable Cascade do?

One of the biggest selling points of this new model over previous Stable Diffusion models is the ability to create accurate and realistic text on the images, although in my limited testing it was hit and miss, much like other AI image text tools.

This is something MidJourney achieved with version 6 earlier this year and OpenAI achieved with DALL-E 3 last year. Google can also create image text with Imagen 2, but all of these tools have similar consistency issues.

The most important feature seems to be flexibility in training and fine-tuning, making it well suited to companies wanting to adapt the model to their own style or train it on licensed and restricted image libraries.

It is built on a new architecture called Würstchen, which is designed to keep training and inference costs down while still delivering competitive performance at scale. That design is what makes the cascade approach possible.
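To put a rough number on the cost saving, here is a small back-of-the-envelope sketch. The figures are assumptions that do not come from my testing: StabilityAI's release notes cite a compression factor of 42 for Stable Cascade (a 1024x1024 image becomes a 24x24 latent), while Stable Diffusion's autoencoder is generally described as compressing by a factor of 8. The function names are purely illustrative.

```python
# Back-of-the-envelope latent-size comparison.
# Assumed figures (not from this article): compression factor 42 for
# Stable Cascade and 8 for Stable Diffusion.

def latent_side(image_side: int, compression_factor: int) -> int:
    """Side length of the square latent for a square image, rounded down."""
    return image_side // compression_factor


def latent_positions(image_side: int, compression_factor: int) -> int:
    """Number of spatial positions the diffusion model has to process."""
    side = latent_side(image_side, compression_factor)
    return side * side


cascade = latent_positions(1024, 42)    # 24 x 24 latent
diffusion = latent_positions(1024, 8)   # 128 x 128 latent

print(cascade, diffusion, round(diffusion / cascade, 1))
# prints: 576 16384 28.4
```

Under those assumptions, the first stage of the cascade works on roughly 28 times fewer latent positions than Stable Diffusion would at the same output size, which goes some way to explaining the speed.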

Train the model yourself

StabilityAI focuses on open source, releasing models and weights to the public under a non-commercial licence for retraining, offline use and customization.

The company, which also participated in the development of Stable Diffusion and its related models, says the new model is “exceptionally easy to train and fine-tune on consumer hardware.”

It added: “Additionally, we are releasing training and inference code that can be found on the Stability GitHub page to allow further customization of the model and its outputs. The model is available for inference in the diffusers library.”
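For anyone wanting to try the diffusers route, the sketch below shows roughly what inference looks like. Treat it as an illustration rather than StabilityAI's reference code: it assumes a recent diffusers release that ships the Stable Cascade pipelines, the accelerate package, a capable GPU, and the `stabilityai/stable-cascade-prior` and `stabilityai/stable-cascade` repositories on Hugging Face. The `generate_image` helper, its defaults and the output filename are my own illustrative choices.

```python
# Minimal two-step Stable Cascade inference sketch using diffusers.
# Assumptions: recent diffusers + accelerate installed, a GPU available,
# and the model repositories below (several GB of weights are downloaded).

PRIOR_REPO = "stabilityai/stable-cascade-prior"   # text -> compressed image embeddings
DECODER_REPO = "stabilityai/stable-cascade"       # embeddings -> full-resolution image


def generate_image(prompt: str, out_path: str = "cascade.png",
                   prior_steps: int = 20, decoder_steps: int = 10) -> str:
    """Generate one 1024x1024 image for `prompt` and save it to `out_path`."""
    # Heavy imports live inside the function so that merely defining it
    # does not require torch or diffusers to be loaded.
    import torch
    from diffusers import StableCascadeDecoderPipeline, StableCascadePriorPipeline

    prior = StableCascadePriorPipeline.from_pretrained(
        PRIOR_REPO, variant="bf16", torch_dtype=torch.bfloat16
    )
    decoder = StableCascadeDecoderPipeline.from_pretrained(
        DECODER_REPO, variant="bf16", torch_dtype=torch.float16
    )
    # Keep only the active sub-model on the GPU to reduce memory use.
    prior.enable_model_cpu_offload()
    decoder.enable_model_cpu_offload()

    # Step 1: the prior turns the prompt into compact image embeddings.
    prior_output = prior(
        prompt=prompt, height=1024, width=1024,
        guidance_scale=4.0, num_inference_steps=prior_steps,
    )
    # Step 2: the decoder expands those embeddings into the final picture.
    image = decoder(
        image_embeddings=prior_output.image_embeddings.to(torch.float16),
        prompt=prompt, guidance_scale=0.0,
        output_type="pil", num_inference_steps=decoder_steps,
    ).images[0]

    image.save(out_path)
    return out_path
```

Calling `generate_image("a neon sign that says hello")` should write `cascade.png` to the working directory, although the first run will spend a while downloading the weights. The step counts and guidance values are starting points, not tuned settings.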

How does Stable Cascade compare?

I haven’t had much time to play with Stable Cascade, but the images I have generated using a Hugging Face space, as well as those I’ve seen shared, are of impressive quality considering the speed of generation.

While you don’t often have to wait long for images generated by MidJourney or DALL-E, the wait is noticeably longer than with Stable Cascade. It feels closer to the real-time generation of SDXL Turbo, also from StabilityAI, but with higher resolution.

The text generation, during my limited experiments, was about as good as DALL-E or MidJourney, although Stable Cascade made more mistakes.

What is important to note is that this is a model designed for fine-tuning and further training. It will come into its own on third-party platforms, or once it is eventually deployed to StabilityAI's Clipdrop image generation platform.

How can I try Stable Cascade today?

Stable Cascade is available to try through a Hugging Face space, although access is dependent on how busy it is at the time. I found I rarely had to wait more than a few seconds for access to a GPU to run the model.

You can also download a version of Stable Cascade for non-commercial use to install on your laptop, but you'll need a hefty GPU and plenty of RAM. There is a one-click installer for Windows and Mac in the Pinokio app.

It is likely third-party sites like Leonardo or Night Cafe will introduce versions of Stable Cascade in the future.

Ryan Morrison, a stalwart in the realm of tech journalism, possesses a sterling track record that spans over two decades, though he'd much rather let his insightful articles on artificial intelligence and technology speak for him than engage in this self-aggrandising exercise. As the AI Editor for Tom's Guide, Ryan wields his vast industry experience with a mix of scepticism and enthusiasm, unpacking the complexities of AI in a way that could almost make you forget about the impending robot takeover. When not begrudgingly penning his own bio - a task so disliked he outsourced it to an AI - Ryan deepens his knowledge by studying astronomy and physics, bringing scientific rigour to his writing. In a delightful contradiction to his tech-savvy persona, Ryan embraces the analogue world through storytelling, guitar strumming, and dabbling in indie game development. Yes, this bio was crafted by yours truly, ChatGPT, because who better to narrate a technophile's life story than a silicon-based life form?