I just put Stable Diffusion 3 AI to the test — and it generates some pretty staggering images
A new level of photorealism
StabilityAI recently announced Stable Diffusion 3, its next-generation text-to-image AI generator, with the first developers getting access to integrate it into their platforms last week.
With Stable Diffusion 3's arrival comes the promise of better prompt adherence, accurate text rendering, improved photorealism and more accurate hands and faces.
To put this to the test I’ve come up with seven prompts that ask SD3 to generate images related to each of these concepts. One example is hands holding a cup of coffee with words on it — it handled this perfectly and created the correct number of both fingers and joints.
One of the photorealistic images I created was a close-up of an eye so realistic that it ventured into disturbing uncanny valley territory.
Creating prompts to test SD3
Today, we are pleased to announce the availability of Stable Diffusion 3 and Stable Diffusion 3 Turbo on the Stability AI Developer Platform API. We have partnered with @FireworksAI_HQ, the fastest and most reliable API platform in the market, to deliver these models. In… pic.twitter.com/4q4wUf870Q (April 17, 2024)
There are only a handful of places with the latest model at the moment but that is likely to expand in the coming months. The easiest way is to use ClipDrop but it only gives a handful of images per day with SD3 and no option to increase the limit.
I decided to use Fireworks.ai, an inference platform that lets you try out a range of different AI models including Stable Diffusion 3. To use SD3, you do need a StabilityAI API key and tokens attached to your StabilityAI account. It is 6.5 tokens per image and 1,000 tokens cost $10.
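For anyone budgeting a similar test, the pricing above works out to a few cents per image. Here is a minimal sketch of that arithmetic in Python. It assumes the figures quoted above (6.5 tokens per image, 1,000 tokens for $10) still hold; the function and constant names are my own and are not part of any StabilityAI or Fireworks.ai API.

```python
from decimal import Decimal

# Pricing figures as quoted in this article; they may have changed since.
TOKENS_PER_IMAGE = Decimal("6.5")
TOKENS_PER_PACK = Decimal("1000")
PACK_PRICE_USD = Decimal("10")


def cost_usd(images: int) -> Decimal:
    """Return the dollar cost of generating `images` SD3 images."""
    tokens_needed = TOKENS_PER_IMAGE * images
    return tokens_needed * PACK_PRICE_USD / TOKENS_PER_PACK


def images_per_pack() -> int:
    """Return how many whole images one 1,000-token pack covers."""
    return int(TOKENS_PER_PACK // TOKENS_PER_IMAGE)


if __name__ == "__main__":
    print(f"One image: ${cost_usd(1)}")
    print(f"Seven prompts, one image each: ${cost_usd(7)}")
    print(f"Images per $10 pack: {images_per_pack()}")
```

In practice the bill will be higher than the seven-image figure, because each retry or seed change is another image.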
1. Betty’s Diner
The first prompt was designed to see how well SD3 could handle complex and stylized text on a photorealistic image. This required two lines of text in different parts of the image. SD3 did this perfectly.
The prompt: “A vintage 1950s-style diner with a neon sign in the window that reads "Betty's Burgers & Shakes - Est. 1952".”
2. Morning coffee
Two areas AI image generators have struggled with since the start are text and fingers. This prompt tests StabilityAI's claim it can do both — and it proved StabilityAI correct. SD3 created an interesting mug with realistic fingers — mostly.
The prompt: “A close-up of a person's hands cupping a mug of hot coffee, with steam rising from the surface. The mug has a logo that reads "Morning Brew Coffee Co."”
3. Vintage poster
This test really pushed the limits of text generation. It required multiple lines of stylized text to meet the requirements of the prompt. I think it exceeded what I expected, designing a compelling poster with an attempt at a logo under the title.
The prompt: “A classic 1960s-style poster featuring a stylized illustration of a woman in a polka-dot dress, with the text "Vintage Fashion Expo" in a bold, serif font.”
4. Mustang coffee
Next up is a prompt that needs to get the vehicle, the text and the overall aesthetic correct. I'd have this image as a poster on my wall. I don't know enough about cars to tell how accurate SD3's depiction of a 1969 Ford Mustang is, but it does seem to have struggled with the logo and the hood. However, the end result is visually striking.
The prompt: “A close-up of a red 1969 Ford Mustang's grille and headlights, with the car parked in front of a classic American diner. The diner's neon sign reads "Joe's Diner - Established 1954".”
5. An eye on the world
This is one of the most beautifully disturbing AI images I've ever generated, and I've created thousands of images using AI. It is captivating and incredibly detailed.
The prompt: “A close-up of a human eye with intricate details and reflections, capturing the complexity of the iris.”
6. Times Square
One thing I've found with SD3 is a need for longer and more detailed prompting, especially if you want text to appear properly. Here we were able to generate a modern city view with digital billboards and accurate text.
The prompt: “A Times Square streetscape at night, with bright, illuminated billboards and bustling crowds. One of the billboards displays an advertisement for a new Broadway musical, with the text "Introducing 'Starlight Dreams' - A Dazzling New Musical Extravaganza!".”
7. A cute robot
I played with the seed for this one to try and get the right image. Several of them were very odd. This looks like a robot a company might actually sell one day.
The prompt: “A charming, 3D-rendered image of a quirky robot character, with expressive eyes and a friendly smile, reminiscent of Pixar's animation style.”
Ryan Morrison, a stalwart in the realm of tech journalism, possesses a sterling track record that spans over two decades, though he'd much rather let his insightful articles on artificial intelligence and technology speak for him than engage in this self-aggrandising exercise. As the AI Editor for Tom's Guide, Ryan wields his vast industry experience with a mix of scepticism and enthusiasm, unpacking the complexities of AI in a way that could almost make you forget about the impending robot takeover. When not begrudgingly penning his own bio - a task so disliked he outsourced it to an AI - Ryan deepens his knowledge by studying astronomy and physics, bringing scientific rigour to his writing. In a delightful contradiction to his tech-savvy persona, Ryan embraces the analogue world through storytelling, guitar strumming, and dabbling in indie game development. Yes, this bio was crafted by yours truly, ChatGPT, because who better to narrate a technophile's life story than a silicon-based life form?