I just tested ChatGPT image generation — and it looks like DALL-E has been given a secret upgrade
Noticeable improvements
I use ChatGPT daily for various tasks including brainstorming ideas, adjusting recipes and even creating images for Father’s Day. Recently DALL-E 3, the AI image generator ChatGPT uses to make its images, seems to have been upgraded.
OpenAI hasn’t made any announcements about an upgrade, and I can’t find any release notes suggesting any changes. But I’m not the only one to notice that its ability to render text has improved significantly, and with much longer blocks of text than before. In other words, the battle for best AI image generator just got more interesting.
My suspicion is that this is GPT-4o at work. Not in making the images itself, although OpenAI says it will be able to in future, but in refining user prompts before sending them off to DALL-E 3 to create more accurate output.
The full set of GPT-4o capabilities has yet to be enabled. Currently, it is only used for text analysis, image analysis and text generation. You do occasionally see moments where the full multimodal capabilities creep in, but that is the exception, and ChatGPT primarily uses DALL-E 3 to make its images.
Putting DALL-E 3 to the test
Open AI DALL-E 3 has received massive upgrades: It can now generate images with longer text, with a success rate of over 95%. However, its capability to produce photo-realistic images is quite poor. Share your results in the comments. 5 examples in this 🧵 (Prompt in ALT)… pic.twitter.com/sw7v8BZfBq June 17, 2024
To find out just how good DALL-E 3 has become I gave it 7 challenging prompts. We cover ancient scrolls, comic books and steampunk scenes.
In each of the prompts I had it create a widescreen image and included text. If you want it to render text more accurately, put the text inside quotation marks. This is true for any AI image generator including Midjourney, Ideogram and Leonardo.
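The quotation-mark tip can be sketched as a small helper. This is only an illustration: `build_image_prompt` is a hypothetical name, not part of ChatGPT or any image generator, and the sentence template is an assumption based on the prompts used in this test. The resulting string is simply pasted into the chat box.

```python
def build_image_prompt(scene, text_snippets, aspect="widescreen"):
    """Compose an image prompt, wrapping each piece of on-image text in double quotes.

    Hypothetical helper for illustration only; the wording of the template is an assumption.
    """
    # Wrap every snippet in double quotes so the model treats it as literal text to render.
    quoted = [f'"{snippet}"' for snippet in text_snippets]

    if not quoted:
        # No on-image text requested: return the scene description alone.
        return f"A {aspect} image. {scene}"

    if len(quoted) > 1:
        # Join as: "a", "b" and "c"
        text_list = ", ".join(quoted[:-1]) + " and " + quoted[-1]
    else:
        text_list = quoted[0]

    return f"A {aspect} image. {scene} The text includes {text_list}."


if __name__ == "__main__":
    print(
        build_image_prompt(
            "A retro-style travel poster of floating islands with waterfalls.",
            ["Visit the Floating Islands of Aerion", "A Sky-High Adventure Awaits!"],
        )
    )
```

Keeping each block of text as a separate quoted snippet makes it easy to see how much text you are asking for, which matters because more text gives the model more chances to make mistakes.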
We have several guides on both creating images and using ChatGPT that are worth checking out if you want to get the most out of using AI tools.
1. Magical Potion Recipe Scroll
I'm pushing its text rendering capabilities to the limit with this prompt. I'm essentially saying create a scroll with instructions and a headline. In the past I would expect a headline at the top then nonsensical text everywhere else.
The prompt: “An ancient scroll unfurled on a wooden table, surrounded by mystical ingredients like dragon scales and phoenix feathers. The text on the scroll reads "Potion for Eternal Wisdom" with detailed, whimsical instructions and illustrations.”
It wasn't perfect, with some double letter issues and repetition, but ChatGPT provided both the top headline and multiple smaller headlines. This was a big step up and similar in rendering capability to Ideogram, the leader in AI image text.
2. Time-Travel Café Menu
Next up is something that I've only had Ideogram do perfectly, although Midjourney gets close: creating a menu board in a cafe. DALL-E 3 has always struggled here.
The prompt: “A cosy café where the menu board displays "Today's Specials" from different eras, such as "Medieval Mead," "Victorian Scones," and "Future Fusion Smoothie." Customers in period attire enjoy their unique treats.”
As you can see from the image, it did a good job. We got "Victorian Scones" twice on the top board and random phrasing around the room, but overall it was well rendered.
This is looking good. We've now had two images where it can render separate blocks of text accurately.
3. Alien Comic Book Cover
A lot of AI image generators can create a good looking comic book cover. DALL-E 3 is no exception but again, it struggles with text and regularly adds double letters.
The prompt: “A dynamic comic book cover titled "Galactic Guardians" featuring diverse alien superheroes in action poses. The title is bold and flashy, with additional text like "Issue #1 - The Invasion Begins!" and "Special Edition".”
Here it effortlessly rendered the title, sub-title and even the ISSUE number. It even captured the plot concept. It is rendered in 16:9 so looks more poster than comic but I tried the prompt again with a 9:16 aspect and it worked.
4. Robot Blueprint with Annotations
The more text you ask an AI image generator to produce the more opportunities it has to make mistakes. I've found sometimes if you request a lot of text it won't even get the first piece of text correct on the image — it gets overall worse.
The prompt: “A detailed blueprint of a quirky robot with hand-drawn annotations. Labels point out features like "Anti-Gravity Boots", "Laser Vision", and "Humour Chip", with humorous side notes and sketches around the edges.”
Because of the annotation request I expected a total failure and was pleasantly surprised. Yes, it's not perfect and seems to just repeat the specific words I highlighted over and over, but it's legible and looks cool.
5. Steampunk Time Traveller's Journal
When you ask an AI to show you a book it has a tendency to put it on a table first, often making the table look like the book. It also might get only one word correct. Here I'm asking for specific phrasing in two lines plus sketches and style.
The ChatGPT prompt: “An open journal filled with intricate sketches of steampunk inventions, maps, and notes. The text on the pages includes "Journey to the Future - 3024 AD" and "Invention Idea: Steam-Powered Time Machine".”
It still put the book on a table that looks like the book. Stylistically impressive but not really what I hoped for. It did get the text correct and captured the style idea.
6. Whimsical Recipe Book Page
This prompt was going to be an uphill battle for the AI. Not only did it have to get the title correct but also specific ingredients. Earlier versions of DALL-E 3 wouldn't have managed both the book title and the recipe title. It would have been one or the other.
The prompt: “A page from a fantastical recipe book titled "Cooking with Magic". The recipe is for "Fairy Dust Cupcakes", with ingredients like "1 cup of stardust" and "2 teaspoons of moonlight". Illustrations of the cupcakes and magical kitchen tools adorn the page.”
It wasn't perfect. "Cooking with Magic" looks good, and it got the title of the recipe and the first ingredient, but then things started to go downhill. Still, it was better than I expected. I tried the same prompt on Ideogram and the style was better, but the text rendering degraded in a similar way as it went down the page.
7. Vintage Travel Poster for a Fictional Destination
Finally a poster. This was one of the first things AI companies cracked in terms of legible text so shouldn't be too difficult — but I'm asking it for multiple blocks.
The prompt: “A retro-style travel poster advertising "Visit the Floating Islands of Aerion". The poster features breathtaking views of floating islands with waterfalls, and the text includes travel details and a catchy slogan like "A Sky-High Adventure Awaits!".”
To make it work it had to generate a title and subhead, plus a second heading, and I think it created a perfect poster. Yes, other items on the page have some weird quirks, but I didn't tell it how to render those. I left it to handle that itself.
Final thoughts
Overall I think there is a clear improvement in the accuracy of DALL-E 3's rendered text. But at the same time it has also gone a bit backwards in how it actually renders it, adding more artefacts and blurring around the words.
When you use Ideogram or Midjourney the text tends to be crisper but images with text from DALL-E have a degree of distortion to them.
I don't think this matters as much for a quick Father's Day card or a fun greeting, but if you want to use it to make a t-shirt or even in a public-facing project then it becomes more of an issue.
This could be solved with a new version of DALL-E. GPT-4o seems to be doing most of the heavy lifting here, refining the prompts sent to the image generator, so a better image model should logically produce better images.
Ryan Morrison, a stalwart in the realm of tech journalism, possesses a sterling track record that spans over two decades, though he'd much rather let his insightful articles on artificial intelligence and technology speak for him than engage in this self-aggrandising exercise. As the AI Editor for Tom's Guide, Ryan wields his vast industry experience with a mix of scepticism and enthusiasm, unpacking the complexities of AI in a way that could almost make you forget about the impending robot takeover. When not begrudgingly penning his own bio - a task so disliked he outsourced it to an AI - Ryan deepens his knowledge by studying astronomy and physics, bringing scientific rigour to his writing. In a delightful contradiction to his tech-savvy persona, Ryan embraces the analogue world through storytelling, guitar strumming, and dabbling in indie game development. Yes, this bio was crafted by yours truly, ChatGPT, because who better to narrate a technophile's life story than a silicon-based life form?