What is Visual ChatGPT?

ChatGPT has the potential to redefine the way we search the internet, but currently, it's limited to text. This ignores one of the most-used search engine features: images.

To that end, Microsoft has now unveiled Visual ChatGPT, an upgrade to the chatbot that enables it to both produce images from text and process image prompts uploaded by users. 

While OpenAI itself has already dabbled in AI image generation with the DALL-E 2 system, Microsoft has set its sights higher. Visual ChatGPT is a step toward the multimodal AI that Microsoft revealed it was aiming for with the GPT-4 upgrade coming soon to Bing with ChatGPT.

This means that image processing could soon be joined by AI-powered video and sound tools. 

The science bit — How does Visual ChatGPT work? 

Bing with ChatGPT runs on OpenAI’s GPT Large Language Model (LLM) and Microsoft’s own Prometheus model. Most AI art generators use a single Visual Foundation Model (VFM) like Stable Diffusion to produce images. They are normally effective but rather limited in their scope. Microsoft revealed that to create Visual ChatGPT, it managed to bolt a plethora of different VFMs onto the flexible GPT model.

This was achieved by creating a “Prompt Manager,” which Microsoft describes as helping to “bridge the gap between ChatGPT and these VFMs” and which enables ChatGPT to “leverage these VFMs and receives their feedback in an iterative manner until it meets the requirements of users or reaches the ending condition.”
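To make that loop a little more concrete, here is a minimal sketch of the idea in Python. It is not Microsoft's code: the tool functions, the `plan_next_step` helper and the `max_rounds` limit are all hypothetical stand-ins, and the "images" are just strings. It only illustrates a manager that keeps handing work to different visual tools and collecting their output until nothing is left to do or a round limit is hit.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stand-ins for Visual Foundation Models (VFMs).
# Each takes the current "image" (a plain string here) and one argument,
# and returns a new "image". Real VFMs would work on pixel data.
def generate_image(_image: Optional[str], prompt: str) -> str:
    return f"generated({prompt})"

def remove_object(image: Optional[str], target: str) -> str:
    return f"{image} minus {target}"

def recolor(image: Optional[str], color: str) -> str:
    return f"{image} in {color}"

TOOLS: Dict[str, Callable[[Optional[str], str], str]] = {
    "generate": generate_image,
    "remove_object": remove_object,
    "recolor": recolor,
}

def plan_next_step(pending: List[Tuple[str, str]]) -> Optional[Tuple[str, str]]:
    """Assumed placeholder for the language model deciding what to do next.

    Here it simply takes the next queued step; None means "done".
    """
    return pending.pop(0) if pending else None

def run_prompt_manager(
    steps: List[Tuple[str, str]], max_rounds: int = 5
) -> Tuple[Optional[str], List[str]]:
    """Call tools one at a time, feeding each result into the next round."""
    pending = list(steps)          # copy so the caller's list is untouched
    image: Optional[str] = None
    history: List[str] = []
    for _ in range(max_rounds):    # ending condition 1: round limit reached
        step = plan_next_step(pending)
        if step is None:           # ending condition 2: request satisfied
            break
        tool_name, argument = step
        image = TOOLS[tool_name](image, argument)
        history.append(f"{tool_name} -> {image}")
    return image, history

if __name__ == "__main__":
    final_image, log = run_prompt_manager(
        [("generate", "a red motorbike"), ("remove_object", "background")]
    )
    print(final_image)
    for entry in log:
        print(entry)
```

In this toy version the list of steps is supplied up front; in the system Microsoft describes, the language model would decide each next step itself based on the conversation and the feedback from the previous tool.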

How does it differ from AI image generators? 

The result is an AI tool that can generate images from text and image prompts, deal with complicated requests that span multiple processes, and even offer input and feedback on images that have been uploaded or generated.

Microsoft included examples on its GitHub page of a user asking the AI what color a motorbike was and getting it to identify the contents of a picture by asking, “What is in this image?” to which the AI responded, “The image contains a yard.” It is interactions like this, and the ability to tweak and edit an image multiple times within the same session, that separate it from standard AI image generators.

What could Visual ChatGPT be used for? 

If a Google Image search has ever left you wanting, then Visual ChatGPT could be a great way to create and refine an image that may not exist online already. 

Photo editing software like Photoshop can be expensive and complex to use. Asking Bing to remove an object from an image or change a background’s color would be a much quicker and simpler method.

The specific uses of such a tool are countless. Professionals could find a lot of use for Visual ChatGPT: architects and interior designers could show clients what painting a wall blue or removing it completely would look like, while visually impaired users could receive AI descriptions of uploaded images.

Reservations and concerns 

Of course, AI tools are still in their relative infancy, and with the likes of Bing and Google Bard making high-profile errors and battling quirks (we miss you, Sydney), there will likely be similar issues with Visual ChatGPT.

Similarly, when it comes to the internet, there will always be safety concerns. Inappropriate content is bound to make its way to Visual ChatGPT, and it will be interesting to see how Microsoft handles explicit content with its image and video AI tools. Even with content filters, there may be ways to bypass them, similar to the jailbroken ChatGPT "alter-ego" DAN.

The rise of edits and tweaks to photos may also call into question the authenticity of any image or video we see online. Social media already often features heavily idealized snapshots of life, and it’s easy to see some people being deceptive with these tools. Video and audio deepfakes are already a problem when it comes to spreading disinformation, and this will need to be monitored carefully.

Andy is a freelance writer with a passion for streaming and VPNs. Based in the U.K., he originally cut his teeth at Tom's Guide as a Trainee Writer before moving to cover all things tech and streaming at T3. Outside of work, his passions are movies, football (soccer) and Formula 1. He is also something of an amateur screenwriter having studied creative writing at university.
