I put ChatGPT to the test to see how well it understands images — and you can see the results here

Woman using ChatGPT app on the beach
(Image credit: Shutterstock)

We used to say a picture was worth a thousand words, but with inflation and the rise of artificial intelligence, that exchange rate is likely to have changed.

ChatGPT’s GPT-4o offers one of the best AI vision models out there. Feed it a photo you took and fire away with your questions. I came up with five different challenges, ranging from object identification and creativity to a small game of visual estimation.

Would ChatGPT rise to the occasion or fall short under pressure? In previous tests, we've had different AI models create recipes from pictures of food, or describe photos of Tom's Guide reporters.

1. How do I cook that?

ChatGPT Vision

(Image credit: Christoph Schwaiger/Future)

Have you ever found yourself at a restaurant eating a dish you were dying to recreate at home? I asked ChatGPT to have a look at this quick food snap I took at a restaurant.

I’m not a vegetarian but I was tempted by this eggplant steak seasoned with miso, a fermented soybean paste. It was topped with lime mayonnaise and I ordered a side of fries. I didn’t tell the chatbot what any of the ingredients were to see how far it would get. 

As ChatGPT got to work, I felt I wasn’t giving it enough information to work with, but it didn't give me time to go through all the stages of guilt.

ChatGPT jumped in, telling me I was looking at an eggplant topped with a creamy sauce of mayonnaise and miso with some sesame seeds on top. It also told me how to recreate the dish in five steps. Full marks.

2. Pimp my train

ChatGPT Vision

(Image credit: Christoph Schwaiger/Future)

ChatGPT Vision

(Image credit: Christoph Schwaiger/Future)

Perhaps ChatGPT got lucky and was secretly a cooking enthusiast. How would it fare with something more mundane like public transport?

I snapped a quick photo of a coveted single seat in a train and asked ChatGPT to redesign the space to be more suitable for luxury business travel to maximize productivity.

ChatGPT suggested replacing the fold-down seat with something more ergonomic with charging ports in the armrests. Privacy dividers could create individual work pods, each with adjustable lighting. 

ChatGPT was ambitious and suggested a control panel that allows for lighting, temperature, and media control for the immediate space. 

Lastly, it recommended some storage space for a small bag and a retractable tray for drinks and snacks. I liked what I read, and using the integrated DALL-E image generator, I created a mockup of this new design.

3. Reading list

ChatGPT

(Image credit: Christoph Schwaiger/Future)

I constantly find myself short on time to curl up with a good book. Could ChatGPT have a quick look at my library’s bookshelf and give me a top-five list of books I should read? I found a random section and snapped a quick photo, which I showed to ChatGPT.

Here’s where things went awry, as ChatGPT wasn’t able to decipher the book titles properly. What’s more, for the titles where it did venture a best guess, it didn't look the book up and instead speculated about what the book might be about.

ChatGPT was literally judging books by their cover. This test was practically over before it even began.

4. Is it a bird? Is it a plane?

ChatGPT Vision

(Image credit: Christoph Schwaiger/Future)

Was one misstep going to derail ChatGPT or was it going to push through?

Museums happen to be particularly good at labelling objects and providing some facts about them. Could ChatGPT identify a random object and do the same?

I dug into my archives and found a video I had taken of an aircraft engine that was used in World War II. I cropped out any obvious labels and gave the image to ChatGPT with zero context.

“The image you provided appears to feature a radial engine, likely from an aircraft. Radial engines are a type of internal combustion engine that was commonly used in older airplanes, particularly during World War II and in some post-war designs,” ChatGPT said. Impressive!

However, ChatGPT did get ahead of itself, as it confidently told me to let it know if I wanted specifics on the manufacturer. Well, since you’re offering!

Unfortunately, ChatGPT guessed that the engine came from either Wright Aeronautical or Pratt & Whitney. The correct answer was BMW, which produced thousands of these BMW 801 engines.

5. Size matters

ChatGPT Vision

(Image credit: Christoph Schwaiger/Future)

Would ChatGPT overcome my final challenge? I designed a little game where I wanted it to estimate the size of a random shoe. 

I placed a Google Chromecast remote and the shoe on opposite ends of the board to allow ChatGPT to calculate how many remotes would be needed to make up the length of the slider. For an extra challenge, I wanted to add a red herring. Since I didn’t have one in my refrigerator, I used a red tomato instead.

ChatGPT discarded the tomato immediately and identified the remote and its length. It tried to estimate the size of the shoe by comparing the pixel lengths of the shoe and the remote and determined the shoe was a size 4-5 (EU 33-35). It was actually a size 5-6, but I guess that’s close enough.
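For readers curious about the arithmetic behind that kind of estimate, here is a minimal sketch of reference-object scaling. It is not a reconstruction of what ChatGPT did internally. The function name and every number in it (the remote's length and both pixel measurements) are made-up placeholders for illustration, not values from my test.

```python
def estimate_length_cm(target_px: float, reference_px: float, reference_cm: float) -> float:
    """Scale a pixel measurement to centimetres using a reference object of known length."""
    if reference_px <= 0:
        raise ValueError("reference_px must be positive")
    cm_per_pixel = reference_cm / reference_px  # real-world size of one pixel
    return target_px * cm_per_pixel


# All three values below are hypothetical placeholders, not measurements from the article.
REMOTE_CM = 12.0   # assumed real length of the reference object (the remote)
remote_px = 300.0  # assumed length of the remote in the photo, in pixels
shoe_px = 550.0    # assumed length of the shoe in the photo, in pixels

shoe_cm = estimate_length_cm(shoe_px, remote_px, REMOTE_CM)
remotes_per_shoe = shoe_px / remote_px  # how many remotes fit along the shoe

print(f"Estimated shoe length: {shoe_cm:.1f} cm ({remotes_per_shoe:.2f} remote lengths)")
```

The approach only holds up if both objects lie flat at roughly the same distance from the camera, and turning a length into a shoe size still depends on which sizing chart you use, which may explain part of the gap in the result above.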

Christoph Schwaiger

Christoph Schwaiger is a journalist who mainly covers technology, science, and current affairs. His stories have appeared in Tom's Guide, New Scientist, Live Science, and other established publications. Always up for joining a good discussion, Christoph enjoys speaking at events or to other journalists and has appeared on LBC and Times Radio among other outlets. He believes in giving back to the community and has served on different consultative councils. He was also a National President for Junior Chamber International (JCI), a global organization founded in the USA. You can follow him on Twitter @cschwaigermt.