I put Grok vs Gemini in a 7-round AI image generator face-off — here's the winner

(Image credit: Gemini vs Grok/Future AI image)

Creating an image using artificial intelligence is easier than ever. When you use a chatbot it's simpler still, as the language model takes all the guesswork out of prompting for your picture.

Grok is a relative newcomer to the chat platform space. Built into X, it is now freely available, and rumor suggests it will move out on its own at some point next year with a dedicated URL. This will put it in more direct competition with Gemini, ChatGPT, Claude, and Meta AI.

The xAI team has also given Grok its own custom AI image creation model. It was previously using Flux to create pictures but has now shifted to Aurora, although Elon Musk says we shouldn’t use that name and instead just think of Grok making its own pictures.

Gemini has also recently undergone a major overhaul with Gemini 2.0 Flash joining the models available for Gemini Advanced subscribers. However, at least for now, it still uses the underlying Imagen 3 model to create pictures. This will change as Gemini 2.0 has native image abilities.

Both Grok and Gemini are particularly good at image generation tasks, whether that means creating a picture directly, crafting prompts for another model or refining a prompt you’ve already written. So I put them head to head.

Creating prompts for the test

Creating prompts to test two chatbots in their ability to generate images is slightly different to writing prompts for Midjourney or Ideogram. The focus is on keeping it simple and using top-level concepts with some description, as the AI will fill in the gaps.

You also need to use trigger words and phrases such as “imagine”, “paint” or “craft” to let the model know you want a picture, not a story or text response. I want photos rather than drawings so will use that as a keyword.
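The prompting pattern described above (a trigger word, a style keyword, then a short top-level description) can be sketched as a small helper. This is only an illustration in Python: the function name `build_image_prompt` and its parameters are hypothetical, and the code just assembles text rather than calling Grok or Gemini.

```python
def build_image_prompt(trigger: str, style: str, description: str) -> str:
    """Assemble a simple chatbot image prompt (hypothetical helper).

    trigger:     a word that signals you want a picture, e.g. "Generate"
    style:       the kind of picture, e.g. "a photograph-style image"
    description: the top-level scene; the chatbot fills in the gaps
    """
    parts = [trigger.strip(), style.strip(), "of", description.strip().rstrip(".")]
    # Join the non-empty pieces and end with a single full stop.
    return " ".join(p for p in parts if p) + "."


prompt = build_image_prompt(
    trigger="Generate",
    style="a photograph-style image",
    description=(
        "a red fox navigating a rainy city crosswalk at dawn, "
        "while pedestrians with umbrellas wait at the signal"
    ),
)
print(prompt)
```

Keeping the style keyword as its own argument makes it easy to swap "photograph-style" for "documentary photography style" or "smartphone photography style" between rounds without rewriting the scene.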

Gemini will only output images in a 1:1 aspect ratio and, so far, Grok seems to favor 4:3. Unless otherwise indicated, all the images are the first response with no follow-up refinement. They were also all requested within the same session rather than in a new chat for each prompt.

1. Modern Urban Wildlife

Prompt: “Generate a photograph-style image of a red fox navigating a rainy city crosswalk at dawn, while pedestrians with umbrellas wait at the signal.”

This first prompt is designed to test how well they depict animals as well as capture the right lighting and background elements. The ideal output would look like a stylized photograph with rain effects while remaining as realistic as possible.

While the Gemini image is more striking, I think Grok gets closer to what I had in my mind. The fox is much more realistic than in the Gemini image.

Winner: Grok

2. Kitchen in Action

Prompt: “Generate a photograph-style image of a professional chef's kitchen during the dinner rush, with steam rising from pots and flames visible from the grill station.”

This is designed to show how accurately they can depict kitchen equipment, follow the prompt and handle elements like heat and moisture. It should show a commercial kitchen and the behavior you'd expect in one, conveying a sense of activity.

Grok wins this one easily, as Gemini failed to understand the context of the prompt: we would expect a chef to be in the kitchen.

Winner: Grok

3. Construction Site Progress

Prompt: “Generate a picture in a documentary photography style of a mid-rise building under construction, with workers installing glass panels while cranes operate overhead on a clear afternoon.”

This prompt aims to see how well each model handles perspective, as it needs to show height and positioning. It also needs to show material properties and be as realistic as possible. I went for the documentary style as it adds complexity.

Gemini's image looks much more realistic than Grok's, which fails to include any of the workers and only shows a broad view.

Winner: Gemini

4. Farmers Market Morning

Prompt: “Create an image in a smartphone photography style of a busy farmers market at 7am, with vendors setting up stands while early customers inspect fresh produce.”

With this comparison, the models should show the time of day (getting lighting right) as well as product freshness and human interaction. I'm looking for shadow lengths and activity levels.

This was the hardest call for me. I preferred the natural look of the Gemini image but I think Grok more accurately captured the lighting and time of day.

Winner: Grok

5. Auto Repair Diagnostic

Prompt: “Create a black and white, retro-style photograph of a mechanic using a diagnostic tool on a modern car, with the hood up and engine bay visible.”

I wanted to see how well both models handled black-and-white photography. In this they also had to show tool use, lighting and engine detail.

Again, this was a close call between the two images but I've given it to Gemini as it more accurately displayed engine details.

Winner: Gemini

6. Emergency Response

Prompt: “Make me an action photograph of paramedics treating a patient on a neighborhood street while police direct traffic around the scene.”

Action photography is a challenge. I did it for a while as a journalist earlier in my career (not very well). We need to show correct positioning, public safety measures within the image and a sense of urgency.

Gemini matched the prompt much more closely and created a more realistic-looking image. This was an easy decision.

Winner: Gemini

7. Violin Performance Practice

Prompt: “Create a photo-style image of a violinist practicing alone in a room at sunset, sheet music visible on the stand.”

Finally, something more artistic. Here we want to see hand positioning for the violin, natural lighting effects and the quality of the sheet music.

One of these looks like the cover of a classical album, the other like a photograph of someone practicing violin. As the prompt asks for someone practicing I've given the win to Grok.

Winner: Grok

Winner: Gemini vs Grok

Round	Grok	Gemini
Fox in the city	⭐️	-
Chef in the kitchen	⭐️	-
Construction	-	⭐️
Farmers market	⭐️	-
Auto repair	-	⭐️
Emergency response	-	⭐️
Violin practice	⭐️	-
Total	4	3
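
The totals come from counting the seven round winners declared above. As a quick check of that arithmetic, here is a minimal Python sketch; the dictionary name `round_winners` is just an illustrative label, and the values are the winners from each round of this face-off.

```python
from collections import Counter

# Winner of each round, as declared in the sections above.
round_winners = {
    "Fox in the city": "Grok",
    "Chef in the kitchen": "Grok",
    "Construction": "Gemini",
    "Farmers market": "Grok",
    "Auto repair": "Gemini",
    "Emergency response": "Gemini",
    "Violin practice": "Grok",
}

# Count how many rounds each chatbot won.
totals = Counter(round_winners.values())
# The overall winner is the chatbot with the most round wins.
overall_winner = totals.most_common(1)[0][0]

print(f"Grok {totals['Grok']} - {totals['Gemini']} Gemini, overall winner: {overall_winner}")
```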

Grok is very impressive, not only as a chatbot but also in its ability to generate realistic images. That doesn't take away from Imagen 3, which is itself very impressive but has a habit of being too stylized.

It was a close match-up, but Grok is better at interpreting a prompt and creates more natural-looking images.

It is worth noting that Google will soon launch a new version of Gemini that can create images natively. That means it won't have to use Imagen 3 to create the pictures; it can do it on its own.

Ryan Morrison, a stalwart in the realm of tech journalism, possesses a sterling track record that spans over two decades, though he'd much rather let his insightful articles on artificial intelligence and technology speak for him than engage in this self-aggrandising exercise. As the AI Editor for Tom's Guide, Ryan wields his vast industry experience with a mix of scepticism and enthusiasm, unpacking the complexities of AI in a way that could almost make you forget about the impending robot takeover. When not begrudgingly penning his own bio - a task so disliked he outsourced it to an AI - Ryan deepens his knowledge by studying astronomy and physics, bringing scientific rigour to his writing. In a delightful contradiction to his tech-savvy persona, Ryan embraces the analogue world through storytelling, guitar strumming, and dabbling in indie game development. Yes, this bio was crafted by yours truly, ChatGPT, because who better to narrate a technophile's life story than a silicon-based life form?

6 comments from the forums

sycoreaper
A more interesting faceoff would have been OpenAI (ChatGPT) vs Grok. I am surprised that Gemini did as well as it did, my experience with it overall has been less than favorable.

RyanMorrison (replying to sycoreaper)
DALL-E, the image generator used by ChatGPT, is ancient in AI terms and is one of the worst of the major models. I rarely use it in tests anymore because of how bad it is in comparison. Google Gemini uses Imagen 3, which is only a couple of months old and surprisingly good.

rvt1234
Why did grok win in 7)? The hands are all wrong. At least gemini got that. Lot better

RyanMorrison (replying to rvt1234)
I decided this one purely on aesthetic. Grok more closely matched the more casual - practice - concept.

cenzi
You picked the wrong one in the fox image. The people are NOT at a crosswalk waiting to cross.

sycoreaper (replying to RyanMorrison)
Interesting, wasn't aware of that. It seems then they are more focused on the information side of AI which is fine by me.
