I put Grok against Gemini in a 7-round image face-off — here's the winner

Gemini vs Grok
(Image credit: Gemini vs Grok/Future AI image)

Creating an image using artificial intelligence is easier than ever. When you use a chatbot it's simpler still, as the language model takes all the guesswork out of prompting for your picture.

Grok is a relative newcomer to the chat platform space. Built into X, it is now freely available, and rumor suggests it will be moving out on its own at some point next year with a dedicated URL. This will put it in more direct competition with Gemini, ChatGPT, Claude, and Meta AI.

The xAI team has also given Grok its own custom AI image creation model. It was previously using Flux to create pictures but has now shifted to Aurora, although Elon Musk says we shouldn’t use that name and instead just think of Grok making its own pictures.

Gemini has also recently undergone a major overhaul with Gemini 2.0 Flash joining the models available for Gemini Advanced subscribers. However, at least for now, it still uses the underlying Imagen 3 model to create pictures. This will change as Gemini 2.0 has native image abilities.

Both Grok and Gemini are particularly good at generating images, whether that means crafting a prompt for the image model or refining one you’ve already written. So I put them head to head.

Creating prompts for the test

Creating prompts to test two chatbots in their ability to generate images is slightly different to writing prompts for Midjourney or Ideogram. The focus is on keeping it simple and using top-level concepts with some description, as the AI will fill in the gaps.

You also need to use trigger words and phrases such as “imagine”, “paint” or “craft” to let the model know you want a picture, not a story or text response. I want photos rather than drawings, so I'll include “photo” or “photograph” as a keyword.
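As a rough illustration of that structure, the sketch below assembles a prompt from a trigger word, a style keyword and a short scene description. The `build_prompt` helper and its parameter names are hypothetical, made up for this example; neither chatbot requires or exposes anything like it.

```python
def build_prompt(trigger: str, style: str, scene: str) -> str:
    """Join a trigger word, a style keyword and a scene description.

    Hypothetical helper for illustration only.
    """
    # Pick "a" or "an" so the sentence reads naturally.
    article = "an" if style[0].lower() in "aeiou" else "a"
    return f"{trigger} {article} {style} of {scene}."


prompt = build_prompt(
    trigger="Generate",
    style="photograph-style image",
    scene=(
        "a red fox navigating a rainy city crosswalk at dawn, "
        "while pedestrians with umbrellas wait at the signal"
    ),
)
print(prompt)
```

This reproduces the wording of the first test prompt: the trigger word signals that a picture is wanted, and the style keyword steers the result toward a photo rather than a drawing.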

Gemini will only output images in a 1:1 aspect ratio and, so far, Grok seems to favor 4:3. Unless otherwise indicated, all the images are the first response with no follow-up refinement. They were also all requested within the same session rather than creating a new chat for each prompt.

1. Modern Urban Wildlife

(Image credit: Gemini vs Grok/Future AI)

Prompt: “Generate a photograph-style image of a red fox navigating a rainy city crosswalk at dawn, while pedestrians with umbrellas wait at the signal.”

This first prompt is designed to test how well they depict animals as well as capture the right lighting and background elements. The ideal output would look like a stylized photograph with rain effects while remaining as realistic as possible.

While the Gemini image is more striking, I think Grok gets closer to what I had in my mind. The fox is much more realistic than in the Gemini image.

  • Winner: Grok

2. Kitchen in Action

(Image credit: Gemini vs Grok/Future AI)

Prompt: “Generate a photograph-style image of a professional chef's kitchen during the dinner rush, with steam rising from pots and flames visible from the grill station.”

This is designed to show how accurately they can display kitchen equipment, follow the prompt and handle elements like heat and moisture. The image should show a commercial kitchen and the behavior you'd expect in one, and convey a sense of activity.

Grok wins this one easily, as Gemini failed to understand the context of the prompt: we would expect a chef to be in the kitchen.

  • Winner: Grok

3. Construction Site Progress

(Image credit: Gemini vs Grok/Future AI)

Prompt: “Generate a picture in a documentary photography style of a mid-rise building under construction, with workers installing glass panels while cranes operate overhead on a clear afternoon.”

This prompt aims to see how well they can generate perspective, as the image needs to show height and positioning. It also needs to show material properties and be as realistic as possible. I went for the documentary style as it adds complexity.

Gemini's image looks much more realistic than Grok's, which fails to include any of the workmen and only shows a broad view.

  • Winner: Gemini

4. Farmers Market Morning

(Image credit: Gemini vs Grok/Future AI)

Prompt: “Create an image in a smartphone photography style of a busy farmers market at 7am, with vendors setting up stands while early customers inspect fresh produce.”

With this comparison, the models should show the time of day (getting lighting right) as well as product freshness and human interaction. I'm looking for shadow lengths and activity levels.

This was the hardest call for me. I preferred the natural look of the Gemini image but I think Grok more accurately captured the lighting and time of day.

  • Winner: Grok

5. Auto Repair Diagnostic

(Image credit: Gemini vs Grok/Future AI)

Prompt: “Create a black and white, retro-style photograph of a mechanic using a diagnostic tool on a modern car, with the hood up and engine bay visible.”

I wanted to see how well both models handled black-and-white photography. In this they also had to show tool use, lighting and engine detail.

Again, this was a close call between the two images but I've given it to Gemini as it more accurately displayed engine details.

  • Winner: Gemini

6. Emergency Response

(Image credit: Gemini vs Grok/Future AI)

Prompt: “Make me an action photograph of paramedics treating a patient on a neighborhood street while police direct traffic around the scene.”

Action photography is a challenge. I did it for a while as a journalist earlier in my career (not very well). We need to show correct positioning, public safety measures within the image and a sense of urgency.

Gemini matched the prompt much more closely and created a more realistic-looking image. This was an easy decision.

  • Winner: Gemini

7. Violin Performance Practice

(Image credit: Gemini vs Grok/Future AI)

Prompt: “Create a photo-style image of a violinist practicing alone in a room at sunset, sheet music visible on the stand.”

Finally, something more artistic. Here we want to see hand positioning for the violin, natural lighting effects and the quality of the sheet music.

One of these looks like the cover of a classical album, the other like a photograph of someone practicing violin. As the prompt asks for someone practicing I've given the win to Grok.

  • Winner: Grok

Winner: Gemini vs Grok

Round                 Grok    Gemini
Fox in the city       ⭐️
Chef in the kitchen   ⭐️
Construction                  ⭐️
Farmers market        ⭐️
Auto repair                   ⭐️
Emergency response            ⭐️
Violin practice       ⭐️
Total                 4       3
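For anyone who wants to check the arithmetic, here is a minimal sketch that tallies the seven round winners listed above. The variable names are my own for illustration; the round labels and winners are taken straight from the table.

```python
from collections import Counter

# Winner of each round, as recorded in the results table.
results = {
    "Fox in the city": "Grok",
    "Chef in the kitchen": "Grok",
    "Construction": "Gemini",
    "Farmers market": "Grok",
    "Auto repair": "Gemini",
    "Emergency response": "Gemini",
    "Violin practice": "Grok",
}

# Count wins per chatbot and pick the one with the most.
totals = Counter(results.values())
overall = totals.most_common(1)[0][0]

print(f"Grok {totals['Grok']}, Gemini {totals['Gemini']}, overall: {overall}")
```

The count comes out at four rounds for Grok and three for Gemini, matching the totals row.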

Grok is very impressive, not only as a chatbot but also in its ability to generate realistic images. That doesn't take away from Imagen 3, which is itself very impressive but has a habit of being too stylized.

It was a close match-up, but Grok is better at interpreting a prompt and creates more natural-looking images.

What is worth noting is that Google will soon be launching a new version of Gemini that can create images natively. That means it won't have to use Imagen 3 to create the pictures; it can do it on its own.

Ryan Morrison
AI Editor

Ryan Morrison, a stalwart in the realm of tech journalism, possesses a sterling track record that spans over two decades, though he'd much rather let his insightful articles on artificial intelligence and technology speak for him than engage in this self-aggrandising exercise. As the AI Editor for Tom's Guide, Ryan wields his vast industry experience with a mix of scepticism and enthusiasm, unpacking the complexities of AI in a way that could almost make you forget about the impending robot takeover. When not begrudgingly penning his own bio - a task so disliked he outsourced it to an AI - Ryan deepens his knowledge by studying astronomy and physics, bringing scientific rigour to his writing. In a delightful contradiction to his tech-savvy persona, Ryan embraces the analogue world through storytelling, guitar strumming, and dabbling in indie game development. Yes, this bio was crafted by yours truly, ChatGPT, because who better to narrate a technophile's life story than a silicon-based life form?

  • sycoreaper
    A more interesting faceoff would have been OpenAI (ChatGPT) vs Grok.
    I am surprised that Gemini did as well as it did, my experience with it overall has been less than favorable.
  • RyanMorrison
    sycoreaper said:
    A more interesting faceoff would have been OpenAI (ChatGPT) vs Grok.
    I am surprised that Gemini did as well as it did, my experience with it overall has been less than favorable.
    DALL-E, the image generator used by ChatGPT, is ancient in AI terms and is one of the worst of the major models. I rarely use it in tests anymore because of how bad it is in comparison.

    Google Gemini uses Imagen 3 which is only a couple of months old and surprisingly good.