Grok gets eyes — X-based chatbot can now analyze images

(Image credit: Shutterstock)

Elon Musk's artificial intelligence company, xAI, has unveiled a major update to its AI assistant, Grok. The latest iteration adds vision capabilities, enabling Grok to analyze and comprehend images alongside its existing text functionality.

Grok can already generate images using the Flux model from Black Forest Labs, but until now it was the last of the major AI chat products without image analysis, also known as AI vision.

With the introduction of this vision feature, Grok can analyze images linked to posts on the X platform, interpret visual content such as documents, diagrams and photographs, and understand spatial relationships within images to better describe their contents.

You could use this to come up with recipe ideas based on a photo of ingredients, identify the location of a landmark in a photo shared on X or even explain the results of a graph. The last part could be particularly useful on a news-heavy platform like X.

How vision works in Grok

Users will soon notice a new button on posts containing images on the X platform. When clicked, it sends the image to Grok, allowing users to ask questions about the visual content or request an analysis of it. It could also be used to describe images for people with visual impairments.

We haven’t seen official benchmarks yet but, according to xAI, Grok's vision capabilities hold their own against established models from OpenAI, Google and Anthropic. To this end, the company has introduced a new benchmark, RealWorldQA, designed to evaluate the model’s proficiency in understanding and reasoning about the physical world through images.

The announcement drew mixed reactions from the AI community and users, with some enthusiastic about how fast Grok is advancing while others remained cautious, questioning its performance against established AI models.

What comes next for Grok

Elon Musk-owned xAI has a 200,000 GPU data center built for the sole purpose of training future versions of Grok. I think it's safe to say we’re going to see big things from the model in the future.

The vision capabilities in particular could find their way into robots. Musk also runs Tesla, which has its own robotics division. In the future, we may also see video and voice analysis from Grok, as these features are already in place with Gemini and ChatGPT.

While this update marks a notable advancement for Grok, it's clear that the model is still in development compared to more mature AI models like Gemini or ChatGPT. As with all rapidly evolving AI technologies, we'll need to monitor both the upgraded capabilities and the ethical considerations of these developments in the months ahead.


Ritoban Mukherjee is a freelance journalist from West Bengal, India whose work on cloud storage, web hosting, and a range of other topics has been published on Tom's Guide, TechRadar, Creative Bloq, IT Pro, Gizmodo, Medium, and Mental Floss.

With contributions from

Ryan Morrison, AI Editor
