Meet Groq — the chip designed to run AI models really, really fast

AI image of a chip
(Image credit: Adobe Firefly AI image)

Unless you’ve been living under a rock, or in a simulated Mars capsule in a desert somewhere, you’ll have noticed AI has taken over. From chatbots making pictures to catflaps refusing entry if your feline friend has a mouse in its mouth — artificial intelligence is watching.

However, we’ve barely scratched the surface of what AI can do, might do and will do for humanity over the next few years and Groq hopes to be at the centre of that revolution.

Formed by the side of a pool, Groq makes its money from the Language Processing Unit (LPU), a new category of chip designed not for training AI models but for running them very fast.

The GroqChip is currently a 14nm processor and gains its performance benefit from scale, operating in the cloud as a cluster of well-structured units efficiently parsing data. 

Having access to very low latency AI inference is helping close some of the bottlenecks in the delivery of AI solutions. For example, speech-to-text and text-to-speech can happen in real time, allowing for natural conversations with an AI assistant, including allowing you to interrupt it.

Creating a chip specifically for running AI

Many of the companies trying to compete with Nvidia in the artificial intelligence space are going after the training market, but Groq took the decision to focus on running the models.

“We’ve been laser-focused on delivering unparalleled inference speed and low latency,” explained Mark Heaps, Groq’s chief evangelist, during a conversation with Tom’s Guide. “This is critical in a world where generative AI applications are becoming ubiquitous.”

The chips were designed by Groq founder and CEO Jonathan Ross, who also led the development of Google’s Tensor Processing Units (TPUs) used to train and run Gemini. They are built for rapid scalability and for the efficient flow of data through the chip.

Heaps likened it to a planned, gridded city, where traffic knows where to go and can easily follow the layout, whereas other chips are like driving in Delhi, with complex road layouts and heavy traffic.

"Our architecture allows us to scale horizontally without sacrificing speed or efficiency... It's a game-changer for processing intensive AI tasks,” he told me.

Thrust into the limelight 

Groq

(Image credit: Groq)

The company is being built on a set of core pillars, including tackling latency whilst ensuring the entire platform is scalable. This is being delivered largely through its own cloud infrastructure, with more global data centers coming online this year or next.

While edge devices such as driverless cars could become viable once the chips shrink down to 4nm in version two, for now the focus is purely on the cloud.

This includes access through an API for third-party developers looking to offer high-speed, reliable access to open source models from the likes of Mistral or Meta, as well as a direct consumer chatbot-style interface called GroqChat.
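For developers, access in this style typically means an HTTP endpoint that accepts chat requests. A minimal sketch of what building such a request might look like, assuming an OpenAI-compatible request shape — the endpoint URL and model name below are illustrative placeholders, not details confirmed by Groq:

```python
import json

# Hypothetical sketch of a chat-completion request in the common
# OpenAI-style shape. URL and model name are placeholder assumptions.
API_URL = "https://api.example.com/v1/chat/completions"  # placeholder

def build_chat_request(prompt: str, model: str = "mixtral-8x7b") -> dict:
    """Assemble a chat-completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

body = build_chat_request("Summarise what an LPU does in one sentence.")
payload = json.dumps(body)  # this JSON would be POSTed with an API key header
```

In practice a developer would send `payload` to the provider's endpoint with an authentication header and read the generated text from the response.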

It is the launch of this public, easy-to-access interface that propelled the six-year-old company into the limelight. It had been working away in the background, including providing rapid data processing for labs during the Covid pandemic, but this was a pivotal moment.


Heaps told me that the discussion with Jonathan Ross was “why don't we just put it on there and make it so that people can try it.” This was off the back of internal experiments getting open source models like Llama 2 and Mixtral running on GroqChips.

“Going back even a month and a half ago we had a completely different website and you had to click three links deep to find it. And it was just kind of nested and it was sort of an experiment,” Heaps explained. “And then a few people hit it and said, you know, this is great, but gosh, why do you make me go through all these clicks?”

Ross told the team to make it the homepage. Literally, the first thing people see when visiting the Groq website. “It was a little scary,” Heaps admitted. “His goal was: I want there to be no website in regards to marketing pages. I only want it to be the chat.” So that is what they implemented.

What you can do with low latency AI

Low latency AI allows for genuine real-time generation. For now the focus has been on large language models, including code and text. We’re seeing up to 500 tokens per second, which is dozens of times faster than a human can read, and it’s happening even on complex queries.
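To put that 500 tokens per second figure in perspective, here is a back-of-the-envelope comparison with human reading speed. The words-per-token ratio and reading speed are common rules of thumb, not figures from the article:

```python
# Rough comparison of Groq's quoted 500 tokens/s with human reading speed.
# Assumptions (not from the article): ~0.75 English words per token,
# ~250 words per minute as a typical adult reading speed.
tokens_per_second = 500
words_per_token = 0.75
human_wpm = 250

model_wpm = tokens_per_second * words_per_token * 60
speedup = model_wpm / human_wpm
print(f"{model_wpm:.0f} words/min vs {human_wpm} words/min -> {speedup:.0f}x faster")
```

Under these assumptions the model generates text roughly 90 times faster than a person can read it, consistent with the “dozens of times faster” claim.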

New models will be added soon, and then the team will work on delivering the same rapid generation of images, audio and even video. That is where you’ll see the real benefit, including potentially real-time image generation even at high resolutions.

The other significant advantage is being able to find a single piece of information within a large context window. That is slated for future versions, which could even offer real-time fine-tuning of the models, learning from human interaction and adapting.

This could then allow for a true open-world game, something akin to the Oasis in Ernest Cline's seminal novel Ready Player One. Live AI rendering and re-training would allow for the sort of adaptability required to reflect so much interaction and change from multiple players.

The pivot to running AI models was a side project

Groq has been around since 2016 with much of the first few years spent perfecting the technology. This included working with labs and companies to speed up run-time on complex machine learning tasks such as drug discovery or flow dynamics.

The pivot to running LLMs coincided with the rise of ChatGPT and the leak of Meta’s Llama large language model. Heaps told Tom’s Guide: “We literally had one engineer who, who said, I wonder if I can compile [Llama]. He then spent 48 hours getting it to work on GroqChip.”

What took most of the time was removing much of the material put into Llama to make it run efficiently on a GPU, as that “was going to bog it down for us,” said Heaps, adding: “Once he got all that scrubbed out, because we don't use CUDA libraries or kernels or anything, we were like, ‘oh, we can run Llama’. So we've been using it internally since then.”


Over the next few months the team integrated other models and libraries, and while only Mixtral and Llama 2 are available on the public Groq interface, others, including audio AI such as text-to-speech generators, are being actively tested and converted to run on GroqChips.

One thing we can expect is significant disruption to a tech space that is already disrupting the entire technology sector. We’re seeing a rise in AI PCs and local hardware, but with improved internet connectivity and the latency issue solved — are they still needed?

Ryan Morrison
AI Editor

Ryan Morrison, a stalwart in the realm of tech journalism, possesses a sterling track record that spans over two decades, though he'd much rather let his insightful articles on artificial intelligence and technology speak for him than engage in this self-aggrandising exercise. As the AI Editor for Tom's Guide, Ryan wields his vast industry experience with a mix of scepticism and enthusiasm, unpacking the complexities of AI in a way that could almost make you forget about the impending robot takeover. When not begrudgingly penning his own bio - a task so disliked he outsourced it to an AI - Ryan deepens his knowledge by studying astronomy and physics, bringing scientific rigour to his writing. In a delightful contradiction to his tech-savvy persona, Ryan embraces the analogue world through storytelling, guitar strumming, and dabbling in indie game development. Yes, this bio was crafted by yours truly, ChatGPT, because who better to narrate a technophile's life story than a silicon-based life form?
