Meta AI drops multimodal Llama 3.2 — here's why it's such a big deal

AI generated image of 3 Llamas on a chip
(Image credit: Adobe Firefly - AI generated for Future)

Meta has just dropped a new version of its Llama family of large language models. The updated Llama 3.2 introduces multimodality, enabling it to understand images in addition to text. It also brings two new ‘tiny’ models into the family.

Llama is significant—not necessarily because it's more powerful than models from OpenAI or Google, although it does give them a run for their money—but because it's open source and accessible to nearly anyone with relative ease.

The update introduces four different model sizes. The 1 billion parameter model runs comfortably on an M3 MacBook Air with 8GB of RAM, while the 3 billion model also works but just barely. These are both text only but can be run on a wider range of devices and offline.

The real breakthrough, though, is with the 11b and 90b parameter versions of Llama 3.2. These are the first true multimodal Llama models, optimized for hardware and privacy and far more efficient than their 3.1 predecessors. The 11b model could even run on a good gaming laptop.

What makes Llama such a big deal?

Meta Llama 3.2

(Image credit: Meta)

Llama's wide availability, state-of-the-art capability, and adaptability set it apart. It powers Meta’s AI chatbot across Instagram, WhatsApp, Facebook, Ray-Ban smart glasses, and Quest headsets, but it is also accessible on public cloud services, so users can download and run it locally or even integrate it into third-party products.

Groq, the ultra-fast cloud inference service, is one example of why having an open-source model is a powerful choice. I built a simple tool to summarize an AI research paper using Llama 3.1 70b running on Groq - it completed the summary faster than I could read the title.

Some open-source libraries let you create a ChatGPT-like interface on your Mac powered by Llama 3.2 or other models, including the image analysis capabilities if you have enough RAM. However, I took it a step further and built my own Python chatbot that queries the Ollama API, enabling me to run these models directly in the terminal.

Use cases for Llama 3.2

One of the significant reasons Llama 3.2 is such a big deal is its potential to transform how AI interacts with its environment, especially in areas like gaming and augmented reality. The multimodal capabilities mean Llama 3.2 can both "see" and "understand" visual inputs alongside text, opening up possibilities like dynamic, AI-powered NPCs in video games.

Outside of using the models as built by Meta, being open-source means companies, organizations and even governments can create their own customized and fine-tuned versions of the models. This is already happening in India to save near-extinct languages.

Imagine a game where NPCs aren't just following pre-scripted dialogue but can perceive the game world in real-time, responding intelligently to player actions and the environment. For example, a guard NPC could "see" the player holding a specific weapon and comment on it, or an AI companion might react to a change in the game's surroundings, such as the sudden appearance of a threat, in a nuanced and conversational way.

Beyond gaming, this technology can be used in smart devices like the Ray-Ban smart glasses and Quest headsets. Imagine pointing your glasses at a building and asking the AI for architectural history or details about a restaurant’s menu just by looking at it.

These use cases are exciting because Llama’s open-source nature means developers can customize and scale these models for countless innovative applications, from education to healthcare, where AI could assist visually impaired users by describing their environment.

Outside of using the models as built by Meta, being open-source means companies, organizations, and even governments can create their own customized and fine-tuned versions of the models. This is already happening in India to save near-extinct languages.

How Meta Llama 3.2 compares

Swipe to scroll horizontally
ModalityBenchmarkLlama 3.2 11BLlama 3.2 90BClaude 3 - HaikuGPT-4o-mini
ImageMMMU50.760.350.259.4
ImageMMMU-Pro, Standard33.045.227.342.3
ImageMMMU-Pro, Vision23.733.820.136.5
ImageMathVista51.557.346.456.7
ImageChartQA83.485.581.7-
ImageAI2 Diagram91.192.386.7-
ImageDocVQA88.490.188.8-
ImageVQAv275.278.1--
TextMMLU73.086.075.282.0
TextMATH51.968.038.970.2
TextGPQA32.846.733.340.2
TextMGSM68.986.975.187.0

Llama 3.2 11b and 90b are competitive with smaller models from Anthropic, such as Claude 3 Haiku and OpenAI, including GPT-4o-mini, when recognizing an image and similar visual tasks. The 3B version is competitive with similar-sized models from Microsoft and Google, including Gemini and Phi 3.5-mini across 150 benchmarks.

While not a direct benchmark, my own tests of having the 1b model analyze my writing and offer suggested improvements are roughly equal to the performance of Apple Intelligence writing tools, just without the handy context menu access.

The two vision models, 11b and 90b, can perform many of the same functions I've seen from ChatGPT and Gemini. For example, you can give it a photograph of your garden, and it can offer suggested improvements or even a planting schedule.

As I've said before, the performance, while good, isn't the most significant selling point for Llama 3.2; it is in how accessible it is and customizable for a range of use cases.

More from Tom's Guide

Category
Arrow
Arrow
Back to MacBook Air
Brand
Arrow
Processor
Arrow
RAM
Arrow
Storage Size
Arrow
Screen Size
Arrow
Colour
Arrow
Storage Type
Arrow
Condition
Arrow
Price
Arrow
Any Price
Showing 10 of 43 deals
Filters
Arrow
Show more
Ryan Morrison
AI Editor

Ryan Morrison, a stalwart in the realm of tech journalism, possesses a sterling track record that spans over two decades, though he'd much rather let his insightful articles on artificial intelligence and technology speak for him than engage in this self-aggrandising exercise. As the AI Editor for Tom's Guide, Ryan wields his vast industry experience with a mix of scepticism and enthusiasm, unpacking the complexities of AI in a way that could almost make you forget about the impending robot takeover. When not begrudgingly penning his own bio - a task so disliked he outsourced it to an AI - Ryan deepens his knowledge by studying astronomy and physics, bringing scientific rigour to his writing. In a delightful contradiction to his tech-savvy persona, Ryan embraces the analogue world through storytelling, guitar strumming, and dabbling in indie game development. Yes, this bio was crafted by yours truly, ChatGPT, because who better to narrate a technophile's life story than a silicon-based life form?

Read more
Meta Llama 3.1
Llama 4 will be Meta's next-generation AI model — here's what to expect
Copilot, Gemini, Claude
I test AI chatbots for a living and these are the best ChatGPT alternatives
Logos of DeepSeek Qwen 2.5 MetaAI
I put DeepSeek vs Meta AI Llama vs Qwen to the test locally on my PC — here's what I recommend
AI on a laptop
You can run your own AI chatbot locally on Windows and Mac — here's how
ChatGPT
I just tested ChatGPT's new o3-mini model with 7 prompts to rate its problem-solving and reasoning capabilities — and it blew me away
Meta AI logo on a phone
Meta set to release a direct competitor to ChatGPT — here's what you need to know
Latest in AI
Microsoft Copilot app running on a phone with Microsoft logo in background
Microsoft 365 Copilot debuts new research tools for work: here's what that means
AI Mode of google search
Google’s making it easier to start new AI Mode searches — here’s how
Gemini logo on smartphone
Google Gemini Gems now available to all users without a subscription
DeepSeek login in page displayed on smartphone
DeepSeek R1 just got even smarter with a new upgrade — here's what's changed
ChatGPT logo on phone
I just tested ChatGPT-4o's enhanced image generator with 7 prompts — here's the results
Bill Gates in 2019
Bill Gates just predicted the death of every job thanks to AI — except for these three
Latest in Features
Rusty pruning shears
How to spring clean your gardening tools — two household ingredients that really work
Wordle answer for #1,244, Thursday, November 14
I used ChatGPT to help me win at Wordle — here's what happened
a photo of a woman with strong abdominal muscles
Forget the Reformer —5 pilates exercises you can do with just a resistance band
A hand feels the temperature regulation of the SPRINGSPIRIT Dual Layer Mattress Topper.
What is a bamboo mattress topper and should you buy one?
2025 Mini Cooper Countryman SE All4 review.
I drove the Mini Cooper Countryman EV for a week — here’s my pros and cons
Troubadour Apex 3.0 Backpack
I tested this laptop backpack for 6 months — and it’s one of the best purchases I’ve ever made