Apple reveals MM1 AI model and it could power the new Siri 2.0

Siri presenting 'Go ahead, I'm listening' in text on iPhone screen.
(Image credit: Shutterstock)

Apple is something of a latecomer to the large language model (LLM) scene, lagging behind Google, Microsoft and Meta in creating powerful AI tools, but it seems to be catching up quickly.

Earlier this year, CEO Tim Cook told investors that Apple would make a significant announcement around AI, which he described as a “major breakthrough”. Many suspect this will be a new version of Siri powered by an LLM, similar to Google replacing Assistant with Gemini.

Apple researchers have just revealed details of what could be the basis of this next-generation Siri which, if rumors are true, could work alongside Gemini on the iPhone, offering users a choice.

Released as a preprint research paper, MM1 essentially offers a new method for using AI-generated data and labels to speed up training of new models — including possibly Siri 2.0.

What is Apple MM1?

At its core, MM1 is a new method for training multimodal models using synthetic data, including images and text.

The researchers behind MM1 claim their new method improves performance and reduces the number of follow-up prompts needed to get a desired result.

Improving prompt understanding and reaching the desired output with as little interaction with the AI as possible is ideal for consumer tech, especially for Siri, which will be used by a wide group of people with varying degrees of technological prowess.

The models achieve state-of-the-art pre-training metrics and competitive performance on multimodal benchmarks after fine-tuning.

MM1 is a family of AI models, with the largest at around 30 billion parameters. This is significantly smaller than the trillion-plus parameters that GPT-4 and Claude 3 Opus are reported to have, but the researchers still claim to match key benchmarks due to improvements in efficiency.

“By scaling up their recipe, they built MM1, a family of multimodal models up to 30B parameters that achieve state-of-the-art pre-training metrics and competitive performance on multimodal benchmarks after fine-tuning,” they wrote.

The significant breakthrough is in vision, specifically analysis of images and other forms of visual content and the ability to understand the output. I recently tested how well ChatGPT, Claude and Gemini perform at this task.

How does Apple MM1 work?

Apple MM1

(Image credit: Apple)

The full title of the paper is "MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training". It was released with minimal fanfare and is openly available as a preprint, with details of the training data and benchmarks.

In it researchers argue that combining different types of training data and model architectures — instead of relying on a single concept — can lead to state-of-the-art performance. 

The team wrote that they used a mix of image-caption, interleaved image-text and text-only data, and that a "diverse dataset spanning visual and linguistic information" is required to get that performance.

This includes image captioning, visual question answering and natural language understanding — such as for one-shot or few-shot prompts to get a desired output.

“Thanks to large-scale pre-training, MM1 enjoys appealing properties such as enhanced in-context learning, and multi-image reasoning, enabling few-shot chain-of-thought prompting,” the team explained.
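To make the idea of a data mix more concrete, here is a minimal Python sketch of choosing a data source for each training example in fixed proportions. It is an illustration only: the source names, the 45/45/10 weights and the `sample_mixture` helper are assumptions made for this example, not details taken from this article or from Apple's code.

```python
import random
from collections import Counter

# Hypothetical mixing weights for the three data types named in the article.
# The numbers are placeholders chosen for illustration.
MIX_WEIGHTS = {
    "image_caption": 0.45,
    "interleaved_image_text": 0.45,
    "text_only": 0.10,
}


def sample_mixture(weights, n, seed=0):
    """Pick a data source for each of n training examples, in proportion to weights."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    sources = list(weights)
    probs = [weights[s] for s in sources]
    return rng.choices(sources, weights=probs, k=n)


if __name__ == "__main__":
    picks = sample_mixture(MIX_WEIGHTS, n=10_000)
    counts = Counter(picks)
    for source in MIX_WEIGHTS:
        # Observed share of each source; it should sit close to its weight.
        print(f"{source}: {counts[source] / len(picks):.1%}")
```

In a real pipeline each pick would pull the next batch from that source's dataset; here the sources are just labels, so only the proportions are shown.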

What makes Apple MM1 different?

MM1 uses a different type of architecture from other models, including higher image resolution encoders, takes a different approach to pre-training and labelling, and focuses on using that data mix to improve overall performance from a single prompt.

It also uses a mixture-of-experts (MoE) model to scale up while keeping the processing requirements down, which further hints at its potential use on devices like iPhones or laptops, rather than running in the cloud.

Google recently leveraged an MoE architecture in its Gemini 1.5 Pro model, which has a context window of more than one million tokens. This allowed it to improve efficiency over significant input data.
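For readers unfamiliar with mixture-of-experts, the following Python sketch shows one common form of the idea, top-k gating: a router scores every expert, but only the few highest-scoring experts actually run for a given input. This is a toy under stated assumptions, not Apple's or Google's implementation. The article does not say how MM1 routes inputs, and the `moe_forward` function, the toy experts and the router scores are all made up for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, router_scores, experts, top_k=2):
    """Route input x to the top_k highest-scoring experts and blend their outputs."""
    # Rank experts by router score and keep the top_k indices.
    ranked = sorted(range(len(experts)), key=lambda i: router_scores[i], reverse=True)
    chosen = ranked[:top_k]
    # Renormalise the gate weights over the chosen experts only.
    gates = softmax([router_scores[i] for i in chosen])
    # Only the chosen experts are evaluated; the rest cost no compute.
    output = sum(g * experts[i](x) for g, i in zip(gates, chosen))
    return output, chosen


# Four toy "experts", each a simple function standing in for a neural network block.
EXPERTS = [
    lambda x: x + 1.0,
    lambda x: x * 2.0,
    lambda x: x - 3.0,
    lambda x: x * x,
]

if __name__ == "__main__":
    # Router scores would normally come from a learned layer; these are invented.
    out, used = moe_forward(3.0, [0.1, 2.0, -1.0, 2.0], EXPERTS, top_k=2)
    print(f"experts used: {used}, output: {out}")
```

The point of the design is that total parameter count can grow with the number of experts while the work done per input stays tied to `top_k`, which is why the approach is attractive for constrained hardware.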

Will Apple MM1 power Siri 2.0?

Apple could bring Google Gemini to iPhone

(Image credit: Apple/Google)

While the paper doesn’t mention Siri or any potential product, the focus on performance and efficiency, achieving solid results with minimal prompting and the need for extensive multimodal capabilities does hint at the direction Apple will go with Siri in the future.

It is likely that many of the features of any LLM-powered Siri will have to run “on device”, particularly around processing personal information due to Apple’s longstanding privacy stance.

Developing a very powerful model that can learn from interactions with users and is small enough to run on an iPhone would be a big move.

With the recent news that Apple may be bringing Gemini to the iPhone, and previous rumors that the company is also in talks with ChatGPT maker OpenAI, it looks like Apple is taking a multi-faceted approach to achieving the “big bang” in AI that Cook promised investors.

Ryan Morrison
AI Editor

