Apple will soon train Apple Intelligence on select user data — here’s what you need to know

Apple has recently confirmed how it will start to use certain user data to help train its Apple Intelligence models.

There's little doubt that Apple Intelligence has had a few issues lately, including the delayed launch of its Siri 2.0 features. To help avoid similar issues in the future, Apple is changing how it trains its AI. The change was detailed in a recent blog post on Apple’s Machine Learning Research website, as first reported by Bloomberg.

The blog details how Apple trains its AI using synthetic data, but this method has limitations: synthetic data on its own struggles to reflect real usage trends in features like summarization or Writing Tools. The new method detailed by Apple aims to solve this by comparing its synthetic data with user data.

Apple's new training outline for its AI using select user data

(Image credit: Apple)

The process begins with Apple generating "a set of many emails that cover topics that are most common in messages. To curate a representative set of synthetic emails, we start by creating a large set of synthetic messages on a variety of topics." It is worth noting that Apple is adamant that these emails are not generated with any knowledge regarding individual user emails.

Each synthetic message is then converted into a representation called an embedding, which captures key information about the message such as its language, topic and length. These embeddings are then "sent to a small number of user devices that have opted in to Device Analytics." You can find more information on Device Analytics on Apple's website.

When a participating device receives the embeddings, it selects a small number of the user's recent emails and compares them against the synthetic embeddings to find which synthetic sample is the closest match. Apple will also use differential privacy to learn "the most-frequently selected synthetic embeddings across all devices, without learning which synthetic embedding was selected on any given device."
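To make the on-device step more concrete, here is a minimal Python sketch of the general idea. It is not Apple's implementation. It assumes embeddings are plain lists of numbers, uses cosine similarity as the closeness measure, and swaps in textbook k-ary randomized response as a stand-in for whatever differential privacy mechanism Apple actually uses. All function names and parameters, including `epsilon`, are hypothetical.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def closest_synthetic(email_embeddings, synthetic_embeddings):
    """Index of the synthetic embedding most similar to any local email."""
    best_index, best_score = 0, -2.0  # cosine similarity is never below -1
    for i, synth in enumerate(synthetic_embeddings):
        score = max(cosine_similarity(e, synth) for e in email_embeddings)
        if score > best_score:
            best_index, best_score = i, score
    return best_index


def randomized_report(true_index, num_options, epsilon, rng):
    """k-ary randomized response (an assumed mechanism, not Apple's).

    Reports the true index with probability e^eps / (e^eps + k - 1),
    otherwise a uniformly random different index.
    """
    p_true = math.exp(epsilon) / (math.exp(epsilon) + num_options - 1)
    if rng.random() < p_true:
        return true_index
    others = [i for i in range(num_options) if i != true_index]
    return rng.choice(others)


def estimate_counts(reports, num_options, epsilon):
    """Server side: debias noisy reports into per-embedding count estimates."""
    n = len(reports)
    denom = math.exp(epsilon) + num_options - 1
    p = math.exp(epsilon) / denom  # chance a report is truthful
    q = 1.0 / denom                # chance of any one specific wrong index
    raw = [0] * num_options
    for r in reports:
        raw[r] += 1
    return [(count - n * q) / (p - q) for count in raw]
```

In this sketch, no single report reveals with certainty which embedding a device picked, yet the estimated counts still converge on the most popular embeddings as the number of devices grows.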

According to the blog, the "most-frequently selected synthetic embeddings can then be used to generate training or testing data, or we can run additional curation steps to further refine the dataset." This process will, according to Apple, allow it to "improve the topics and language of our synthetic emails, which helps us train our models to create better text outputs in features like email summaries, while protecting privacy."
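As a rough illustration of that last step, the sketch below picks the synthetic emails whose embeddings were selected most often. It assumes the privately estimated selection counts are already available as a list of numbers aligned with the list of synthetic emails. The function name and its parameters are hypothetical and not taken from Apple's post.

```python
import heapq


def top_synthetic_emails(estimated_counts, synthetic_emails, k):
    """Return the k synthetic emails with the highest estimated selection counts."""
    # Rank indices by estimated count; estimates can be noisy or even negative.
    ranked = heapq.nlargest(
        k, range(len(estimated_counts)), key=lambda i: estimated_counts[i]
    )
    return [synthetic_emails[i] for i in ranked]
```

The returned emails could then seed further synthetic data generation or go through the additional curation steps that Apple mentions.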

Only time will tell if this will improve Apple Intelligence to the degree that is needed to help Apple catch up with the best AI chatbots and AI assistants. However, it will likely require more than just better training to compete with Gemini 2.0 or ChatGPT.

Josh Render
Staff Writer

Josh is a staff writer for Tom's Guide and is based in the UK. He has worked for several publications but now works primarily on mobile phones. Outside of phones, he has a passion for video games, novels, and Warhammer. 
