Apple is taking a new approach to training its AI models – one that avoids collecting or copying user content from iPhones or Macs.
According to a recent blog post, the company plans to continue relying on synthetic data – artificially generated data designed to mimic user behaviour – and differential privacy to improve features like email summaries, without gaining access to personal emails or messages.
For users who opt in to Apple’s Device Analytics program, the company’s AI models will compare synthetic email-like messages against a small sample of a real user’s content stored locally on the device. The device then identifies which of the synthetic messages most closely matches its user sample, and sends information about the selected match back to Apple. No actual user data leaves the device, and Apple says it receives only aggregated information.
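In rough outline, the on-device step works like a private nearest-neighbour selection. Below is a minimal Python sketch of that idea; the function names, data structures, and similarity measure are illustrative assumptions, not Apple’s implementation:

```python
# Hypothetical sketch of the on-device selection step described above.
# Names and structure are illustrative assumptions, not Apple's code.

def select_best_synthetic(synthetic_messages, local_samples, similarity):
    """Return the ID of the synthetic message closest to any local sample.

    synthetic_messages: dict mapping message ID -> synthetic text (from the server)
    local_samples: list of user texts that never leave the device
    similarity: callable scoring how alike two texts are (higher = closer)
    """
    best_id, best_score = None, float("-inf")
    for message_id, message in synthetic_messages.items():
        score = max(similarity(message, sample) for sample in local_samples)
        if score > best_score:
            best_id, best_score = message_id, score
    # Only this identifier is reported back; the local samples stay on the device.
    return best_id
```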
The technique allows Apple to improve its models for longer-form text generation tasks without collecting real user content. It extends the company’s long-standing use of differential privacy, which injects randomised noise into aggregated data to help protect individual identities. Apple has used the method since 2016 to understand usage patterns, in line with its privacy safeguards.
Improving Genmoji and other Apple Intelligence features
The company already uses differential privacy to improve features like Genmoji, where it collects general trends about which prompts are most popular without linking any prompt with a specific user or device. In upcoming releases, Apple plans to apply similar methods to other Apple Intelligence features, including Image Playground, Image Wand, Memories Creation, and Writing Tools.
For Genmoji, the company anonymously polls participating devices to determine whether specific prompt fragments have been seen. Each device responds with a noisy signal – some responses reflect actual use, while others are randomised. The approach ensures that only widely used terms become visible to Apple, and that no individual response can be traced back to a user or device, the company says.
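A standard way to produce that kind of noisy signal is randomised response, a basic local differential privacy mechanism. The sketch below illustrates the general technique under assumed parameters; it is not Apple’s published protocol:

```python
import random

# Minimal randomised-response sketch (a standard local differential privacy
# technique); the probabilities here are assumptions, not Apple's parameters.

def noisy_report(has_seen_fragment: bool, p_truth: float = 0.75) -> bool:
    """Report whether a prompt fragment was seen, with plausible deniability."""
    if random.random() < p_truth:
        return has_seen_fragment       # truthful answer
    return random.random() < 0.5       # coin-flip answer, independent of the truth

def estimate_true_rate(reports: list[bool], p_truth: float = 0.75) -> float:
    """Invert the noise to estimate the population-level frequency of a fragment."""
    observed = sum(reports) / len(reports)
    # In expectation, observed = p_truth * true_rate + (1 - p_truth) * 0.5,
    # so solve for true_rate.
    return (observed - (1 - p_truth) * 0.5) / p_truth
```

Because any individual answer may be the random one, no single report reveals whether that device actually used a fragment, yet across many devices the aggregate frequency can still be recovered.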
Curating synthetic data for better email summaries
While the above method works well for short prompts, Apple needed a new approach for more complex tasks such as summarising emails. For this, Apple generates thousands of sample messages, which are converted into numerical representations, or ‘embeddings’, based on language, tone, and topic. Participating devices then compare those embeddings against locally stored samples. Again, only the selected match is shared, never the content itself.
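A common way to compare embedding vectors is cosine similarity, assumed here purely for illustration (Apple has not said which measure it uses); a function like this could serve as the similarity argument in the earlier sketch:

```python
import math

# Illustrative cosine similarity between two embedding vectors.
# The choice of metric is an assumption; Apple has not specified one.

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Score two embeddings between -1 and 1; higher means more alike."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```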
Apple collects the most frequently selected synthetic embeddings from participating devices and uses them to refine its training data. Over time, this process allows the system to generate more relevant and realistic synthetic emails, helping Apple improve its AI outputs for summarisation and text generation without, it says, compromising user privacy.
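On the server side, the described process amounts to frequency counting over the reported selections and feeding the most popular synthetic examples back into data generation. A minimal sketch, assuming reports arrive as plain identifiers (in the system Apple describes, they are privatised and aggregated first):

```python
from collections import Counter

# Hypothetical aggregation sketch; identifiers and the cutoff are assumptions.

def top_synthetic_messages(reports: list[str], k: int = 100) -> list[str]:
    """Return the k most frequently selected synthetic message IDs."""
    return [message_id for message_id, _ in Counter(reports).most_common(k)]

# The most-selected IDs can then steer the next round of synthetic data
# generation, e.g. by producing new variants of those messages.
```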
Available in beta
Apple is rolling out the system in beta versions of iOS 18.5, iPadOS 18.5, and macOS 15.5. According to Bloomberg’s Mark Gurman, the approach is part of Apple’s attempt to address difficulties with its AI development, including delayed feature rollouts and the fallout from leadership changes in the Siri team.
Whether its approach will yield more useful AI outputs in practice remains to be seen, but it signals a clear public effort to balance user privacy with model performance.
