Microsoft reveals Phi-3.5 — this new small AI model outperforms Gemini and GPT-4o

The Microsoft logo on a sign at the company's Redmond, Washington, headquarters.
(Image credit: VDB Photos/Shutterstock)

Microsoft has published the latest version of its small language model Phi-3.5. This new version is a big upgrade on the previous generation, beating smaller models from leading players like Google, OpenAI, Mistral, and Meta on several important metrics.

Phi-3.5 comes in 3.8 billion, 4.15 billion, and 41.9 billion parameter versions. All three are available to download for free and can be run using a local tool like Ollama.

It performed particularly well at reasoning, only being beaten by GPT-4o-mini out of the leading small models. It also did well on math benchmarks, significantly passing Llama and Gemini.

Small language models like Phi-3.5 demonstrate efficiency improvements in AI and add credence to OpenAI CEO Sam Altman's goal of creating intelligence too cheap to meter.

What’s new in Phi-3.5

Phi-3.5 comes in a vision model version that can understand images and not just text, as well as a mixture of expert models to split learning tasks across different sub-networks for more efficient processing.

The mixture of expert models beats Gemini Flash 1.5, which is the model used in the free version of the Gemini chatbot on multiple benchmarks and has a large 128k context window. While this is significantly smaller than Gemini itself, it is equal to ChatGPT and Claude.

The main benefit of a very small model like the one I installed is that it could be bundled with an application or even installed on an Internet of Things device such as a smart doorbell. This would allow for facial recognition without sending data to the cloud.

The smallest model was trained on 3.4 trillion tokens of data using 512 Nvidia H100 GPUs over 10 days. The mixture of expert models comprised 16 3.8b parameter models, used 4.9 trillion tokens and took 23 days to train.

How well does Phi-3.5 actually work?

I installed and ran the smaller 3.8 billion parameter version of Phi-3.5 on my laptop and found it less impressive than the benchmarks suggest. While it was verbose in its responses, often the phrasing left a lot to be desired, and it struggled with some simple tests.

I asked it a classic: “Write a short one-sentence story where the first letter of a word is the same as the last letter of the previous word.” Even after clarification, it failed spectacularly.

I haven’t tried the larger mixture of expert models. However, I’m told that judging by the benchmarks, it solves some of the issues with the version of the model I tried. The benchmarks suggest its output will be of similar quality to OpenAI’s GPT-4o-mini, the version that comes with the free version of ChatGPT.

One area that seems to outperform GPT-4o-mini above others is in STEM and social sciences areas. Its architecture allows it to maintain efficiency while managing complex AI tasks in different languages.

More from Tom's Guide

Category
Arrow
Arrow
Back to MacBook Air
Brand
Arrow
Processor
Arrow
RAM
Arrow
Storage Size
Arrow
Screen Size
Arrow
Colour
Arrow
Storage Type
Arrow
Condition
Arrow
Price
Arrow
Any Price
Showing 10 of 91 deals
Filters
Arrow
Load more deals
Ryan Morrison
AI Editor

Ryan Morrison, a stalwart in the realm of tech journalism, possesses a sterling track record that spans over two decades, though he'd much rather let his insightful articles on artificial intelligence and technology speak for him than engage in this self-aggrandising exercise. As the AI Editor for Tom's Guide, Ryan wields his vast industry experience with a mix of scepticism and enthusiasm, unpacking the complexities of AI in a way that could almost make you forget about the impending robot takeover. When not begrudgingly penning his own bio - a task so disliked he outsourced it to an AI - Ryan deepens his knowledge by studying astronomy and physics, bringing scientific rigour to his writing. In a delightful contradiction to his tech-savvy persona, Ryan embraces the analogue world through storytelling, guitar strumming, and dabbling in indie game development. Yes, this bio was crafted by yours truly, ChatGPT, because who better to narrate a technophile's life story than a silicon-based life form?

  • Araki
    Nothing's exciting about it. It is censored beyond repair, even finetuning can't fix it. The previous iteration straight-up did not contain any unsafe content in the dataset, consisting of toddler-friendly AI-generated textbooks. For this iteration, I'm certain they went even further and yet again wasted Earth's energy resources on training another set of infantile LLMs.
    Reply