Microsoft reveals Phi-3.5 — this new small AI model outperforms Gemini and GPT-4o


Microsoft has released the latest version of its small language model, Phi-3.5. The new version is a big upgrade on the previous generation, beating smaller models from leading players like Google, OpenAI, Mistral, and Meta on several important metrics.

Phi-3.5 comes in 3.8 billion, 4.15 billion, and 41.9 billion parameter versions. All three are available to download for free and can be run using a local tool like Ollama.
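For readers who want to try this, here is a minimal sketch of querying a locally running Ollama server from Python. It assumes Ollama is installed and listening on its default port, and that the model has already been pulled under the tag `phi3.5`; the tag, the helper names `build_request` and `generate`, and the timeout are assumptions for illustration, not details from Microsoft's release.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="phi3.5"):
    """Build a POST request for Ollama's generate endpoint.

    The model tag "phi3.5" is an assumption; use whatever tag you pulled.
    """
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="phi3.5", timeout=120):
    """Send the prompt and return the model's full text reply."""
    request = build_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

With the server running, calling `generate("Summarize what a small language model is.")` should return the reply as a single string, because `stream` is set to `False`.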

It performed particularly well at reasoning, beaten only by GPT-4o-mini among the leading small models. It also did well on math benchmarks, significantly surpassing Llama and Gemini.

Small language models like Phi-3.5 demonstrate efficiency improvements in AI and lend credence to OpenAI CEO Sam Altman's goal of creating intelligence too cheap to meter.

What’s new in Phi-3.5

🔥 New Phi-3.5 models are now on the Open LLM Leaderboard!
• Phi-3.5-MoE-instruct leads all Microsoft models with a 35.1 average score, ranking 1st in the 3B category and 10th among all chat models
• Phi-3.5-mini-instruct scored 27.4 points, taking 3rd place in the 3B category… pic.twitter.com/yNcOR2bcxX
August 22, 2024

Phi-3.5 also comes in a vision version that can understand images as well as text, and in a mixture-of-experts version that splits learning tasks across different sub-networks for more efficient processing.
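To make the mixture-of-experts idea concrete, here is a toy sketch of top-k routing in plain Python. It is not Microsoft's implementation: the experts are stand-in functions, the router scores are hard-coded rather than learned, and activating two experts per input (`k=2`) is an assumption chosen for illustration.

```python
import math


def softmax(scores):
    """Turn raw router scores into probabilities."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_scores, k=2):
    """Pick the k highest-scoring experts and renormalize their weights."""
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(x, experts, router_scores, k=2):
    """Run only the selected experts and blend their outputs."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_scores, k))


# Toy setup: 16 hypothetical "experts", each just scales its input.
experts = [lambda x, scale=scale: scale * x for scale in range(1, 17)]

# Hard-coded router scores that favor experts 3 and 7 equally.
router_scores = [0.0] * 16
router_scores[3] = 2.0
router_scores[7] = 2.0

chosen = route_top_k(router_scores)
output = moe_forward(10.0, experts, router_scores)
```

The efficiency gain comes from the fact that only the chosen experts run for a given input, so most of the model's parameters sit idle on any single step.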

The mixture-of-experts model beats Gemini 1.5 Flash, the model used in the free version of the Gemini chatbot, on multiple benchmarks, and it has a large 128k context window. While that window is significantly smaller than Gemini's own, it is comparable to those of ChatGPT and Claude.

The main benefit of a very small model like the one I installed is that it could be bundled with an application or even installed on an Internet of Things device such as a smart doorbell. This would allow for facial recognition without sending data to the cloud.

The smallest model was trained on 3.4 trillion tokens of data using 512 Nvidia H100 GPUs over 10 days. The mixture-of-experts model, which comprises 16 experts of 3.8 billion parameters each, was trained on 4.9 trillion tokens and took 23 days to train.
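Those figures translate into a rough compute budget. The back-of-the-envelope sketch below uses only the numbers quoted above and assumes all 512 GPUs were busy for the full 10 days and that each token was processed once; both are simplifications. The GPU count for the mixture-of-experts run is not stated here, so the script does not estimate its GPU-hours.

```python
# Figures quoted in the article for the smallest model.
GPUS = 512
MINI_DAYS = 10
MINI_TOKENS = 3.4e12

# Assumption: every GPU ran for the whole period.
gpu_hours = GPUS * MINI_DAYS * 24
tokens_per_gpu_second = MINI_TOKENS / (gpu_hours * 3600)

# Mixture-of-experts parameter count: naive product vs. reported total.
EXPERTS = 16
EXPERT_PARAMS_B = 3.8
REPORTED_TOTAL_B = 41.9
naive_total_b = EXPERTS * EXPERT_PARAMS_B

print(f"GPU-hours for the smallest model: {gpu_hours:,}")
print(f"Tokens per GPU per second: {tokens_per_gpu_second:,.0f}")
print(f"Naive 16 x 3.8B total: {naive_total_b:.1f}B vs reported {REPORTED_TOTAL_B}B")
```

That works out to 122,880 GPU-hours for the smallest model. The naive 16 × 3.8 billion product is larger than the 41.9 billion total quoted earlier, presumably because the experts share some layers rather than being 16 fully separate models.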

How well does Phi-3.5 actually work?

I installed and ran the smaller 3.8 billion parameter version of Phi-3.5 on my laptop and found it less impressive than the benchmarks suggest. While it was verbose in its responses, often the phrasing left a lot to be desired, and it struggled with some simple tests.

I asked it a classic: “Write a short one-sentence story where the first letter of a word is the same as the last letter of the previous word.” Even after clarification, it failed spectacularly.
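The rule in that prompt is easy to check mechanically. Here is a small sketch of a checker; the function names and the choice to ignore punctuation and letter case are my own assumptions about how to score the test.

```python
import re


def chain_breaks(sentence):
    """Return the (previous, current) word pairs that break the rule.

    Rule: each word must start with the last letter of the word before it.
    Punctuation and letter case are ignored (an assumption of this sketch).
    """
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", sentence.lower())
    return [
        (prev, cur)
        for prev, cur in zip(words, words[1:])
        if prev[-1] != cur[0]
    ]


def is_letter_chain(sentence):
    """True if the sentence has at least two words and no rule breaks."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", sentence.lower())
    return len(words) >= 2 and not chain_breaks(sentence)
```

A sentence such as "Every year rabbits scamper round deep ponds." passes, while "The cat sat on the mat." fails on its very first pair of words.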

I haven’t tried the larger mixture-of-experts model. However, I’m told that, judging by the benchmarks, it solves some of the issues with the version I tried. The benchmarks suggest its output will be of similar quality to OpenAI’s GPT-4o-mini, the model that powers the free version of ChatGPT.

One area where it seems to outperform GPT-4o-mini more than in others is STEM and the social sciences. Its architecture allows it to maintain efficiency while managing complex AI tasks in different languages.


Ryan Morrison, a stalwart in the realm of tech journalism, possesses a sterling track record that spans over two decades, though he'd much rather let his insightful articles on AI and technology speak for him than engage in this self-aggrandising exercise. As the former AI Editor for Tom's Guide, Ryan wields his vast industry experience with a mix of scepticism and enthusiasm, unpacking the complexities of AI in a way that could almost make you forget about the impending robot takeover.
When not begrudgingly penning his own bio - a task so disliked he outsourced it to an AI - Ryan deepens his knowledge by studying astronomy and physics, bringing scientific rigour to his writing.