Microsoft Launches Phi-4 Reasoning AI Models To Rival DeepSeek R1

Microsoft has launched three new AI reasoning models: Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning. These are small language models designed for edge devices such as Windows PCs and mobile devices. Phi-4-reasoning is a 14-billion-parameter model that can handle complex reasoning tasks.

Phi-4-reasoning-plus is built on the same base model but spends more inference-time compute, generating roughly 1.5 times as many tokens as Phi-4-reasoning, to deliver higher accuracy. Despite their much smaller size, the Phi-4-reasoning models rival far larger models such as DeepSeek R1 (671B parameters) and o3-mini.

On the GPQA benchmark, Phi-4-reasoning-plus (14B) scores 69.3%, while o3-mini scores 77.7%. On AIME 2025, Phi-4-reasoning-plus scores 78% versus 82.5% for o3-mini. These results show that Microsoft's small model comes close to flagship reasoning models that are much larger.
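To put "close" into numbers, the short Python sketch below computes the gap between the two models using only the scores quoted above. The dictionary layout and the helper names gap_points and relative_score are illustrative choices, not anything published by Microsoft.

```python
# Benchmark scores as quoted in the article (percent accuracy).
scores = {
    "GPQA": {"Phi-4-reasoning-plus-14B": 69.3, "o3-mini": 77.7},
    "AIME 2025": {"Phi-4-reasoning-plus-14B": 78.0, "o3-mini": 82.5},
}

def gap_points(bench: str) -> float:
    """Absolute gap in percentage points (o3-mini minus Phi-4-reasoning-plus)."""
    s = scores[bench]
    return round(s["o3-mini"] - s["Phi-4-reasoning-plus-14B"], 1)

def relative_score(bench: str) -> float:
    """Phi-4-reasoning-plus score expressed as a fraction of o3-mini's score."""
    s = scores[bench]
    return round(s["Phi-4-reasoning-plus-14B"] / s["o3-mini"], 3)

for bench in scores:
    print(f"{bench}: gap {gap_points(bench)} pts, ratio {relative_score(bench)}")
```

By this measure the 14B model trails o3-mini by 8.4 points on GPQA and 4.5 points on AIME 2025, reaching roughly 89% and 95% of o3-mini's scores respectively.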

Phi-4-mini-reasoning benchmark performance (Image credit: Microsoft)

Microsoft says the Phi-4 reasoning models were trained via supervised fine-tuning “on carefully curated reasoning demonstrations from OpenAI o3-mini.” Microsoft adds, “The model demonstrates that meticulous data curation and high-quality synthetic datasets allow smaller models to compete with larger counterparts.”

Meanwhile, the smaller Phi-4-mini-reasoning model, with just 3.8B parameters, outperforms many 7B and 8B models. On benchmarks such as AIME 24, MATH 500, and GPQA Diamond, Phi-4-mini-reasoning-3.8B delivers competitive scores, nearly matching o1-mini. According to Microsoft, the Phi-4-mini model has been “fine-tuned with synthetic data generated by Deepseek-R1 model.”
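To illustrate why parameter count matters for edge devices, here is a rough back-of-the-envelope Python sketch of how much memory the model weights alone would occupy. It assumes the standard storage cost per weight for fp16 (2 bytes), int8 (1 byte) and 4-bit (0.5 byte) formats; the article does not say which precision Microsoft uses on-device, and the estimate ignores the KV cache, activations and runtime overhead.

```python
# Parameter counts as stated in the article.
PARAMS = {"Phi-4-reasoning": 14e9, "Phi-4-mini-reasoning": 3.8e9}

# Assumed bytes per weight for common storage precisions (weights only).
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_gb(params: float, precision: str) -> float:
    """Approximate size of the model weights in gigabytes (1 GB = 1e9 bytes)."""
    return round(params * BYTES_PER_PARAM[precision] / 1e9, 1)

for name, n in PARAMS.items():
    sizes = ", ".join(f"{p}: {weight_gb(n, p)} GB" for p in BYTES_PER_PARAM)
    print(f"{name}: {sizes}")
```

Under these assumptions the 14B model needs about 28 GB at fp16 (7 GB at 4-bit), while the 3.8B mini model needs about 7.6 GB at fp16 and under 2 GB at 4-bit, which is the kind of footprint that fits comfortably on a laptop or phone.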

Microsoft’s Phi models already run locally on Windows Copilot+ PCs, where they take advantage of the built-in NPU. It will be interesting to see how much the Phi-4 reasoning models improve on-device AI performance.

Arjun Sha

Passionate about Windows, ChromeOS, Android, security and privacy issues. Has a penchant for solving everyday computing problems.
