We introduce Magistral, Mistral’s first reasoning model and our own scalable
reinforcement learning (RL) pipeline. Instead of relying on existing
implementations and RL traces distilled from prior models, we follow a
ground-up approach, relying solely on our own models and infrastructure. Notably, we
demonstrate a stack that enabled us to explore the limits of pure RL training
of LLMs, present a simple method to force the reasoning language of the model,
and show that RL on text data alone maintains most of the initial checkpoint’s
capabilities. We find that RL on text maintains or improves multimodal
understanding, instruction following, and function calling. We present Magistral
Medium, trained for reasoning on top of Mistral Medium 3 with RL alone, and we
open-source Magistral Small (Apache 2.0), which further includes cold-start data
from Magistral Medium.