ShorterBetter: Guiding Reasoning Models To Find Optimal Inference Length For Efficient Reasoning

arXiv:2504.21370v1
Abstract: Reasoning models such as OpenAI o3 and DeepSeek-R1 have demonstrated strong performance on reasoning-intensive tasks through extended Chain-of-Thought (CoT) prompting. While longer reasoning traces can facilitate a more thorough exploration of solution paths for complex problems, researchers have observed that these models often “overthink”, leading to inefficient inference. In this paper, we introduce ShorterBetter, a simple yet effective reinforcement learning method that enables reasoning language models to discover their own optimal CoT lengths without human intervention. By sampling multiple outputs per problem and defining the Sample Optimal Length (SOL) as the length of the shortest correct response among all the outputs, our method dynamically guides the model toward optimal inference lengths. Applied to the DeepSeek-Distill-Qwen-1.5B model, ShorterBetter achieves up to an 80% reduction in output length on both in-domain and out-of-domain reasoning tasks while maintaining accuracy. Our analysis shows that overly long reasoning traces often reflect a loss of reasoning direction, which suggests that the extended CoT produced by reasoning models is highly compressible.
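
The Sample Optimal Length described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Sample` type, its field names, and the choice to return `None` when no sampled output is correct are assumptions made for illustration, since the abstract does not say how that case is handled or how SOL enters the reward.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Sample:
    """One sampled model output for a problem (hypothetical container)."""
    num_tokens: int   # length of the generated response, in tokens
    is_correct: bool  # whether the final answer matched the reference


def sample_optimal_length(samples: Sequence[Sample]) -> Optional[int]:
    """Return the length of the shortest correct response in the group.

    Returns None when no sample is correct; this fallback is an
    assumption of the sketch, not something stated in the abstract.
    """
    correct_lengths = [s.num_tokens for s in samples if s.is_correct]
    return min(correct_lengths) if correct_lengths else None


# Example: three outputs sampled for the same problem.
group = [Sample(900, True), Sample(350, True), Sample(120, False)]
sol = sample_optimal_length(group)
```

In this example the shortest output (120 tokens) is incorrect, so it is ignored, and the SOL is taken from the shortest correct output instead.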
