Mitigating Overthinking Through Reasoning Shaping - Takara TLDR

Large reasoning models (LRMs) boosted by Reinforcement Learning from Verifier
Reward (RLVR) have shown great power in problem solving, yet they are prone to
overthinking: excessive, meandering reasoning that inflates computational cost.
Prior penalization schemes in RLVR reduce token consumption but often harm
model performance, a trade-off that stems from overly simple token-level
supervision. In this paper, we argue that the granularity of
supervision plays a crucial role in balancing efficiency and accuracy, and
propose Group Relative Segment Penalization (GRSP), a step-level method to
regularize reasoning. Since preliminary analyses show that reasoning segments
are strongly correlated with token consumption and model performance, we design
a length-aware weighting mechanism across segment clusters. Extensive
experiments demonstrate that GRSP achieves superior token efficiency without
heavily compromising accuracy, with the largest gains on harder problems.
Moreover, GRSP stabilizes RL training and scales effectively across model
sizes.
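
The abstract names the ingredients of GRSP (reasoning segments, clusters of
segments, length-aware weights, and a group-relative comparison) but does not
give the formula. The Python sketch below shows one way these pieces could fit
together and is not the authors' implementation. Everything specific in it is
an assumption made for illustration: splitting segments on blank lines,
measuring length in whitespace-separated tokens, the cluster boundaries in
`edges`, the placeholder values in `weights`, and the z-score normalization
across the sampled group.

```python
from statistics import mean, pstdev


def segment_lengths(response, delimiter="\n\n"):
    """Split a response into segments; return each segment's length in whitespace tokens.

    Assumption: segments are separated by blank lines.
    """
    return [len(seg.split()) for seg in response.split(delimiter) if seg.strip()]


def cluster_counts(lengths, edges):
    """Count segments per length cluster.

    Cluster i holds lengths below edges[i] (and at or above edges[i - 1]);
    the last cluster holds everything at or above the final edge.
    """
    counts = [0] * (len(edges) + 1)
    for n in lengths:
        counts[sum(n >= e for e in edges)] += 1
    return counts


def group_relative_penalties(responses, edges=(8, 32), weights=(1.0, 0.5, 0.25)):
    """Return one penalty per response, relative to the other responses in the group.

    For each cluster, a response's segment count is standardized against the
    group mean and standard deviation, then scaled by that cluster's weight.
    The edges and weights are hypothetical placeholders, not values from the paper.
    """
    per_response = [cluster_counts(segment_lengths(r), edges) for r in responses]
    penalties = [0.0] * len(responses)
    for c, w in enumerate(weights):
        column = [counts[c] for counts in per_response]
        mu, sigma = mean(column), pstdev(column)
        if sigma == 0:
            continue  # every response has the same count here: no relative signal
        for i, x in enumerate(column):
            penalties[i] += w * (x - mu) / sigma
    return penalties
```

In an RL loop, a penalty of this kind would typically be subtracted from the
verifier reward with a small coefficient before computing advantages; that
coefficient is likewise a free choice in this sketch.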
