Mitigating Overthinking Through Reasoning Shaping - Takara TLDR

Large reasoning models (LRMs) boosted by Reinforcement Learning from Verifier
Reward (RLVR) have shown great power in problem solving, yet they often
overthink, producing excessive, meandering reasoning that inflates computational cost.
Prior penalization designs in RLVR reduce token consumption but
often harm model performance, a trade-off that arises from the oversimplicity of
token-level supervision. In this paper, we argue that the granularity of
supervision plays a crucial role in balancing efficiency and accuracy, and
propose Group Relative Segment Penalization (GRSP), a step-level method to
regularize reasoning. Since preliminary analyses show that reasoning segments
are strongly correlated with token consumption and model performance, we design
a length-aware weighting mechanism across segment clusters. Extensive
experiments demonstrate that GRSP achieves superior token efficiency without
heavily compromising accuracy, with the largest advantages on harder problems.
Moreover, GRSP stabilizes RL training and scales effectively across model
sizes.
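
To make the idea of a segment-level, group-relative penalty more concrete, here is a minimal sketch. It is not the paper's implementation: the abstract does not specify how segments are detected, how clusters are formed, or what the weights are. The blank-line segmentation rule, the length thresholds in `edges`, the `weights` values, and all function names below are assumptions chosen only for illustration.

```python
from statistics import mean, pstdev


def split_segments(response: str) -> list[str]:
    # Assumption: a reasoning "segment" is a blank-line-separated block.
    return [s.strip() for s in response.split("\n\n") if s.strip()]


def bucket_counts(segments: list[str], edges=(20, 60)) -> list[int]:
    # Count segments per length cluster (short / medium / long),
    # measuring length as a whitespace-token count (an assumption).
    counts = [0] * (len(edges) + 1)
    for seg in segments:
        n_tokens = len(seg.split())
        idx = sum(n_tokens > e for e in edges)
        counts[idx] += 1
    return counts


def segment_penalties(responses, weights=(1.0, 0.5, 0.25), edges=(20, 60)):
    # Group-relative: each cluster's segment count is standardized against
    # the other responses sampled for the same prompt, then combined with
    # per-cluster weights. The weight values are arbitrary placeholders.
    per_response = [bucket_counts(split_segments(r), edges) for r in responses]
    penalties = [0.0] * len(responses)
    for k, w in enumerate(weights):
        column = [counts[k] for counts in per_response]
        mu, sigma = mean(column), pstdev(column)
        if sigma == 0:
            continue  # every response has the same count in this cluster
        for i, c in enumerate(column):
            penalties[i] += w * (c - mu) / sigma
    return penalties


if __name__ == "__main__":
    group = [
        "step one\n\nstep two\n\nwait, recheck\n\nanswer",
        "step one\n\nanswer",
    ]
    print(segment_penalties(group))
```

In this sketch, a response with more segments than its group peers receives a positive penalty and one with fewer receives a negative one; how such a term would be combined with the verifier reward is left open here because the abstract does not say.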
