On Predictability Of Reinforcement Learning Dynamics For Large Language Models - Takara TLDR

Recent advances in reasoning capabilities of large language models (LLMs) are
largely driven by reinforcement learning (RL), yet the underlying parameter
dynamics during RL training remain poorly understood. This work identifies two
fundamental properties of RL-induced parameter updates in LLMs: (1) Rank-1
Dominance, where the top singular subspace of the parameter update matrix
nearly fully determines reasoning improvements, recovering over 99% of
performance gains; and (2) Rank-1 Linear Dynamics, where this dominant subspace
evolves linearly throughout training, enabling accurate prediction from early
checkpoints. Extensive experiments across 8 LLMs and 7 algorithms validate the
generalizability of these properties. More importantly, based on these
findings, we propose AlphaRL, a plug-in acceleration framework that
extrapolates the final parameter update using a short early training window,
achieving up to 2.5× speedup while retaining more than 96% of reasoning
performance without extra modules or hyperparameter tuning. This positions our
findings as a versatile and practical tool for large-scale RL, opening a path
toward a principled, interpretable, and efficient training paradigm for LLMs.
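
The two properties can be illustrated with a small NumPy sketch. This is not the AlphaRL procedure itself, which the abstract does not spell out. It assumes the update matrix is the difference between a checkpoint's weights and the base weights, takes the rank-1 part from a truncated SVD, and extrapolates it with an ordinary least-squares line fitted against the training step. The function names `rank1_approx` and `extrapolate_linear` and the toy data are hypothetical.

```python
import numpy as np


def rank1_approx(delta_w):
    """Return the best rank-1 approximation of an update matrix.

    delta_w is assumed to be (checkpoint weights - base weights).
    """
    u, s, vt = np.linalg.svd(delta_w, full_matrices=False)
    # Keep only the top singular triplet: sigma_1 * u_1 * v_1^T.
    return s[0] * np.outer(u[:, 0], vt[0])


def extrapolate_linear(steps, updates, target_step):
    """Fit each entry of the update as a line in the training step,
    then evaluate that line at target_step.

    Assumption: a plain least-squares fit per entry. The paper's
    actual predictor may differ.
    """
    steps = np.asarray(steps, dtype=float)
    flat = np.stack([u.ravel() for u in updates])        # (n_ckpt, n_params)
    design = np.stack([steps, np.ones_like(steps)], axis=1)
    coef, *_ = np.linalg.lstsq(design, flat, rcond=None)  # (2, n_params)
    pred = coef[0] * target_step + coef[1]
    return pred.reshape(updates[0].shape)


# Toy data: a fixed rank-1 direction that grows linearly with the step,
# plus a small amount of noise standing in for the non-dominant part.
rng = np.random.default_rng(0)
direction = np.outer(rng.normal(size=8), rng.normal(size=6))
early_steps = [10, 20, 30]
early_updates = [
    step * 0.01 * direction + 1e-4 * rng.normal(size=direction.shape)
    for step in early_steps
]

# Keep only the dominant subspace of each early update, then extrapolate.
early_rank1 = [rank1_approx(u) for u in early_updates]
predicted_final = extrapolate_linear(early_steps, early_rank1, target_step=100)
```

In this toy setting `predicted_final` would be added to the base weights in place of running the remaining training steps. Whether that holds for a real model depends on the linearity the abstract reports.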
