Sailing AI by the Stars: A Survey of Learning from Rewards in Post-Training and Test-Time Scaling of Large Language Models

Recent developments in Large Language Models (LLMs) have shifted from pre-training scaling to post-training and test-time scaling. Across these developments, a key unifying paradigm has emerged: Learning from Rewards, in which reward signals act as guiding stars that steer LLM behavior. It underpins a wide range of prevalent techniques, such as reinforcement learning (in RLHF, DPO, and GRPO), reward-guided decoding, and post-hoc correction. Crucially, this paradigm enables the transition from passive learning from static data to active learning from dynamic feedback, which endows LLMs with aligned preferences and deep reasoning capabilities. In this survey, we present a comprehensive overview of the paradigm of learning from rewards. We categorize and analyze the strategies under this paradigm across the training, inference, and post-inference stages. We further discuss the benchmarks for reward models and the primary applications. Finally, we highlight the challenges and future directions. We maintain a paper collection at https://github.com/bobxwu/learning-from-rewards-llm-papers.
