Residual Off-Policy RL for Finetuning Behavior Cloning Policies - Takara TLDR

Recent advances in behavior cloning (BC) have enabled impressive visuomotor
control policies. However, these approaches are limited by the quality of human
demonstrations, the manual effort required for data collection, and the
diminishing returns from increasing offline data. In comparison, reinforcement
learning (RL) trains an agent through autonomous interaction with the
environment and has shown remarkable success in various domains. Still,
training RL policies directly on real-world robots remains challenging due to
sample inefficiency, safety concerns, and the difficulty of learning from
sparse rewards for long-horizon tasks, especially for high-degree-of-freedom
(DoF) systems. We present a recipe that combines the benefits of BC and RL
through a residual learning framework. Our approach leverages BC policies as
black-box bases and learns lightweight per-step residual corrections via
sample-efficient off-policy RL. We show that our method requires only
sparse binary reward signals and can effectively improve manipulation policies
on high-DoF systems in both simulation and the real world.
In particular, we demonstrate, to the best of our knowledge, the first
successful real-world RL training on a humanoid robot with dexterous hands. Our
results show state-of-the-art performance in various vision-based tasks,
pointing towards a practical pathway for deploying RL in the real world.
Project website: https://residual-offpolicy-rl.github.io
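
To make the residual idea concrete, here is a minimal sketch of how a per-step
correction could be layered on a frozen, black-box BC policy. It is not the
authors' implementation: the names `compose_action`, `ResidualPolicy`, and
`sparse_binary_reward`, the residual scale of 0.1, the action bounds of
[-1, 1], and the choice to feed the base action to the residual policy are all
assumptions made for illustration.

```python
from typing import Callable, List, Sequence

Observation = Sequence[float]
Action = List[float]


def compose_action(base_action: Sequence[float],
                   residual: Sequence[float],
                   scale: float = 0.1,      # assumed residual scale
                   low: float = -1.0,       # assumed action bounds
                   high: float = 1.0) -> Action:
    """Add a scaled residual to the base action and clip to the bounds."""
    if len(base_action) != len(residual):
        raise ValueError("base action and residual must have equal length")
    return [min(high, max(low, b + scale * r))
            for b, r in zip(base_action, residual)]


def sparse_binary_reward(success: bool) -> float:
    """Sparse binary reward: 1.0 on task success, 0.0 otherwise."""
    return 1.0 if success else 0.0


class ResidualPolicy:
    """Wraps a frozen black-box base policy with a learned residual.

    Only `residual_policy` would be updated by the off-policy RL learner;
    `base_policy` is queried but never modified. Both are hypothetical
    callables supplied by the caller.
    """

    def __init__(self,
                 base_policy: Callable[[Observation], Sequence[float]],
                 residual_policy: Callable[[Observation, Sequence[float]],
                                           Sequence[float]],
                 scale: float = 0.1) -> None:
        self.base_policy = base_policy
        self.residual_policy = residual_policy
        self.scale = scale

    def act(self, obs: Observation) -> Action:
        base_action = list(self.base_policy(obs))
        # Assumption: the residual sees both the observation and the
        # base action it is correcting.
        residual = self.residual_policy(obs, base_action)
        return compose_action(base_action, residual, scale=self.scale)


if __name__ == "__main__":
    # Stand-in policies for demonstration only.
    policy = ResidualPolicy(
        base_policy=lambda obs: [0.5, -0.95],
        residual_policy=lambda obs, base: [1.0, -1.0],
    )
    print(policy.act([0.0]))
    print(sparse_binary_reward(True))
```

Keeping the residual small relative to the base action means the combined
policy starts out close to the BC behavior, which is one plausible reason a
residual formulation suits exploration on real hardware.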
