VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning With Verified Rewards In World Simulators - Takara TLDR

Vision-Language-Action (VLA) models enable embodied decision-making but rely
heavily on imitation learning, leading to compounding errors and poor
robustness under distribution shift. Reinforcement learning (RL) can mitigate
these issues yet typically demands costly real-world interactions or suffers
from sim-to-real gaps. We introduce VLA-RFT, a reinforcement fine-tuning
framework that leverages a data-driven world model as a controllable simulator.
Trained from real interaction data, the simulator predicts future visual
observations conditioned on actions, allowing policy rollouts with dense,
trajectory-level rewards derived from goal-achieving references. This design
delivers an efficient and action-aligned learning signal, drastically lowering
sample requirements. With fewer than 400 fine-tuning steps, VLA-RFT surpasses
strong supervised baselines and achieves greater efficiency than
simulator-based RL. Moreover, it exhibits strong robustness under perturbed
conditions, sustaining stable task execution. Our results establish
world-model-based RFT as a practical post-training paradigm to enhance the
generalization and robustness of VLA models. For more details, please refer to
https://vla-rft.github.io/.
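
The rollout-and-reward loop described above can be illustrated with a minimal sketch. It is not the authors' implementation. The additive `toy_world_model`, the vector-valued observations, and the distance-based `trajectory_reward` are all hypothetical stand-ins chosen for illustration; the actual framework predicts future visual observations with a learned model, and its exact reward definition is not given in this summary.

```python
from typing import Callable, List, Sequence

Obs = List[float]
Action = List[float]


def toy_world_model(obs: Obs, action: Action) -> Obs:
    """Hypothetical stand-in for a learned world model.

    A real model would predict future visual frames conditioned on actions;
    here an observation is a small vector and the dynamics are additive.
    """
    return [o + a for o, a in zip(obs, action)]


def rollout(
    world_model: Callable[[Obs, Action], Obs],
    start: Obs,
    actions: Sequence[Action],
) -> List[Obs]:
    """Roll an action sequence through the world model (no real robot involved)."""
    trajectory: List[Obs] = []
    obs = start
    for action in actions:
        obs = world_model(obs, action)
        trajectory.append(obs)
    return trajectory


def trajectory_reward(predicted: Sequence[Obs], reference: Sequence[Obs]) -> float:
    """Assumed reward: negative mean Euclidean distance to a reference trajectory."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    total = 0.0
    for p, r in zip(predicted, reference):
        total += sum((pi - ri) ** 2 for pi, ri in zip(p, r)) ** 0.5
    return -total / len(predicted)


start = [0.0, 0.0]

# A goal-achieving reference trajectory, here generated by known-good actions.
reference = rollout(toy_world_model, start, [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])

# Two candidate action sequences, as if sampled from a policy.
good_actions = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
poor_actions = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]

good_reward = trajectory_reward(rollout(toy_world_model, start, good_actions), reference)
poor_reward = trajectory_reward(rollout(toy_world_model, start, poor_actions), reference)

print(good_reward, poor_reward)
```

In this toy setting, the action sequence that reproduces the reference scores higher than the one that drifts away from it. A reinforcement fine-tuning step would then raise the probability of higher-reward action sequences; that policy update is omitted here.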
