Existing methods for extracting reward signals in reinforcement learning
typically rely on labeled data and dedicated training splits, a setup that
contrasts with how humans learn directly from their environment. In this work,
we propose TTRV to enhance vision-language understanding by adapting the model
on the fly at inference time, without the need for any labeled data.
Concretely, we enhance the Group Relative Policy Optimization (GRPO) framework
by designing rewards based on the frequency of the base model’s outputs across
multiple inference passes on each test sample. Further, we propose to control
the diversity of the model’s outputs by simultaneously rewarding low entropy in
their empirical distribution (see the sketch below). Our
approach delivers consistent gains across both object recognition and visual
question answering (VQA), with improvements of up to 52.4% and 29.8%,
respectively, and average boosts of 24.6% and 10.0% across 16
datasets. Remarkably, on image recognition, TTRV applied to InternVL 8B
surpasses GPT-4o by an average of 2.3% over 8 benchmarks, while remaining
highly competitive on VQA, demonstrating that test-time reinforcement learning
can match or exceed the strongest proprietary models. Finally, we uncover several
interesting properties of test-time RL for VLMs: for example, even in extremely
data-constrained scenarios, where adaptation is performed on a single randomly
chosen unlabeled test example, TTRV still yields non-trivial improvements of up
to 5.5% in recognition tasks.
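
As an illustrative sketch of the frequency-plus-entropy reward described above
(not the exact formulation; the function name ttrv_rewards and the entropy
weight lam are assumptions made for exposition):

```python
import math
from collections import Counter

def ttrv_rewards(answers, lam=0.1):
    """Compute one reward per sampled answer for a single unlabeled test input.

    answers: final answers extracted from N generations of the base model.
    lam: assumed weight on the entropy penalty (hypothetical).
    """
    n = len(answers)
    # Empirical distribution over the distinct answers produced.
    p_hat = {a: c / n for a, c in Counter(answers).items()}
    # Shannon entropy of that distribution; low entropy (strong agreement
    # among samples) is rewarded, curbing output diversity.
    entropy = -sum(p * math.log(p) for p in p_hat.values())
    # Frequency-based reward per sample, minus the shared entropy penalty.
    return [p_hat[a] - lam * entropy for a in answers]

# Example: samples agreeing on "cat" receive a higher reward than the outlier.
print(ttrv_rewards(["cat", "cat", "dog", "cat"]))
```

These per-sample rewards would then play the role of the labeled-data rewards
in standard GRPO, with the advantage computed relative to the sampled group.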