RewardMap: Tackling Sparse Rewards In Fine-grained Visual Reasoning Via Multi-Stage Reinforcement Learning - Takara TLDR

Fine-grained visual reasoning remains a core challenge for multimodal large
language models (MLLMs). The recently introduced ReasonMap highlights this gap
by showing that even advanced MLLMs struggle with spatial reasoning in
structured and information-rich settings such as transit maps, a task of clear
practical and scientific importance. However, standard reinforcement learning
(RL) on such tasks is impeded by sparse rewards and unstable optimization. To
address this, we first construct ReasonMap-Plus, an extended dataset that
introduces dense reward signals through Visual Question Answering (VQA) tasks,
enabling effective cold-start training of fine-grained visual understanding
skills. Next, we propose RewardMap, a multi-stage RL framework designed to
improve both visual understanding and reasoning capabilities of MLLMs.
RewardMap incorporates two key designs. First, we introduce a difficulty-aware
reward design that incorporates detail rewards, directly tackling sparse
rewards while providing richer supervision. Second, we propose a multi-stage RL
scheme that bootstraps training from simple perception to complex reasoning
tasks, offering a more effective cold-start strategy than conventional
Supervised Fine-Tuning (SFT). Experiments on ReasonMap and ReasonMap-Plus
demonstrate that each component of RewardMap contributes to consistent
performance gains, while their combination yields the best results. Moreover,
models trained with RewardMap achieve an average improvement of 3.47% across 6
benchmarks spanning spatial reasoning, fine-grained visual reasoning, and
general tasks beyond transit maps, underscoring enhanced visual understanding
and reasoning capabilities.
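The abstract names two mechanisms without giving formulas: a difficulty-aware reward that adds detail rewards on top of the sparse outcome reward, and a multi-stage schedule that moves from simple perception to complex reasoning. The Python sketch below is one way to picture both. It is not the authors' implementation. The task names (`vqa_judge`, `vqa_count`, `route_planning`), the difficulty levels, the weights `DIFFICULTY_WEIGHT` and `DETAIL_COEF`, and the idea of scoring "detail" as the fraction of correct intermediate items are all hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Sample:
    task: str            # hypothetical task label, e.g. "vqa_judge" or "route_planning"
    difficulty: int      # hypothetical level: 0 = easy, 1 = medium, 2 = hard
    answer_correct: bool # sparse outcome signal: is the final answer right?
    detail_hits: int     # hypothetical: correct intermediate items (e.g. stops, lines)
    detail_total: int    # hypothetical: number of intermediate items checked

# Hypothetical weights; the abstract does not specify the actual values.
DIFFICULTY_WEIGHT = {0: 1.0, 1: 1.5, 2: 2.0}
DETAIL_COEF = 0.5

def reward(s: Sample) -> float:
    """Sparse outcome reward plus a dense detail term, scaled by difficulty."""
    outcome = 1.0 if s.answer_correct else 0.0
    # Partial credit still flows when the final answer is wrong.
    detail = s.detail_hits / s.detail_total if s.detail_total > 0 else 0.0
    return DIFFICULTY_WEIGHT[s.difficulty] * (outcome + DETAIL_COEF * detail)

# Hypothetical stage order: perception-style VQA first, route reasoning last.
STAGE_ORDER = ["vqa_judge", "vqa_count", "route_planning"]

def build_stages(samples: List[Sample]) -> List[List[Sample]]:
    """Group samples into stages in STAGE_ORDER, easiest first within each stage."""
    stages = []
    for task in STAGE_ORDER:
        stage = [s for s in samples if s.task == task]
        stage.sort(key=lambda s: s.difficulty)
        stages.append(stage)
    return stages

if __name__ == "__main__":
    hard_route = Sample("route_planning", 2, True, 2, 4)
    print(reward(hard_route))  # 2.0 * (1.0 + 0.5 * 0.5) = 2.5
```

The design point the sketch tries to capture is that a wrong final answer no longer yields a flat zero: the detail term gives the policy a gradient signal on partially correct outputs, which is the sparse-reward problem the abstract describes.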
