Fine-grained visual reasoning remains a core challenge for multimodal large
language models (MLLMs). The recently introduced ReasonMap highlights this gap
by showing that even advanced MLLMs struggle with spatial reasoning in
structured and information-rich settings such as transit maps, a task of clear
practical and scientific importance. However, standard reinforcement learning
(RL) on such tasks is impeded by sparse rewards and unstable optimization. To
address this, we first construct ReasonMap-Plus, an extended dataset that
introduces dense reward signals through Visual Question Answering (VQA) tasks,
enabling effective cold-start training of fine-grained visual understanding
skills. Next, we propose RewardMap, a multi-stage RL framework designed to
improve both visual understanding and reasoning capabilities of MLLMs.
RewardMap incorporates two key designs. First, we introduce a difficulty-aware
reward design with detail rewards, directly tackling the sparse-reward problem
while providing richer supervision. Second, we propose a multi-stage RL
scheme that bootstraps training from simple perception to complex reasoning
tasks, offering a more effective cold-start strategy than conventional
Supervised Fine-Tuning (SFT). Experiments on ReasonMap and ReasonMap-Plus
demonstrate that each component of RewardMap contributes to consistent
performance gains, while their combination yields the best results. Moreover,
models trained with RewardMap achieve an average improvement of 3.47% across 6
benchmarks spanning spatial reasoning, fine-grained visual reasoning, and
general tasks beyond transit maps, underscoring enhanced visual understanding
and reasoning capabilities.
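
As a rough illustration of the difficulty-aware reward design mentioned above, the sketch below shows one plausible way to combine a sparse final-answer correctness term with dense per-segment detail rewards, scaled by a difficulty weight. The function names, the linear difficulty weighting, and the coefficient values are illustrative assumptions for exposition, not the exact formulation used by RewardMap.

```python
# Illustrative sketch only: a plausible difficulty-aware reward with detail
# rewards for route answers on transit maps. The per-segment decomposition
# and the difficulty weighting are assumptions, not RewardMap's definition.
from dataclasses import dataclass
from typing import List


@dataclass
class RouteAnswer:
    segments: List[str]   # predicted route segments, e.g. ["Line 2: A->B", ...]
    final_stop: str       # predicted destination


def detail_reward(pred: RouteAnswer, gold: RouteAnswer) -> float:
    """Dense partial credit: fraction of gold segments the prediction recovers."""
    if not gold.segments:
        return 0.0
    hits = sum(seg in pred.segments for seg in gold.segments)
    return hits / len(gold.segments)


def correctness_reward(pred: RouteAnswer, gold: RouteAnswer) -> float:
    """Sparse terminal reward: 1 only if the full route and destination match."""
    return float(pred.segments == gold.segments and pred.final_stop == gold.final_stop)


def difficulty_weight(num_transfers: int, base: float = 1.0, per_transfer: float = 0.5) -> float:
    """Harder questions (more transfers) receive a larger weight (assumed linear)."""
    return base + per_transfer * num_transfers


def reward(pred: RouteAnswer, gold: RouteAnswer, num_transfers: int,
           detail_coef: float = 0.5) -> float:
    """Total reward = difficulty weight * (correctness + scaled detail reward)."""
    w = difficulty_weight(num_transfers)
    return w * (correctness_reward(pred, gold) + detail_coef * detail_reward(pred, gold))


if __name__ == "__main__":
    gold = RouteAnswer(segments=["Line 2: A->B", "Line 5: B->C"], final_stop="C")
    pred = RouteAnswer(segments=["Line 2: A->B"], final_stop="B")
    # A partially correct route still earns a dense detail reward (0.375 here),
    # whereas a purely sparse correctness reward would return 0.
    print(reward(pred, gold, num_transfers=1))
```

Under this kind of decomposition, partially correct answers receive non-zero feedback, which is the role the detail rewards play in mitigating sparse-reward training.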