Paper Page - MM-PRM: Enhancing Multimodal Mathematical Reasoning With Scalable Step-Level Supervision

While Multimodal Large Language Models (MLLMs) have achieved impressive
progress in vision-language understanding, they still struggle with complex
multi-step reasoning, often producing logically inconsistent or partially
correct solutions. A key limitation lies in the lack of fine-grained
supervision over intermediate reasoning steps. To address this, we propose
MM-PRM, a process reward model trained within a fully automated, scalable
framework. We first build MM-Policy, a strong multimodal model trained on
diverse mathematical reasoning data. Then, we construct MM-K12, a curated
dataset of 10,000 multimodal math problems with verifiable answers, which
serves as seed data. Leveraging a Monte Carlo Tree Search (MCTS)-based
pipeline, we generate over 700k step-level annotations without human labeling.
The resulting PRM is used to score candidate reasoning paths in the Best-of-N
inference setup and achieves significant improvements across both in-domain
(MM-K12 test set) and out-of-domain (OlympiadBench, MathVista, etc.)
benchmarks. Further analysis confirms the effectiveness of soft labels, smaller
learning rates, and path diversity in optimizing PRM performance. MM-PRM
demonstrates that process supervision is a powerful tool for enhancing the
logical robustness of multimodal reasoning systems. We release all our code
and data at https://github.com/ModalMinds/MM-PRM.
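
As a rough illustration of two ideas named in the abstract, the sketch below shows how a soft step label could be estimated from rollouts and how a PRM's step scores could be used to pick one of N candidate solutions. It is not taken from the MM-PRM repository. The function names `soft_label` and `best_of_n` are hypothetical, the soft label is assumed to be the fraction of rollouts from a step that reach the verified answer, and the aggregation of step scores into a path score (minimum by default) is also an assumption, since the abstract does not say which rule is used.

```python
from typing import Callable, List, Sequence


def soft_label(rollout_outcomes: Sequence[bool]) -> float:
    """Assumed Monte Carlo estimate of step quality: the fraction of
    rollouts continued from this step that reached the correct answer."""
    if not rollout_outcomes:
        raise ValueError("need at least one rollout outcome")
    return sum(rollout_outcomes) / len(rollout_outcomes)


def best_of_n(
    candidates: Sequence[str],
    step_scores: Sequence[Sequence[float]],
    aggregate: Callable[[Sequence[float]], float] = min,
) -> str:
    """Return the candidate whose aggregated step scores are highest.

    step_scores[i] holds the per-step scores (hypothetical PRM outputs)
    for candidates[i]. The aggregation rule is an assumption.
    """
    if not candidates or len(candidates) != len(step_scores):
        raise ValueError("candidates and step_scores must be non-empty and aligned")
    path_scores: List[float] = [aggregate(scores) for scores in step_scores]
    best_index = max(range(len(candidates)), key=lambda i: path_scores[i])
    return candidates[best_index]


def mean(scores: Sequence[float]) -> float:
    """Alternative aggregation rule: average step score."""
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Three of four rollouts from this step reached the correct answer.
    print(soft_label([True, False, True, True]))

    paths = ["path A", "path B", "path C"]
    scores = [[0.9, 0.2, 0.9], [0.6, 0.7, 0.6], [0.5, 0.5]]
    print(best_of_n(paths, scores))                  # min aggregation
    print(best_of_n(paths, scores, aggregate=mean))  # mean aggregation
```

With minimum aggregation, a single weak step sinks an otherwise strong path, which matches the intuition that one faulty step can invalidate a solution. Mean aggregation is more forgiving and can prefer a different candidate on the same scores.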
