A large-scale dataset and verification tool are introduced for assessing and improving cross-disciplinary reasoning capabilities in multimodal models.
In this paper, we introduce BMMR, a large-scale bilingual, multimodal,
multi-disciplinary reasoning dataset for the community to develop and evaluate
large multimodal models (LMMs). BMMR comprises 110k college-level questions
spanning 300 UNESCO-defined subjects, covering diverse formats (multiple-choice,
fill-in-the-blank, and open-ended QA) and sourced from both print and digital
media such as books, exams, and quizzes. All data are curated and filtered via
a scalable, human-in-the-loop framework, and each instance is paired with a
high-quality reasoning path. The dataset is organized into two parts: BMMR-Eval,
which comprises 20,458 high-quality instances to comprehensively assess LMMs’
knowledge and reasoning across multiple disciplines in both Chinese and
English; and BMMR-Train, which contains 88,991 instances to support further
research and development, extending the current focus on mathematical reasoning
to diverse disciplines and domains. In addition, we propose the process-based
multi-discipline verifier (i.e., BMMR-Verifier) for accurate and fine-grained
evaluation of reasoning paths. Extensive experiments on 24 models reveal that
(i) even SOTA models (e.g., o3 and Gemini-2.5-Pro) leave substantial headroom
on BMMR-Eval; (ii) reasoning models exhibit discipline bias and outperform LMMs
only on specific subjects; (iii) open-source models still trail their
proprietary counterparts; and (iv) fine-tuning on BMMR-Train narrows this gap.
Additionally, we conduct reasoning-chain analyses using BMMR-Verifier and other
in-depth studies, uncovering the challenges LMMs currently face in
multidisciplinary reasoning. We will release the data, and we hope our work
offers useful insights and contributions to the community.