Project Page: https://haoningwu3639.github.io/SpatialScore/
Paper: https://arxiv.org/abs/2505.17012/
Code: https://github.com/haoningwu3639/SpatialScore/
Data: https://huggingface.co/datasets/haoningwu/SpatialScore
We are currently organizing our data and code and expect to open-source them within 1-2 weeks. Stay tuned, and feel free to reach out for discussions!
To summarize, we make the following contributions in this paper:
(i) we introduce VGBench, a benchmark specifically designed to assess MLLMs on visual geometry perception, e.g., camera pose and motion estimation;
(ii) we propose SpatialScore, the most comprehensive and diverse multimodal spatial understanding benchmark to date, integrating VGBench with relevant data from 11 other existing datasets. The benchmark comprises 28K samples spanning diverse spatial understanding tasks, modalities, and QA formats, along with a carefully curated challenging subset, SpatialScore-Hard;
(iii) we develop SpatialAgent, a novel multi-agent system incorporating 9 specialized tools for spatial understanding, supporting both Plan-Execute and ReAct reasoning paradigms;
(iv) we conduct extensive evaluations to reveal persistent challenges in spatial reasoning while demonstrating the effectiveness of SpatialAgent.
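Contribution (iii) mentions the ReAct paradigm, which interleaves reasoning steps with tool calls and observations. A minimal sketch of such a loop is below; the tool names, values, and scripted steps are purely illustrative assumptions for exposition, not SpatialAgent's actual tools or implementation:

```python
# Hypothetical ReAct-style loop: alternate Thought -> Action (tool call)
# -> Observation until an answer is produced. Tools and values are toy
# stand-ins, NOT the 9 tools used by SpatialAgent.

def depth_at(obj: str) -> float:
    """Toy stand-in for a monocular depth-estimation tool (meters)."""
    return {"chair": 2.0, "table": 3.5}[obj]

def compare(a: float, b: float) -> str:
    """Toy spatial-relation tool: is the first object closer or farther?"""
    return "closer" if a < b else "farther"

TOOLS = {"depth_at": depth_at, "compare": compare}

def react_loop(steps):
    """Execute scripted (thought, tool, args) steps; return the final observation.

    In a real agent, each thought and action would be generated by an MLLM
    conditioned on the trace so far; here the trace is fixed for clarity.
    """
    obs = None
    for thought, tool, args in steps:
        obs = TOOLS[tool](*args)  # Act, then observe
        print(f"Thought: {thought}\nAction: {tool}{args}\nObservation: {obs}")
    return obs

# Example question: "Is the chair closer to the camera than the table?"
steps = [
    ("Estimate the depth of the chair.", "depth_at", ("chair",)),
    ("Estimate the depth of the table.", "depth_at", ("table",)),
    ("Compare the two depths.", "compare", (2.0, 3.5)),
]
answer = react_loop(steps)  # -> "closer"
```

The Plan-Execute paradigm differs in that the full tool schedule is drawn up before any tool runs, rather than being revised after each observation.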