Jigsaw-Puzzles: From Seeing to Understanding to Reasoning in Vision-Language Models
by Zesen Lyu and 5 other authors
Abstract: Spatial reasoning is a core component of human cognition, enabling individuals to perceive, comprehend, and interact with the physical world. It relies on a nuanced understanding of spatial structures and inter-object relationships, serving as the foundation for complex reasoning and decision-making. To investigate whether current vision-language models (VLMs) exhibit similar capabilities, we introduce Jigsaw-Puzzles, a novel benchmark consisting of 1,100 carefully curated real-world images with high spatial complexity. Based on this dataset, we design five tasks to rigorously evaluate VLMs' spatial perception, structural understanding, and reasoning capabilities, while deliberately minimizing reliance on domain-specific knowledge to better isolate and assess general spatial reasoning capability. We conduct a comprehensive evaluation across 24 state-of-the-art VLMs. The results show that even the strongest model, Gemini-2.5-Pro, achieves only 77.14% overall accuracy and performs particularly poorly on the Order Generation task, reaching only 30.00% accuracy, far below the over-90% accuracy achieved by human participants. This persistent gap underscores the need for continued progress, positioning Jigsaw-Puzzles as a challenging and diagnostic benchmark for advancing spatial reasoning research in VLMs. Our project page is at this https URL
Submission history
From: Zesen Lyu
[v1] Tue, 27 May 2025 05:17:41 UTC (11,269 KB)
[v2] Wed, 11 Jun 2025 07:50:21 UTC (5,666 KB)