Benchmarking Multimodal LLMs on Recognition and Understanding over Chemical Tables

arXiv:2506.11375v1 Announce Type: new
Abstract: Chemical tables encode complex experimental knowledge through symbolic expressions, structured variables, and embedded molecular graphics. Existing benchmarks largely overlook this multimodal and domain-specific complexity, which limits the ability of multimodal large language models to support scientific understanding in chemistry. In this work, we introduce ChemTable, a large-scale benchmark of real-world chemical tables curated from the experimental sections of the chemical literature. ChemTable includes expert-annotated cell polygons, logical layouts, and domain-specific labels such as reagents, catalysts, yields, and graphical components. It supports two core tasks: (1) Table Recognition, covering structure parsing and content extraction; and (2) Table Understanding, encompassing both descriptive and reasoning-oriented question answering grounded in table structure and domain semantics. We evaluate a range of representative open-source and closed-source multimodal models on ChemTable and report a series of findings with practical and conceptual implications. Although models perform reasonably on basic layout parsing, they fall substantially short of human performance on both descriptive and inferential QA tasks, and we observe significant performance gaps between open-source and closed-source models across multiple dimensions. These results underscore the challenges of chemistry-aware table understanding and position ChemTable as a rigorous and realistic benchmark for advancing scientific reasoning.
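The abstract says each table carries cell polygons, a logical layout, and domain labels, and that Table Recognition includes content extraction. As a rough illustration only, the Python sketch below shows one way such an annotation could be held in memory and how a simple content-extraction score might be computed. The `Cell` and `ChemTableSample` classes, their field names, and the exact-match `cell_content_accuracy` metric are hypothetical; they are not ChemTable's released data format or its official evaluation protocol.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Cell:
    """One annotated table cell (hypothetical schema, not ChemTable's actual format)."""
    row: int                      # logical start row in the table grid
    col: int                      # logical start column in the table grid
    text: str = ""                # transcribed cell content
    label: Optional[str] = None   # domain label, e.g. "reagent", "catalyst", "yield", "graphic"
    row_span: int = 1
    col_span: int = 1
    # Image-space vertices outlining the cell region (the "cell polygon").
    polygon: list[tuple[float, float]] = field(default_factory=list)


@dataclass
class ChemTableSample:
    """A table as a list of annotated cells (hypothetical container)."""
    cells: list[Cell] = field(default_factory=list)

    def grid(self) -> list[list[Optional[Cell]]]:
        """Expand the logical layout into a dense grid; spanning cells fill every slot they cover."""
        if not self.cells:
            return []
        n_rows = max(c.row + c.row_span for c in self.cells)
        n_cols = max(c.col + c.col_span for c in self.cells)
        g: list[list[Optional[Cell]]] = [[None] * n_cols for _ in range(n_rows)]
        for c in self.cells:
            for r in range(c.row, c.row + c.row_span):
                for k in range(c.col, c.col + c.col_span):
                    g[r][k] = c
        return g


def _normalize(s: str) -> str:
    # Collapse runs of whitespace so "Yield  (%)" and "Yield (%)" compare equal.
    return " ".join(s.split())


def cell_content_accuracy(pred: dict[tuple[int, int], str], gold: ChemTableSample) -> float:
    """Fraction of gold cells whose predicted text (keyed by start row/col) matches exactly."""
    if not gold.cells:
        return 0.0
    hits = 0
    for c in gold.cells:
        p = pred.get((c.row, c.col))
        if p is not None and _normalize(p) == _normalize(c.text):
            hits += 1
    return hits / len(gold.cells)


if __name__ == "__main__":
    # A toy reaction-optimization table: header row plus one entry.
    gold = ChemTableSample(cells=[
        Cell(0, 0, "Entry"), Cell(0, 1, "Catalyst"), Cell(0, 2, "Yield (%)"),
        Cell(1, 0, "1"), Cell(1, 1, "Pd(OAc)2", label="catalyst"), Cell(1, 2, "85", label="yield"),
    ])
    # A model output that misreads the yield; 5 of 6 cells match.
    pred = {(0, 0): "Entry", (0, 1): "Catalyst", (0, 2): "Yield (%)",
            (1, 0): "1", (1, 1): "Pd(OAc)2", (1, 2): "58"}
    print(cell_content_accuracy(pred, gold))
```

Exact match is deliberately strict here; a real benchmark would likely need chemistry-aware normalization (subscripts, units, SMILES for embedded structures) and a separate structure metric, which this sketch does not attempt.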
