Paper Page - SciVer: Evaluating Foundation Models For Multimodal Scientific Claim Verification

A benchmark named SciVer evaluates multimodal foundation models’ claim verification capabilities within scientific contexts, revealing performance gaps and limitations in current models.

We introduce SciVer, the first benchmark specifically designed to evaluate
the ability of foundation models to verify claims within a multimodal
scientific context. SciVer consists of 3,000 expert-annotated examples over
1,113 scientific papers, covering four subsets, each representing a common
reasoning type in multimodal scientific claim verification. To enable
fine-grained evaluation, each example includes expert-annotated supporting
evidence. We assess the performance of 21 state-of-the-art multimodal
foundation models, including o4-mini, Gemini-2.5-Flash, Llama-3.2-Vision, and
Qwen2.5-VL. Our experiment reveals a substantial performance gap between these
models and human experts on SciVer. Through an in-depth analysis of
retrieval-augmented generation (RAG) and human-conducted error evaluations, we
identify critical limitations in current open-source models, offering key
insights to advance models’ comprehension and reasoning in multimodal
scientific literature tasks.
