Paper Page - SciVer: Evaluating Foundation Models For Multimodal Scientific Claim Verification

A benchmark named SciVer evaluates multimodal foundation models’ claim verification capabilities within scientific contexts, revealing performance gaps and limitations in current models.

We introduce SciVer, the first benchmark specifically designed to evaluate
the ability of foundation models to verify claims within a multimodal
scientific context. SciVer consists of 3,000 expert-annotated examples over
1,113 scientific papers, covering four subsets, each representing a common
reasoning type in multimodal scientific claim verification. To enable
fine-grained evaluation, each example includes expert-annotated supporting
evidence. We assess the performance of 21 state-of-the-art multimodal
foundation models, including o4-mini, Gemini-2.5-Flash, Llama-3.2-Vision, and
Qwen2.5-VL. Our experiments reveal a substantial performance gap between these
models and human experts on SciVer. Through an in-depth analysis of
retrieval-augmented generation (RAG) and human-conducted error evaluations, we
identify critical limitations in current open-source models, offering key
insights to advance models’ comprehension and reasoning in multimodal
scientific literature tasks.
