A Visual Theory-of-Mind Benchmark For Multimodal Large Language Models

[Submitted on 26 Aug 2024 (v1), last revised 9 May 2025 (this version, v2)]

CHARTOM: A Visual Theory-of-Mind Benchmark for Multimodal Large Language Models, by Shubham Bharti and 7 other authors

Abstract: We introduce CHARTOM, a visual theory-of-mind benchmark for multimodal large language models. CHARTOM consists of specially designed data-visualizing charts. Given a chart, a language model needs not only to comprehend the chart correctly (the FACT question) but also to judge whether the chart will be misleading to a human reader (the MIND question). Both questions have significant societal benefits. We detail the construction of the CHARTOM benchmark, including its calibration on human performance. We benchmark leading LLMs as of late 2024 (including GPT, Claude, Gemini, Qwen, Llama, and Llava) on the CHARTOM dataset and find that our benchmark is challenging for all of them, suggesting room for future large language models to improve.

Submission history

From: Shubham Kumar Bharti
[v1] Mon, 26 Aug 2024 17:04:23 UTC (1,774 KB)
[v2] Fri, 9 May 2025 19:55:14 UTC (1,900 KB)
