Comparing Humans, GPT-4, and GPT-4V On Abstraction and Reasoning Tasks, by Melanie Mitchell and 2 other authors
Abstract: We explore the abstract reasoning abilities of text-only and multimodal versions of GPT-4, using the ConceptARC benchmark [10], which is designed to evaluate robust understanding and reasoning with core-knowledge concepts. We extend the work of Moskvichev et al. [10] by evaluating GPT-4 with more detailed, one-shot prompts (rather than simple, zero-shot prompts) on text versions of ConceptARC tasks, and by evaluating GPT-4V, the multimodal version of GPT-4, with zero- and one-shot prompts on image versions of the simplest tasks. Our experimental results support the conclusion that neither version of GPT-4 has developed robust abstraction abilities at humanlike levels.
Submission history
From: Melanie Mitchell
[v1] Tue, 14 Nov 2023 04:33:49 UTC (1,549 KB)
[v2] Sun, 26 Nov 2023 20:42:08 UTC (1,549 KB)
[v3] Mon, 11 Dec 2023 23:57:17 UTC (1,549 KB)