Paper Page - ColorBench: Can VLMs See And Understand The Colorful World? A Comprehensive Benchmark For Color Perception, Reasoning, And Robustness

Color plays an important role in human perception and usually provides
critical clues in visual reasoning. However, it is unclear whether and how
vision-language models (VLMs) can perceive, understand, and leverage color as
humans do. This paper introduces ColorBench, an innovative benchmark meticulously
crafted to assess the capabilities of VLMs in color understanding, including
color perception, reasoning, and robustness. By curating a suite of diverse
test scenarios, with grounding in real applications, ColorBench evaluates how
these models perceive colors, infer meanings from color-based cues, and
maintain consistent performance under varying color transformations. Through an
extensive evaluation of 32 VLMs with varying language models and vision
encoders, our paper reveals several previously unreported findings: (i) The scaling law
(larger models are better) still holds on ColorBench, while the language model
plays a more important role than the vision encoder. (ii) However, the
performance gaps across models are relatively small, indicating that color
understanding has been largely neglected by existing VLMs. (iii) CoT reasoning
improves color understanding accuracy and robustness, even though these are
vision-centric tasks. (iv) Color clues are indeed leveraged by VLMs on
ColorBench but they can also mislead models in some tasks. These findings
highlight the critical limitations of current VLMs and underscore the need to
enhance color comprehension. Our ColorBench can serve as a foundational tool for
advancing the study of human-level color understanding of multimodal AI.
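
To make the robustness idea above concrete, here is a minimal sketch of checking whether a predictor gives the same answer when an image's hues are rotated. It is an illustration only, not ColorBench's actual protocol or metric: the names `shift_hue`, `consistency_rate`, and `dominant_channel` are hypothetical, the "image" is just a list of RGB tuples, and the toy predictor stands in for a real VLM call.

```python
import colorsys


def shift_hue(pixels, degrees):
    """Rotate the hue of each (r, g, b) pixel (channels 0-255) by `degrees`."""
    shifted = []
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        h = (h + degrees / 360.0) % 1.0  # hue is a fraction of a full turn
        r2, g2, b2 = colorsys.hsv_to_rgb(h, s, v)
        shifted.append((round(r2 * 255), round(g2 * 255), round(b2 * 255)))
    return shifted


def consistency_rate(predict, pixels, shifts):
    """Fraction of hue shifts for which `predict` matches its unshifted answer.

    `predict` is any callable taking a pixel list; in a real study it would
    wrap a VLM query (an assumption, not something described in the abstract).
    """
    baseline = predict(pixels)
    unchanged = sum(predict(shift_hue(pixels, d)) == baseline for d in shifts)
    return unchanged / len(shifts)


def dominant_channel(pixels):
    """Toy color-sensitive predictor: which of R, G, B has the largest total."""
    totals = [sum(p[i] for p in pixels) for i in range(3)]
    return "RGB"[totals.index(max(totals))]


if __name__ == "__main__":
    red_patch = [(255, 0, 0)] * 4
    # A color-sensitive predictor changes its answer under most hue shifts.
    print(consistency_rate(dominant_channel, red_patch, [120, 240, 360]))
    # A predictor that ignores color is perfectly consistent.
    print(consistency_rate(len, red_patch, [120, 240, 360]))
```

For questions whose answer should not depend on color, a consistency rate close to 1 is the desired behavior; a low rate suggests the predictor is being swayed by the recoloring.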
