Paper page - Vision Language Models are Biased

Vision language models exhibit strong biases in counting and identification tasks, demonstrating a failure mode that persists even with additional instructions or context.

Large language models (LLMs) memorize a vast amount of prior knowledge from
the Internet that helps them on downstream tasks but may also notoriously sway
their outputs toward wrong or biased answers. In this work, we test how
knowledge about popular subjects hurts the accuracy of vision language models
(VLMs) on standard, objective visual tasks of counting and identification. We
find that state-of-the-art VLMs are strongly biased (e.g., unable to recognize
that a fourth stripe has been added to a 3-stripe Adidas logo), scoring an
average of 17.05% accuracy in counting (e.g., counting stripes in an
Adidas-like logo) across 7 diverse domains, from animals, logos, chess, and
board games to optical illusions and patterned grids. Inserting text (e.g.,
“Adidas”) that names the subject into the counterfactual image further
decreases VLM accuracy. The biases in VLMs are so strong that instructing them
to double-check their results or to rely exclusively on image details improves
counting accuracy by only +2 points, on average. Our work presents an
interesting failure mode in VLMs and an automated framework for testing VLM
biases. Code and data are available at: vlmsarebiased.github.io.
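The headline numbers above (average counting accuracy across domains, and the point change from a "double-check" prompt) are simple aggregates. The sketch below is not the authors' framework, which is available at the URL above. It is a minimal illustration under stated assumptions: each trial records a domain, the ground-truth count in the counterfactual image, and a count already parsed from the VLM's answer, and the average is taken as a macro-average over domains (the abstract does not say whether the reported average is per domain or per image). The class and function names are hypothetical, and the trial data is made up purely to exercise the code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CountingTrial:
    # Hypothetical record for one counterfactual counting question.
    domain: str     # e.g. "logos", "animals"
    expected: int   # ground-truth count in the modified image
    predicted: int  # count parsed from the VLM's answer (parsing not shown)


def per_domain_accuracy(trials):
    """Percentage of exact-match counts, grouped by domain."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t in trials:
        totals[t.domain] += 1
        hits[t.domain] += int(t.predicted == t.expected)
    return {d: 100.0 * hits[d] / totals[d] for d in totals}


def mean_accuracy(trials):
    """Macro-average: each domain counts equally (an assumption)."""
    acc = per_domain_accuracy(trials)
    return sum(acc.values()) / len(acc)


def improvement_points(baseline, prompted):
    """Change in mean accuracy, in percentage points, from an extra instruction."""
    return mean_accuracy(prompted) - mean_accuracy(baseline)


# Toy data only: a biased model answers "3" for a 4-stripe logo.
baseline = [
    CountingTrial("logos", 4, 3), CountingTrial("logos", 4, 4),
    CountingTrial("animals", 5, 4), CountingTrial("animals", 3, 2),
]
prompted = [
    CountingTrial("logos", 4, 4), CountingTrial("logos", 4, 4),
    CountingTrial("animals", 5, 4), CountingTrial("animals", 3, 2),
]
print(per_domain_accuracy(baseline))  # {'logos': 50.0, 'animals': 0.0}
print(mean_accuracy(baseline))        # 25.0
print(improvement_points(baseline, prompted))  # 25.0
```

Macro-averaging keeps a domain with many images (say, patterned grids) from dominating the summary; if the paper instead pools all images, replace `mean_accuracy` with a single hits-over-total ratio.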
