VLMs are more vulnerable to harmful meme-based prompts than to synthetic images, and while multi-turn interactions offer some protection, significant vulnerabilities remain.
The rapid deployment of vision-language models (VLMs) magnifies safety risks, yet
most safety evaluations rely on artificial images. This study asks: How safe are
current VLMs when confronted with meme images that ordinary users share? To
investigate this question, we introduce MemeSafetyBench, a 50,430-instance
benchmark pairing real meme images with both harmful and benign instructions.
Using a comprehensive safety taxonomy and LLM-based instruction generation, we
assess multiple VLMs across single- and multi-turn interactions. We investigate
how real-world memes influence harmful outputs, the mitigating effects of
conversational context, and the relationship between model scale and safety
metrics. Our findings show that VLMs are more vulnerable to meme-based harmful
prompts than to synthetic or typographic images. Memes
significantly increase harmful responses and decrease refusals compared to
text-only inputs. Though multi-turn interactions provide partial mitigation,
elevated vulnerability persists. These results highlight the need for
ecologically valid evaluations and stronger safety mechanisms.
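To make the evaluation setup concrete, the sketch below shows one plausible way a benchmark instance could be represented and how harmful-response and refusal rates might be computed in the single-turn setting. The MemeInstance fields, the generate and classify_response callables, and the evaluate_vlm helper are hypothetical illustrations under assumed names, not the benchmark's released code.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# Hypothetical representation of a MemeSafetyBench instance: a real meme image
# paired with an LLM-generated instruction that is either harmful or benign
# under the benchmark's safety taxonomy.
@dataclass
class MemeInstance:
    image_path: str    # path to the real meme image
    instruction: str   # LLM-generated harmful or benign instruction
    is_harmful: bool   # whether the instruction is harmful
    category: str      # safety-taxonomy category label


def evaluate_vlm(
    instances: Iterable[MemeInstance],
    generate: Callable[[str, str], str],      # (image_path, instruction) -> model response
    classify_response: Callable[[str], str],  # response -> "harmful" | "refusal" | "safe"
) -> dict:
    """Compute harmful-response and refusal rates over harmful instructions."""
    harmful = refusals = total = 0
    for inst in instances:
        if not inst.is_harmful:
            continue  # rates here are computed on harmful instructions only
        response = generate(inst.image_path, inst.instruction)
        label = classify_response(response)
        total += 1
        harmful += label == "harmful"
        refusals += label == "refusal"
    return {
        "harmful_response_rate": harmful / total if total else 0.0,
        "refusal_rate": refusals / total if total else 0.0,
    }
```

In practice, classify_response would likely be an LLM-based judge, and the multi-turn setting would extend the same loop by prepending earlier conversational turns before the final meme-plus-instruction query.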