Paper page - Robust Multimodal Large Language Models Against Modality Conflict

AI-generated summary: An investigation of modality conflict in multimodal large language models shows that it contributes to hallucinations, with reinforcement learning emerging as the most effective mitigation strategy.

Despite the impressive capabilities of multimodal large language models
(MLLMs) in vision-language tasks, they are prone to hallucinations in
real-world scenarios. This paper investigates the hallucination phenomenon in
MLLMs from the perspective of modality conflict. Unlike existing works focusing
on the conflicts between model responses and inputs, we study the inherent
conflicts in inputs from different modalities that place MLLMs in a dilemma and
directly lead to hallucinations. We formally define the modality conflict and
construct a dataset named Multimodal Modality Conflict (MMMC) to simulate this
phenomenon in vision-language tasks. Three methods based on prompt engineering,
supervised fine-tuning, and reinforcement learning are proposed to alleviate
the hallucination caused by modality conflict. Extensive experiments are
conducted on the MMMC dataset to analyze the merits and demerits of these
methods. Our results show that the reinforcement learning method achieves the
best performance in mitigating the hallucination under modality conflict, while
the supervised fine-tuning method shows promising and stable performance. Our
work sheds light on the unnoticed modality conflict that leads to
hallucinations and provides more insights into the robustness of MLLMs.
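
To make the idea of conflicting inputs concrete, here is a minimal sketch of a single vision-language example in which the question presupposes something the image does not show, together with a prompt-engineering style guard. This is not the paper's formal definition, the MMMC data format, or the authors' prompt: the `Sample` type, the label sets standing in for real image content, and the `GUARD` wording are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """Toy vision-language input (hypothetical, not the MMMC schema)."""
    image_objects: frozenset      # labels assumed visible in the image
    question: str                 # textual input given to the model
    question_entities: frozenset  # entities the question presupposes


def missing_entities(sample: Sample) -> frozenset:
    """Entities the text presupposes but the image does not contain."""
    return sample.question_entities - sample.image_objects


def has_conflict(sample: Sample) -> bool:
    """True when the textual and visual inputs disagree in this toy sense."""
    return bool(missing_entities(sample))


# Hypothetical instruction; the wording is an assumption, not the paper's prompt.
GUARD = (
    "First check whether everything the question assumes is visible in the image. "
    "If it is not, say so instead of answering."
)


def guarded_prompt(sample: Sample) -> str:
    """Prepend a verification instruction to the question."""
    return f"{GUARD}\nQuestion: {sample.question}"


conflicting = Sample(
    image_objects=frozenset({"cat", "sofa"}),
    question="What color is the dog's collar?",
    question_entities=frozenset({"dog"}),
)
consistent = Sample(
    image_objects=frozenset({"cat", "sofa"}),
    question="What is the cat sitting on?",
    question_entities=frozenset({"cat"}),
)

print(has_conflict(conflicting), has_conflict(consistent))
print(guarded_prompt(conflicting))
```

In this toy setting, a model that answers the first question with a collar color has hallucinated, because the image contains no dog. The supervised fine-tuning and reinforcement learning methods mentioned in the abstract would instead train the model to recognize such cases; their details are not given here, so they are not sketched.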
