Where MLLMs Attend and What They Rely On: Explaining Autoregressive Token Generation - Takara TLDR

Multimodal large language models (MLLMs) have demonstrated remarkable
capabilities in aligning visual inputs with natural language outputs. Yet, the
extent to which generated tokens depend on visual modalities remains poorly
understood, limiting interpretability and reliability. In this work, we present
EAGLE, a lightweight black-box framework for explaining autoregressive token
generation in MLLMs. EAGLE attributes any selected tokens to compact perceptual
regions while quantifying the relative influence of language priors and
perceptual evidence. The framework introduces an objective function that
unifies sufficiency (insight score) and indispensability (necessity score),
optimized via greedy search over sparsified image regions for faithful and
efficient attribution. Beyond spatial attribution, EAGLE performs
modality-aware analysis that disentangles what tokens rely on, providing
fine-grained interpretability of model decisions. Extensive experiments across
open-source MLLMs show that EAGLE consistently outperforms existing methods in
faithfulness, localization, and hallucination diagnosis, while requiring
substantially less GPU memory. These results highlight its effectiveness and
practicality for advancing the interpretability of MLLMs. The code is available
at https://github.com/RuoyuChen10/EAGLE.
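
The greedy search over image regions described above can be illustrated with a small sketch. This is not the EAGLE implementation: the abstract does not give the exact objective, so the code below simply assumes the objective is the sum of an insight term (the target-token score when only the selected regions are visible) and a necessity term (one minus the score when the selected regions are removed). The names `Scorer`, `greedy_attribution`, and `toy_score` are hypothetical, and `toy_score` is a stand-in for a black-box query to a real MLLM.

```python
from typing import Callable, FrozenSet, List

# Hypothetical black-box scorer: maps the set of visible region indices to the
# model's probability of the selected token(s). Not part of the EAGLE codebase.
Scorer = Callable[[FrozenSet[int]], float]


def greedy_attribution(num_regions: int, score: Scorer) -> List[int]:
    """Rank regions by greedily maximizing an ASSUMED insight + necessity objective."""
    all_regions = frozenset(range(num_regions))
    selected: List[int] = []
    remaining = set(all_regions)
    while remaining:
        chosen = frozenset(selected)

        def objective(r: int) -> float:
            kept = chosen | {r}
            # Sufficiency: score when only the selected regions are visible.
            insight = score(kept)
            # Indispensability: score drop when the selected regions are removed.
            necessity = 1.0 - score(all_regions - kept)
            return insight + necessity

        # Iterate in sorted order so ties resolve to the lowest region index.
        best = max(sorted(remaining), key=objective)
        selected.append(best)
        remaining.remove(best)
    return selected


def toy_score(visible: FrozenSet[int]) -> float:
    # Toy stand-in for an MLLM: region 2 carries most of the evidence,
    # region 0 a little, and 0.1 plays the role of a language prior.
    return min(1.0, 0.1 + 0.7 * (2 in visible) + 0.2 * (0 in visible))


ranking = greedy_attribution(4, toy_score)
print(ranking)  # regions ordered from most to least important
```

In this toy setup the score with no regions visible (0.1) acts like a language prior, which hints at how a modality-aware analysis might separate prior-driven tokens from evidence-driven ones. Each greedy step queries the scorer twice per candidate region, so the cost grows quadratically with the number of regions; sparsifying the image into a small number of regions keeps that manageable.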
