Efficient Multi-modal Large Language Models Via Progressive Consistency Distillation - Takara TLDR

Visual tokens consume substantial computational resources in multi-modal
large models (MLLMs), significantly compromising their efficiency. Recent works
have attempted to improve efficiency by compressing visual tokens during
training, either through modifications to model components or by introducing
additional parameters. However, they often overlook the increased learning
difficulty caused by such compression, as the model’s parameter space struggles
to quickly adapt to the substantial perturbations in the feature space induced
by token compression. In this work, we propose to develop Efficient MLLMs via
Progressive Consistency Distillation (EPIC), a progressive learning framework.
Specifically, by decomposing the feature space perturbations introduced by
token compression along the token-wise and layer-wise dimensions, we introduce
token consistency distillation and layer consistency distillation,
respectively, aiming to reduce the training difficulty by leveraging guidance
from a teacher model and following a progressive learning trajectory. Extensive
experiments demonstrate the superior effectiveness, robustness, and
generalization capabilities of our proposed framework.

Source link

What's Hot

Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models – Takara TLDR

Google DeepMind unveils AI agent that automatically patches software vulnerabilities

Anthropic’s AI Models Join IBM—What It Means for Shareholders

Efficient Multi-modal Large Language Models via Progressive Consistency Distillation – Takara TLDR

Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models – Takara TLDR

Watch and Learn: Learning to Use Computers from Online Videos – Takara TLDR

Hybrid Architectures for Language Models: Systematic Analysis and Design Insights – Takara TLDR

Basquiat Work on Paper Headline’s Phillips’ Frieze Week Sales

Charges Against Isaac Wright ‘to Be Dropped’ After His Arrest by NYPD

What the Los Angeles Wildfires Taught the Art Insurance Industry

Musée d’Orsay Puts Manet on (Mock) Trial for Obscenity

Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models – Takara TLDR

Google DeepMind unveils AI agent that automatically patches software vulnerabilities

Anthropic’s AI Models Join IBM—What It Means for Shareholders

What's Hot

Efficient Multi-modal Large Language Models via Progressive Consistency Distillation – Takara TLDR

Related Posts

Subscribe to Updates