Paper page - Streamline Without Sacrifice

Large multimodal models excel in multimodal tasks but face significant
computational challenges due to excessive computation on visual tokens. Unlike
token reduction methods that focus on token-level redundancy, we identify and
study the computation-level redundancy on vision tokens to ensure no
information loss. Our key insight is that vision tokens from the pretrained
vision encoder do not necessarily require all the heavy operations (e.g.,
self-attention, FFNs) in decoder-only LMMs and could be processed more lightly
with proper designs. We designed a series of experiments to discover and
progressively squeeze out the vision-related computation redundancy. Based on
our findings, we propose ProxyV, a novel approach that utilizes proxy vision
tokens to alleviate the computational burden on original vision tokens. ProxyV
enhances efficiency without compromising performance and can even yield notable
performance gains in scenarios with more moderate efficiency improvements.
Furthermore, the flexibility of ProxyV is demonstrated through its combination
with token reduction methods to boost efficiency further. The code will be made
public at https://github.com/penghao-wu/ProxyV.
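
To make the scale of vision-token computation concrete, the following is a rough back-of-the-envelope sketch and not the paper's method or measurements. It uses the standard approximation of two FLOPs per multiply-add for one decoder layer, ignores causal masking, and assumes a plain two-matrix FFN. The function names and all sizes (2880 vision tokens, 64 text tokens, hidden size 4096, FFN size 11008) are hypothetical values chosen only for illustration.

```python
def layer_flops(n_tokens: int, d_model: int, d_ffn: int) -> int:
    """Approximate forward FLOPs of one decoder layer (1 multiply-add = 2 FLOPs)."""
    projections = 8 * n_tokens * d_model**2   # Q, K, V and output projections
    attention = 4 * n_tokens**2 * d_model     # QK^T scores plus weighted sum of V (no causal mask)
    ffn = 4 * n_tokens * d_model * d_ffn      # two linear layers (assumed, ungated)
    return projections + attention + ffn


def vision_flops_fraction(n_vision: int, n_text: int, d_model: int, d_ffn: int) -> float:
    """Fraction of layer FLOPs that would vanish if the vision tokens were absent."""
    total = layer_flops(n_vision + n_text, d_model, d_ffn)
    text_only = layer_flops(n_text, d_model, d_ffn)
    return 1.0 - text_only / total


if __name__ == "__main__":
    # Hypothetical sizes, not taken from the paper.
    frac = vision_flops_fraction(n_vision=2880, n_text=64, d_model=4096, d_ffn=11008)
    print(f"share of layer FLOPs tied to vision tokens: {frac:.1%}")
```

Under these assumptions, nearly all of a layer's cost scales with the number of vision tokens, which is the cost that an approach like ProxyV aims to reduce without discarding the tokens themselves.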
