Paper Page - Images Are Worth Variable Length Of Representations

Most existing vision encoders map images into a fixed-length sequence of tokens, overlooking the fact that different images contain varying amounts of information. For example, a visually complex image (e.g., a cluttered room) inherently carries more information and thus deserves more tokens than a simple image (e.g., a blank wall). To address this inefficiency, we propose DOVE, a dynamic vision encoder that produces a variable number of visual tokens (i.e., continuous representation vectors) to reconstruct each image. Our results show that DOVE significantly reduces the average number of tokens while maintaining high reconstruction quality. In several linear probing and downstream multimodal tasks, it outperforms existing autoencoder-based tokenization methods while using far fewer tokens, and it captures more expressive semantic features than fixed-length encoding. We further extend DOVE with query-conditioned tokenization: by guiding the model to focus on query-relevant regions, it achieves more efficient and targeted semantic extraction. Our code and checkpoints are available at https://dove-encoder.github.io/dove-encoder.
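
To make the idea of variable-length representations concrete, here is a minimal sketch of how a downstream consumer might batch per-image token sequences of different lengths. It is not DOVE's implementation: the function names `pad_variable_length` and `average_tokens`, the zero-padding scheme, and the toy token values are all assumptions made for illustration.

```python
from typing import List, Tuple

Token = List[float]  # one continuous representation vector (hypothetical type alias)


def pad_variable_length(
    batch: List[List[Token]], dim: int
) -> Tuple[List[List[Token]], List[List[bool]]]:
    """Pad each image's token sequence to the longest one in the batch.

    Returns the padded sequences and a mask that is True for real tokens
    and False for padding. Zero-padding is an assumption of this sketch.
    """
    max_len = max((len(seq) for seq in batch), default=0)
    padded: List[List[Token]] = []
    mask: List[List[bool]] = []
    for seq in batch:
        n_pad = max_len - len(seq)
        padded.append([list(tok) for tok in seq] + [[0.0] * dim for _ in range(n_pad)])
        mask.append([True] * len(seq) + [False] * n_pad)
    return padded, mask


def average_tokens(batch: List[List[Token]]) -> float:
    """Mean number of tokens per image in the batch."""
    return sum(len(seq) for seq in batch) / len(batch)


# Toy batch: a "complex" image with 3 tokens and a "simple" image with 1 token.
complex_image = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
simple_image = [[0.5, 0.5]]
toy_batch = [complex_image, simple_image]
toy_padded, toy_mask = pad_variable_length(toy_batch, dim=2)
```

The mask lets a downstream model ignore padding positions, so a simple image contributes only its real tokens even though the batch is rectangular.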

