Paper Page - Images Are Worth Variable Length Of Representations

Most existing vision encoders map images into a fixed-length sequence of tokens, overlooking the fact that different images contain varying amounts of information. For example, a visually complex image (e.g., a cluttered room) inherently carries more information and thus deserves more tokens than a simple image (e.g., a blank wall). To address this inefficiency, we propose DOVE, a dynamic vision encoder that produces a variable number of visual tokens (i.e., continuous representation vectors) to reconstruct each image. Our results show that DOVE significantly reduces the average number of tokens while maintaining high reconstruction quality. In several linear probing and downstream multimodal tasks, it outperforms existing autoencoder-based tokenization methods when using far fewer tokens, capturing more expressive semantic features compared to fixed-length encoding. We further extend DOVE with query-conditioned tokenization. By guiding the model to focus on query-relevant regions, it achieves more efficient and targeted semantic extraction. Our code and checkpoints are available at https://dove-encoder.github.io/dove-encoder.
