Lizard is a linearization framework that transforms Transformer-based LLMs into subquadratic architectures for efficient infinite-context generation, using a hybrid attention mechanism and hardware-aware training.
We propose Lizard, a linearization framework that transforms pretrained
Transformer-based Large Language Models (LLMs) into flexible, subquadratic
architectures for infinite-context generation. Transformer-based LLMs face
significant memory and computational bottlenecks as context lengths increase,
due to the quadratic complexity of softmax attention and the growing key-value
(KV) cache. Lizard addresses these limitations by introducing a subquadratic
attention mechanism that closely approximates softmax attention while
preserving output quality. Unlike previous linearization methods, which are
often limited by fixed model structures and therefore exclude gating
mechanisms, Lizard incorporates a gating module inspired by recent
state-of-the-art linear models. This enables adaptive memory control, supports
constant-memory inference, offers strong length generalization, and allows more
flexible model design. Lizard combines gated linear attention for global
context compression with sliding window attention enhanced by meta memory,
forming a hybrid mechanism that captures both long-range dependencies and
fine-grained local interactions. Moreover, we introduce a hardware-aware
algorithm that accelerates the training speed of our models. Extensive
experiments show that Lizard achieves near-lossless recovery of the teacher
model’s performance across standard language modeling tasks, while
significantly outperforming previous linearization methods. On the 5-shot MMLU
benchmark, Lizard improves over prior models by 18 points and shows substantial
gains on associative recall tasks.
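To make the hybrid mechanism concrete, the following is a minimal, illustrative sketch of combining a gated linear-attention branch (compressed global context with adaptive decay) with a sliding-window softmax branch (fine-grained local context). The single-head tensor shapes, the sigmoid gate parameterization, the window size, and the convex mixing of the two branches are simplifying assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn.functional as F

def gated_linear_attention(q, k, v, g):
    """Recurrent gated linear attention: S_t = g_t * S_{t-1} + k_t v_t^T, o_t = S_t^T q_t."""
    seq_len, d = q.shape
    state = torch.zeros(d, v.shape[-1])
    outputs = []
    for t in range(seq_len):
        # g[t] in (0, 1) decays the running key-value memory (adaptive memory control);
        # the state has constant size, so inference memory does not grow with context.
        state = g[t].unsqueeze(-1) * state + torch.outer(k[t], v[t])
        outputs.append(state.T @ q[t])
    return torch.stack(outputs)

def sliding_window_attention(q, k, v, window=64):
    """Causal softmax attention restricted to the most recent `window` tokens."""
    seq_len, d = q.shape
    outputs = []
    for t in range(seq_len):
        lo = max(0, t - window + 1)
        scores = q[t] @ k[lo:t + 1].T / d ** 0.5
        outputs.append(F.softmax(scores, dim=-1) @ v[lo:t + 1])
    return torch.stack(outputs)

def hybrid_attention(q, k, v, g, window=64, alpha=0.5):
    """Hypothetical mix of global compressed context and local detail (alpha is a placeholder)."""
    return (alpha * gated_linear_attention(q, k, v, g)
            + (1 - alpha) * sliding_window_attention(q, k, v, window))

if __name__ == "__main__":
    seq_len, d = 128, 32
    q, k, v = (torch.randn(seq_len, d) for _ in range(3))
    g = torch.sigmoid(torch.randn(seq_len, d))  # per-channel gates in (0, 1)
    out = hybrid_attention(q, k, v, g)
    print(out.shape)  # torch.Size([128, 32])
```

The recurrent loop above is written for clarity; in practice such models are trained with parallel, hardware-aware kernels rather than a per-token Python loop.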