Paper Page - dKV-Cache: The Cache for Diffusion Language Models

A KV-cache-like mechanism, delayed KV-Cache, accelerates diffusion language models’ inference without significantly degrading performance.

Diffusion Language Models (DLMs) have been seen as promising competitors to
autoregressive language models. However, diffusion language models have long
been constrained by slow inference. A core challenge is that their
non-autoregressive architecture and bidirectional attention preclude the
key-value cache that accelerates decoding. We address this bottleneck by
proposing a KV-cache-like mechanism, delayed KV-Cache, for the denoising
process of DLMs. Our approach is motivated by the observation that different
tokens have distinct representation dynamics throughout the diffusion process.
Accordingly, we propose a delayed and conditioned caching strategy for key and
value states. We design two complementary variants that cache key and value
states step by step: (1) dKV-Cache-Decode, which provides almost lossless
acceleration and even improves performance on long sequences, suggesting that
existing DLMs may under-utilise contextual information during inference; and (2)
dKV-Cache-Greedy, which caches aggressively with a reduced cache lifespan, achieving
higher speed-ups with quadratic time complexity at the cost of some performance
degradation. Overall, dKV-Cache achieves a 2-10x inference speedup,
largely narrowing the gap between autoregressive models and DLMs. We evaluate dKV-Cache on
several benchmarks, delivering acceleration across general language
understanding, mathematical, and code-generation benchmarks. Experiments
demonstrate that a KV cache can also be used in DLMs, even in a training-free
manner on top of existing DLMs.
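
To make the idea of delayed caching concrete, here is a minimal bookkeeping sketch in Python. It is not the authors' implementation. It assumes that "delayed" means a token's key/value states are frozen a fixed number of denoising steps after the token is decoded, rather than in the same step; the abstract does not spell out the exact rule, and the conditioned part of the strategy is not modelled. The names `DelayedKVCache`, `simulate`, `delay`, and `schedule` are hypothetical, and the key/value tensors are replaced by placeholder strings.

```python
from dataclasses import dataclass, field


@dataclass
class DelayedKVCache:
    """Toy bookkeeping for a delayed KV cache (illustrative only)."""
    delay: int = 1  # assumed: steps to wait after decoding before freezing K/V
    decoded_at: dict = field(default_factory=dict)  # position -> step it was decoded
    store: dict = field(default_factory=dict)       # position -> frozen (key, value)

    def mark_decoded(self, position: int, step: int) -> None:
        # Record the first step at which a masked position became a real token.
        self.decoded_at.setdefault(position, step)

    def positions_to_compute(self, seq_len: int) -> list:
        # Every position without a frozen entry still needs fresh K/V this step.
        return [p for p in range(seq_len) if p not in self.store]

    def update(self, step: int, fresh_kv: dict) -> None:
        # Freeze K/V only for tokens decoded at least `delay` steps ago.
        for pos, kv in fresh_kv.items():
            decoded = self.decoded_at.get(pos)
            if decoded is not None and step - decoded >= self.delay:
                self.store[pos] = kv


def simulate(seq_len: int, schedule: list, delay: int = 1):
    """Return (per-step recomputation counts, frozen cache entries).

    `schedule[t]` lists the positions decoded by the forward pass at step t.
    """
    cache = DelayedKVCache(delay=delay)
    work = []
    for step, newly_decoded in enumerate(schedule):
        todo = cache.positions_to_compute(seq_len)
        work.append(len(todo))
        # Placeholder for the transformer forward pass over the `todo` positions.
        fresh_kv = {p: (f"k{p}@{step}", f"v{p}@{step}") for p in todo}
        for p in newly_decoded:
            cache.mark_decoded(p, step)
        cache.update(step, fresh_kv)
    return work, cache.store


if __name__ == "__main__":
    schedule = [[0], [1], [2], [3]]  # decode one position per step
    print(simulate(4, schedule, delay=0))  # immediate caching
    print(simulate(4, schedule, delay=1))  # one-step delayed caching
```

With `delay=0`, a position's states are frozen in the same pass that decodes it, so they still come from the masked input. With `delay=1`, the frozen states come from the first pass that sees the decoded token, at the cost of one extra recomputation per token.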
