Diffusion Large Language Models (dLLMs) have emerged as a promising
alternative to autoregressive (AR) LLMs for text generation, with the potential
to decode multiple tokens in a single iteration. However, none of the existing
open-source dLLMs have surpassed the inference speed of AR LLMs of similar
size. This paper breaks this barrier with a simple and effective
strategy named discrete diffusion forcing (D2F). D2F equips dLLMs with two key
capabilities: (1) block-wise autoregressive generation to enable KV cache
utilization; (2) prediction of subsequent tokens without waiting for prior
blocks to complete, enabling inter-block parallel decoding. In this way, vanilla dLLMs
are refurbished into an AR-diffusion hybrid paradigm for efficient inference.
D2F can be implemented with an asymmetric distillation process based on
pre-trained dLLMs. We further propose a pipelined parallel decoding algorithm,
which enables a trade-off between efficiency and efficacy. Empirically, D2F
dLLMs achieve more than $\mathbf{2.5\times}$ the inference speed of LLaMA3 and
Qwen2.5 on GSM8K. Compared to vanilla dLLMs like LLaDA and Dream, the
acceleration can be more than $\mathbf{50\times}$ while maintaining comparable
output quality. The code is available at
https://github.com/zhijie-group/Discrete-Diffusion-Forcing.
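
To make the decoding paradigm described above concrete, the following is a minimal conceptual sketch of pipelined block-wise parallel decoding, assuming a toy setup: block size, step counts, and the `denoise_step` placeholder are illustrative inventions, not the authors' implementation or API. It only shows the scheduling idea, where later blocks enter the pipeline and take denoising steps before earlier blocks are finished, while completed blocks form a frozen prefix that a real model could serve from its KV cache.

```python
# Hypothetical sketch of D2F-style pipelined block decoding (all names illustrative).
from dataclasses import dataclass, field
import random

BLOCK_LEN = 4        # tokens per block (assumption)
STEPS_PER_BLOCK = 3  # denoising iterations each block needs (assumption)
NUM_BLOCKS = 5       # total blocks to generate (assumption)
MASK = "<mask>"

@dataclass
class Block:
    tokens: list = field(default_factory=lambda: [MASK] * BLOCK_LEN)
    steps_done: int = 0

    @property
    def finished(self):
        return self.steps_done >= STEPS_PER_BLOCK

def denoise_step(block, context):
    """Stand-in for one diffusion step: unmask some tokens given the prefix."""
    for i, tok in enumerate(block.tokens):
        if tok == MASK and random.random() < 0.5:
            block.tokens[i] = f"tok{len(context) + i}"
    block.steps_done += 1
    if block.finished:  # commit any remaining masks on the final step
        block.tokens = [t if t != MASK else f"tok{len(context) + j}"
                        for j, t in enumerate(block.tokens)]

def pipelined_decode():
    finished, active = [], [Block()]
    while len(finished) < NUM_BLOCKS:
        context = [t for b in finished for t in b.tokens]  # KV-cacheable prefix
        # Inter-block parallelism: every active block takes a step this iteration,
        # without waiting for the blocks ahead of it to complete.
        for b in active:
            denoise_step(b, context)
        while active and active[0].finished:
            finished.append(active.pop(0))  # freeze the leading block
        if len(finished) + len(active) < NUM_BLOCKS:
            active.append(Block())  # admit the next block into the pipeline
    return [t for b in finished for t in b.tokens]

print(pipelined_decode())
```

Relaxing how aggressively new blocks are admitted, or how many tokens each step commits, is where the efficiency-efficacy trade-off mentioned above would surface in a real system.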