Blockwise Parallel Decoding For Deep Autoregressive Models

Abstract:
Deep autoregressive sequence-to-sequence models have demonstrated impressive performance across a wide variety of tasks in recent years. While common architecture classes such as recurrent, convolutional, and self-attention networks make different trade-offs between the amount of computation needed per layer and the length of the critical path at training time, generation still remains an inherently sequential process. To overcome this limitation, we propose a novel blockwise parallel decoding scheme in which we make predictions for multiple time steps in parallel then back off to the longest prefix validated by a scoring model. This allows for substantial theoretical improvements in generation speed when applied to architectures that can process output sequences in parallel. We verify our approach empirically through a series of experiments using state-of-the-art self-attention models for machine translation and image super-resolution, achieving iteration reductions of up to 2x over a baseline greedy decoder with no loss in quality, or up to 7x in exchange for a slight decrease in performance. In terms of wall-clock time, our fastest models exhibit real-time speedups of up to 4x over standard greedy decoding.

Authors: Mitchell Stern, Noam Shazeer, Jakob Uszkoreit

source

What's Hot

Tuosda Launches Its First Humanoid Robot! Industry Competes in the ‘Industrial Blue Ocean’_robot_’Xiao_as

Glancy Prongay & Murray LLP, a Leading Securities Fraud Law Firm Encourages C3.ai, Inc. (AI) Investors To Inquire About Securities Fraud Class Action

California lawmakers pass AI safety bill SB 53 — but Newsom could still veto

Blockwise Parallel Decoding for Deep Autoregressive Models

AGI is not coming!

Context Rot: How Increasing Input Tokens Impacts LLM Performance (Paper Analysis)

Energy-Based Transformers are Scalable Learners and Thinkers (Paper Review)

Ohio Auction of Two Paintings Looted By Nazis Halted By Foundation

Lee Ufan Painting at Center of Bribery Investigation in Korea

Drought Reveals 40 Ancient Tombs in Northern Iraqi Reservoir

Artifacts Removed from Gaza Building Before Suspected Israeli Strike

Tuosda Launches Its First Humanoid Robot! Industry Competes in the ‘Industrial Blue Ocean’_robot_’Xiao_as

Glancy Prongay & Murray LLP, a Leading Securities Fraud Law Firm Encourages C3.ai, Inc. (AI) Investors To Inquire About Securities Fraud Class Action

California lawmakers pass AI safety bill SB 53 — but Newsom could still veto

What's Hot

Blockwise Parallel Decoding for Deep Autoregressive Models

Related Posts

Subscribe to Updates