Paper Page - DetailFlow: 1D Coarse-to-Fine Autoregressive Image Generation Via Next-Detail Prediction

DetailFlow, a coarse-to-fine 1D autoregressive image generation method, improves quality and efficiency by using a novel next-detail prediction strategy, fewer tokens, and a parallel inference mechanism.

This paper presents DetailFlow, a coarse-to-fine 1D autoregressive (AR) image
generation method that models images through a novel next-detail prediction
strategy. By learning a resolution-aware token sequence supervised with
progressively degraded images, DetailFlow enables the generation process to
start from the global structure and incrementally refine details. This
coarse-to-fine 1D token sequence aligns well with the autoregressive inference
mechanism, providing a more natural and efficient way for the AR model to
generate complex visual content. Our compact 1D AR model achieves high-quality
image synthesis with significantly fewer tokens than previous approaches such as
VAR and VQGAN. We further propose a parallel inference mechanism with
self-correction that accelerates generation by approximately 8x while
reducing the accumulated sampling error inherent in teacher-forcing supervision.
On the ImageNet 256×256 benchmark, our method achieves 2.96 gFID with 128
tokens, outperforming VAR (3.3 FID) and FlexVAR (3.05 FID), which both require
680 tokens in their AR models. Moreover, owing to the significantly reduced token
count and the parallel inference mechanism, our method runs nearly 2x faster
at inference than VAR and FlexVAR. Extensive experimental results
demonstrate DetailFlow’s superior generation quality and efficiency compared to
existing state-of-the-art methods.
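
The idea of supervising a token prefix with a progressively degraded image can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify how token count maps to resolution, so the rule below (pixel count roughly proportional to the number of tokens, snapped down to a power of two) is an assumption, and the names target_resolution and degrade are hypothetical.

```python
def target_resolution(num_tokens, total_tokens, full_res):
    """Hypothetical mapping from prefix length to target resolution.

    Assumption (not from the paper): the pixel budget grows linearly with
    the number of tokens, so we pick the largest power-of-two side r with
    r*r / full_res**2 <= num_tokens / total_tokens.
    """
    if not 1 <= num_tokens <= total_tokens:
        raise ValueError("num_tokens must be in [1, total_tokens]")
    r = 1
    while 2 * r <= full_res and (2 * r) ** 2 * total_tokens <= full_res ** 2 * num_tokens:
        r *= 2
    return r


def degrade(image, res):
    """Degrade a square grayscale image (list of rows) to an effective
    resolution of res x res: average each block, then paint the mean back
    over the block so the output keeps the original size."""
    full = len(image)
    if full % res != 0:
        raise ValueError("res must divide the image side length")
    block = full // res
    out = [[0.0] * full for _ in range(full)]
    for by in range(res):
        for bx in range(res):
            ys = range(by * block, (by + 1) * block)
            xs = range(bx * block, (bx + 1) * block)
            mean = sum(image[y][x] for y in ys for x in xs) / (block * block)
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out


if __name__ == "__main__":
    # Toy 8x8 "image" and a 16-token budget; the sizes are illustrative only.
    image = [[y * 8 + x for x in range(8)] for y in range(8)]
    for k in (1, 4, 16):
        res = target_resolution(k, total_tokens=16, full_res=8)
        coarse = degrade(image, res)
        print(f"first {k:2d} tokens -> target at {res}x{res}, top-left value {coarse[0][0]}")
```

Under these assumptions, a short prefix is matched against a heavily blurred target that carries only global structure, and the full sequence is matched against the original image, which is the coarse-to-fine ordering the abstract describes.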
