Text-to-image diffusion models are computationally intensive, often requiring
dozens of forward passes through large denoising backbones. For instance,
Stable Diffusion XL generates high-quality images with 50 evaluations of a
2.6B-parameter model, an expensive process even for a single batch. Few-step
diffusion models reduce this cost to 2-8 denoising steps, but they still depend on
large, uncompressed U-Net or diffusion transformer backbones that are often too
costly to run in full precision without datacenter GPUs. These
requirements also limit existing post-training quantization methods that rely
on full-precision calibration. We introduce Q-Sched, a new paradigm for
post-training quantization that modifies the diffusion model scheduler rather
than the model weights. By adjusting the few-step sampling trajectory, Q-Sched
matches full-precision accuracy while reducing model size by 4x. To learn
quantization-aware pre-conditioning coefficients, we propose the JAQ loss,
which combines text-image compatibility with an image quality metric for
fine-grained optimization. JAQ is reference-free and requires only a handful of
calibration prompts, avoiding full-precision inference during calibration.
Q-Sched delivers substantial gains: a 15.5% FID improvement over the FP16
4-step Latent Consistency Model and a 16.6% improvement over the FP16 8-step
Phased Consistency Model, showing that quantization and few-step distillation
are complementary for high-fidelity generation. A large-scale user study with
more than 80,000 annotations further confirms Q-Sched’s effectiveness on both
FLUX.1[schnell] and SDXL-Turbo.
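
For concreteness, one plausible form the JAQ objective could take is sketched below. The abstract only states that JAQ is reference-free, combines a text-image compatibility term with an image quality term, and is computed over a handful of calibration prompts; the particular scores, the weighted-sum combination, and all symbols here are illustrative assumptions rather than the paper's definition.
\[
\mathcal{L}_{\mathrm{JAQ}}(\mathbf{c}) \;=\; -\,\frac{1}{|\mathcal{P}|}\sum_{p \in \mathcal{P}} \Big[\, s_{\mathrm{text}}\big(\hat{x}_q(p;\mathbf{c}),\, p\big) \;+\; \lambda\, s_{\mathrm{IQA}}\big(\hat{x}_q(p;\mathbf{c})\big) \Big]
\]
Here \(\mathbf{c}\) denotes the quantization-aware pre-conditioning coefficients applied to the scheduler's few-step trajectory, \(\mathcal{P}\) the small set of calibration prompts, \(\hat{x}_q(p;\mathbf{c})\) the image produced by the quantized few-step model under coefficients \(\mathbf{c}\), \(s_{\mathrm{text}}\) a text-image compatibility score, \(s_{\mathrm{IQA}}\) a no-reference image quality score, and \(\lambda\) a weighting hyperparameter. Minimizing such an objective over \(\mathbf{c}\) would require neither reference images nor full-precision inference, consistent with the calibration setup described above.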