Diffusion models approximate the denoising distribution as a Gaussian and
predict its mean, whereas flow matching models reparameterize the Gaussian mean
as flow velocity. However, both approaches underperform in few-step sampling due to
discretization error and tend to produce over-saturated colors under
classifier-free guidance (CFG). To address these limitations, we propose a
novel Gaussian mixture flow matching (GMFlow) model: instead of predicting the
mean, GMFlow predicts dynamic Gaussian mixture (GM) parameters to capture a
multi-modal flow velocity distribution, which can be learned with a KL
divergence loss. We demonstrate that GMFlow generalizes previous diffusion and
flow matching models where a single Gaussian is learned with an L_2 denoising
loss. For inference, we derive GM-SDE/ODE solvers that leverage analytic
denoising distributions and velocity fields for precise few-step sampling.
Furthermore, we introduce a novel probabilistic guidance scheme that mitigates
the over-saturation issues of CFG and improves image generation quality.
Extensive experiments demonstrate that GMFlow consistently outperforms flow
matching baselines in generation quality, achieving a Precision of 0.942 with
only 6 sampling steps on ImageNet 256×256.
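
To make the core idea concrete, below is a minimal sketch (not the authors' implementation) of a Gaussian-mixture velocity head trained by maximizing the likelihood of the target velocity under the predicted mixture, which matches the predicted GM to the multi-modal velocity distribution in the KL sense up to a constant. Names such as `GMVelocityHead`, `feat_dim`, and `num_components`, and all architectural details, are illustrative assumptions; the actual GMFlow parameterization and loss may differ.

```python
# Minimal sketch of a Gaussian-mixture flow-velocity head (assumed design,
# not the paper's exact architecture or loss).
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class GMVelocityHead(nn.Module):
    """Predicts Gaussian-mixture parameters (logits, means, log-variances)
    for the flow velocity at each token/location."""

    def __init__(self, feat_dim: int, vel_dim: int, num_components: int = 8):
        super().__init__()
        self.k = num_components
        self.vel_dim = vel_dim
        # One projection producing mixture logits, means, and log-variances.
        self.proj = nn.Linear(feat_dim, num_components * (1 + 2 * vel_dim))

    def forward(self, feats: torch.Tensor):
        # feats: (B, N, feat_dim) -> per-token mixture parameters.
        out = self.proj(feats)
        logits, means, log_vars = out.split(
            [self.k, self.k * self.vel_dim, self.k * self.vel_dim], dim=-1
        )
        means = means.view(*feats.shape[:-1], self.k, self.vel_dim)
        log_vars = log_vars.view(*feats.shape[:-1], self.k, self.vel_dim)
        return logits, means, log_vars


def gm_nll_loss(logits, means, log_vars, target_v):
    """Negative log-likelihood of the target velocity under the predicted
    diagonal-Gaussian mixture (equivalent to a KL-matching objective up to
    a model-independent constant)."""
    # target_v: (B, N, vel_dim) -> (B, N, 1, vel_dim) for broadcasting.
    diff = target_v.unsqueeze(-2) - means
    # Per-component diagonal-Gaussian log density, summed over vel_dim.
    comp_log_prob = -0.5 * (
        log_vars + diff.pow(2) / log_vars.exp() + math.log(2 * math.pi)
    ).sum(dim=-1)
    log_weights = F.log_softmax(logits, dim=-1)
    # Log of the mixture density via logsumexp over components.
    log_prob = torch.logsumexp(log_weights + comp_log_prob, dim=-1)
    return -log_prob.mean()


# Example usage with random tensors.
head = GMVelocityHead(feat_dim=256, vel_dim=4, num_components=8)
feats = torch.randn(2, 64, 256)      # (batch, tokens, features)
target_v = torch.randn(2, 64, 4)     # ground-truth flow velocity
loss = gm_nll_loss(*head(feats), target_v)
loss.backward()
```

Under this reading, the single-Gaussian special case (one component with fixed variance) reduces the objective to the standard L_2 denoising/velocity loss, which is the generalization claim stated in the abstract.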