Generative modeling, representation learning, and classification are three
core problems in machine learning (ML), yet their state-of-the-art (SoTA)
solutions remain largely disjoint. In this paper, we ask: Can a unified
principle address all three? Such unification could simplify ML pipelines and
foster greater synergy across tasks. We introduce Latent Zoning Network (LZN)
as a step toward this goal. At its core, LZN creates a shared Gaussian latent
space that encodes information across all tasks. Each data type (e.g., images,
text, labels) is equipped with an encoder that maps samples to disjoint latent
zones, and a decoder that maps latents back to data. ML tasks are expressed as
compositions of these encoders and decoders: for example, label-conditional
image generation uses a label encoder and image decoder; image embedding uses
an image encoder; classification uses an image encoder and label decoder. We
demonstrate the promise of LZN in three increasingly complex scenarios: (1) LZN
can enhance existing models (image generation): When combined with the SoTA
Rectified Flow model, LZN improves FID on CIFAR10 from 2.76 to 2.59-without
modifying the training objective. (2) LZN can solve tasks independently
(representation learning): LZN can implement unsupervised representation
learning without auxiliary loss functions, outperforming the seminal MoCo and
SimCLR methods by 9.3% and 0.2%, respectively, on downstream linear
classification on ImageNet. (3) LZN can solve multiple tasks simultaneously
(joint generation and classification): With image and label encoders/decoders,
LZN performs both tasks jointly by design, improving FID and achieving SoTA
classification accuracy on CIFAR10. The code and trained models are available
at https://github.com/microsoft/latent-zoning-networks. The project website is
at https://zinanlin.me/blogs/latent_zoning_networks.html.