Paper Page - Show-o2: Improved Native Unified Multimodal Models

Show-o2 applies autoregressive modeling and flow matching on top of a 3D causal variational autoencoder space, building unified visual representations for multimodal understanding and generation tasks.

This paper presents improved native unified multimodal models, i.e.,
Show-o2, that leverage autoregressive modeling and flow matching. Built upon a
3D causal variational autoencoder space, unified visual representations are
constructed through a dual path of spatial(-temporal) fusion, enabling
scalability across image and video modalities while ensuring effective
multimodal understanding and generation. Based on a language model,
autoregressive modeling and flow matching are natively applied to the language
head and flow head, respectively, to facilitate text token prediction and
image/video generation. A two-stage training recipe is designed to effectively
learn and scale to larger models. The resulting Show-o2 models demonstrate
versatility in handling a wide range of multimodal understanding and generation
tasks across diverse modalities, including text, images, and videos. Code and
models are released at https://github.com/showlab/Show-o.
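
As a rough illustration of how the two objectives named in the abstract could sit side by side, the sketch below pairs a next-token cross-entropy term (language head) with a flow matching velocity-regression term (flow head). It is a minimal sketch under stated assumptions, not the authors' implementation: the straight-line noise-to-data path, the function names, and the `alpha` weighting are all hypothetical choices made for illustration, and plain Python lists stand in for latent tensors.

```python
import math
import random


def next_token_loss(logits, target):
    """Cross-entropy for one position: -log softmax(logits)[target]."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[target]


def flow_matching_pair(x1, rng):
    """Sample (t, x_t, target velocity) for a data latent x1.

    Assumption: a straight path x_t = (1 - t) * x0 + t * x1 from
    Gaussian noise x0 to data x1, whose velocity is x1 - x0.
    """
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]
    t = rng.random()
    xt = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    v = [b - a for a, b in zip(x0, x1)]
    return t, xt, v


def flow_matching_loss(pred_v, target_v):
    """Mean squared error between predicted and target velocities."""
    return sum((p - q) ** 2 for p, q in zip(pred_v, target_v)) / len(target_v)


def combined_loss(logits, target, pred_v, target_v, alpha=1.0):
    """Hypothetical weighted sum of the two head losses."""
    return alpha * next_token_loss(logits, target) + flow_matching_loss(pred_v, target_v)


if __name__ == "__main__":
    rng = random.Random(0)
    latent = [0.5, -1.0, 2.0]            # stand-in for a VAE latent
    t, xt, v = flow_matching_pair(latent, rng)
    pred_v = [0.0 for _ in v]            # stand-in for a flow head output
    print(combined_loss([1.0, 0.2, -0.3], 0, pred_v, v))
```

In a real model, the logits and the predicted velocity would come from the language head and the flow head on a shared language-model backbone; here they are passed in directly so the two loss terms can be inspected in isolation.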
