We introduce Lumina-Image 2.0, an advanced text-to-image generation framework
that achieves significant progress over its predecessor, Lumina-Next.
Lumina-Image 2.0 is built upon two key principles: (1) Unification – it adopts
a unified architecture (Unified Next-DiT) that treats text and image tokens as
a joint sequence, enabling natural cross-modal interactions and allowing
seamless task expansion (a minimal sketch of this joint-sequence design is
given after the abstract). In addition, because high-quality captioners can
provide semantically well-aligned text-image training pairs, we introduce a
unified captioning system, Unified Captioner (UniCap), designed specifically
for T2I generation tasks. UniCap excels at generating comprehensive and
accurate captions, accelerating convergence and enhancing prompt adherence. (2)
Efficiency – we develop multi-stage progressive training strategies and
introduce inference acceleration techniques that improve training and sampling
efficiency without compromising image quality. Extensive
evaluations on academic benchmarks and public text-to-image arenas show that
Lumina-Image 2.0 delivers strong performance even with only 2.6B parameters,
highlighting its scalability and design efficiency. We have released our
training details, code, and models at
https://github.com/Alpha-VLLM/Lumina-Image-2.0.
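
To make the joint-sequence design of point (1) concrete, the following is a
minimal sketch, not the released Unified Next-DiT implementation: the block
structure, dimensions, and all names (JointAttentionBlock, d_model, n_heads)
are illustrative assumptions. It shows only the core idea that concatenating
text and image tokens into one sequence lets cross-modal interaction happen
inside ordinary self-attention.

```python
# Minimal sketch of a transformer block over a joint text-image token
# sequence. Hypothetical names and shapes; not the paper's actual code.
import torch
import torch.nn as nn


class JointAttentionBlock(nn.Module):
    """One block whose self-attention spans both modalities at once."""

    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm1(x)
        # Every token (text or image) attends to every other token,
        # so no separate cross-attention pathway is needed.
        attn_out, _ = self.attn(h, h, h)
        x = x + attn_out
        return x + self.mlp(self.norm2(x))


# Treat text and image tokens as one joint sequence by concatenating
# along the sequence axis.
text_tokens = torch.randn(1, 77, 512)    # e.g. caption embeddings
image_tokens = torch.randn(1, 256, 512)  # e.g. patchified latent tokens
joint = torch.cat([text_tokens, image_tokens], dim=1)
out = JointAttentionBlock()(joint)
print(out.shape)  # torch.Size([1, 333, 512])
```

In the full model the image tokens would be noisy latent patches and the
blocks would additionally condition on the diffusion timestep; the sketch
only illustrates why a shared sequence yields natural cross-modal
interaction within standard attention.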