UNCAGE: Contrastive Attention Guidance For Masked Generative Transformers In Text-to-Image Generation - Takara TLDR

Text-to-image (T2I) generation has been actively studied using Diffusion
Models and Autoregressive Models. Recently, Masked Generative Transformers have
gained attention as an alternative to Autoregressive Models to overcome the
inherent limitations of causal attention and autoregressive decoding through
bidirectional attention and parallel decoding, enabling efficient and
high-quality image generation. However, compositional T2I generation remains
challenging, as even state-of-the-art Diffusion Models often fail to accurately
bind attributes and achieve proper text-image alignment. While Diffusion Models
have been extensively studied for this issue, Masked Generative Transformers
exhibit similar limitations but have not been explored in this context. To
address this, we propose Unmasking with Contrastive Attention Guidance
(UNCAGE), a novel training-free method that improves compositional fidelity by
leveraging attention maps to prioritize the unmasking of tokens that clearly
represent individual objects. UNCAGE consistently improves performance in both
quantitative and qualitative evaluations across multiple benchmarks and
metrics, with negligible inference overhead. Our code is available at
https://github.com/furiosa-ai/uncage.
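
The abstract does not state the exact scoring rule UNCAGE uses, so the following is only a minimal sketch of the general idea it describes: using attention maps to decide which masked image tokens to unmask first. It assumes a per-object attention map is already available and defines the "contrast" of a token as its strongest object attention minus its second strongest. The names `contrastive_scores`, `select_tokens_to_unmask`, and `weight` are hypothetical and are not taken from the paper or its repository.

```python
import numpy as np


def contrastive_scores(attn):
    """Per-token contrast between the top two object attentions.

    attn: array of shape (num_objects, num_image_tokens), where attn[i, j]
    is the (assumed, precomputed) attention of image token j to object i.
    A high score means the token attends to one object far more than to
    any other, i.e. it "clearly represents" a single object.
    """
    attn = np.asarray(attn, dtype=float)
    if attn.ndim != 2 or attn.shape[0] < 2:
        raise ValueError("need attention maps for at least two objects")
    ranked = np.sort(attn, axis=0)  # ascending along the object axis
    return ranked[-1] - ranked[-2]


def select_tokens_to_unmask(confidence, attn, masked, k, weight=1.0):
    """Pick up to k still-masked token indices to unmask at this step.

    confidence: model confidence per image token, shape (num_image_tokens,)
    masked:     boolean array, True where the token is still masked
    weight:     hypothetical knob for how strongly the contrast term
                reorders the usual confidence-based unmasking schedule
    """
    confidence = np.asarray(confidence, dtype=float)
    masked = np.asarray(masked, dtype=bool)
    score = confidence + weight * contrastive_scores(attn)
    score = np.where(masked, score, -np.inf)  # never re-pick unmasked tokens
    k = min(k, int(masked.sum()))
    order = np.argsort(-score, kind="stable")  # highest score first
    return np.sort(order[:k])


# Toy example: two objects, four image tokens, equal model confidence.
attn = [[0.9, 0.5, 0.1, 0.4],
        [0.1, 0.5, 0.8, 0.4]]
confidence = [0.5, 0.5, 0.5, 0.5]
masked = [True, True, True, True]
picked = select_tokens_to_unmask(confidence, attn, masked, k=2)
```

In this toy case, tokens 0 and 2 each attend mostly to a single object, so they are selected ahead of tokens 1 and 3, whose attention is split evenly. Because the sketch only reorders the unmasking schedule, it needs no training and adds little work per decoding step, which is in keeping with the training-free, low-overhead claim above.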
