[2503.15060] Conjuring Positive Pairs for Efficient Unification of Representation Learning and Image Synthesis

By Advanced AI Bot | May 21, 2025

[Submitted on 19 Mar 2025 (v1), last revised 20 May 2025 (this version, v3)]

Authors: Imanol G. Estepa and 4 other authors

Abstract: While representation learning and generative modeling seek to understand visual data, unifying both domains remains unexplored. Recent Unified Self-Supervised Learning (SSL) methods have started to bridge the gap between both paradigms. However, they rely solely on semantic token reconstruction, which requires an external tokenizer during training, introducing a significant overhead. In this work, we introduce Sorcen, a novel unified SSL framework, incorporating a synergic Contrastive-Reconstruction objective. Our Contrastive objective, “Echo Contrast”, leverages the generative capabilities of Sorcen, eliminating the need for additional image crops or augmentations during training. Sorcen “generates” an echo sample in the semantic token space, forming the contrastive positive pair. Sorcen operates exclusively on precomputed tokens, eliminating the need for an online token transformation during training, thereby significantly reducing computational overhead. Extensive experiments on ImageNet-1k demonstrate that Sorcen outperforms the previous Unified SSL SoTA by 0.4%, 1.48 FID, 1.76%, and 1.53% on linear probing, unconditional image generation, few-shot learning, and transfer learning, respectively, while being 60.8% more efficient. Additionally, Sorcen surpasses previous single-crop MIM SoTA in linear probing and achieves SoTA performance in unconditional image generation, highlighting significant improvements and breakthroughs in Unified SSL models.
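To make the idea of a contrastive positive pair concrete, here is a minimal sketch of a generic InfoNCE-style loss in plain Python. It is not Sorcen's actual objective: the abstract does not give the loss formula, so InfoNCE is used here only as a common stand-in. The names `info_nce`, `anchors`, `echoes`, and `temperature` are hypothetical, and the embeddings are assumed to be plain lists of floats, where `echoes[i]` plays the role of the generated "echo" of `anchors[i]`.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(anchors, echoes, temperature=0.1):
    """Mean InfoNCE loss over a batch.

    Assumption for illustration: echoes[i] is the positive for anchors[i],
    and every other echo in the batch serves as a negative.
    """
    a = [l2_normalize(v) for v in anchors]
    e = [l2_normalize(v) for v in echoes]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity of anchor i to every echo, scaled by temperature.
        logits = [sum(x * y for x, y in zip(ai, ej)) / temperature for ej in e]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching echo as the target class.
        total += log_denom - logits[i]
    return total / len(a)


# Matched pairs should score a much lower loss than mismatched ones.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)
```

In this toy setup the loss is small when each anchor is closest to its own echo and large when the pairs are mismatched, which is the behavior any contrastive positive-pair objective relies on.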

Submission history

From: Imanol G. Estepa
[v1] Wed, 19 Mar 2025 09:53:11 UTC (46,220 KB)
[v2] Thu, 20 Mar 2025 15:09:59 UTC (46,220 KB)
[v3] Tue, 20 May 2025 08:24:32 UTC (46,231 KB)
