OneReward: Unified Mask-Guided Image Generation via Multi-Task Human Preference Learning - Takara TLDR

In this paper, we introduce OneReward, a unified reinforcement learning
framework that enhances the model’s generative capabilities across multiple
tasks under different evaluation criteria using only one reward model.
A single vision-language model (VLM) serves as the generative reward
model: it distinguishes the winner from the loser for a given task and a given
evaluation criterion, so it can be applied effectively to multi-task generation
models, particularly in contexts with varied data and diverse task objectives.
We apply OneReward to mask-guided image generation, which can be further
divided into several sub-tasks such as image fill, image extend, object
removal, and text rendering, each using a binary mask to mark the edit area. Although
these domain-specific tasks share the same conditioning paradigm, they differ
significantly in underlying data distributions and evaluation metrics. Existing
methods often rely on task-specific supervised fine-tuning (SFT), which limits
generalization and training efficiency. Building on OneReward, we develop
Seedream 3.0 Fill, a mask-guided generation model trained via multi-task
reinforcement learning directly on a pre-trained base model, eliminating the
need for task-specific SFT. Experimental results demonstrate that our unified
edit model consistently outperforms both commercial and open-source
competitors, such as Ideogram, Adobe Photoshop, and FLUX Fill [Pro], across
multiple evaluation dimensions. Code and model are available at:
https://one-reward.github.io
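
The idea of one reward model serving every task and criterion can be illustrated with a toy sketch. This is not the paper's implementation: the criterion names, the Comparison type, the prompt wording, and toy_judge are all hypothetical, and the judge is a placeholder standing in for a VLM. The sketch only shows how the task and the criterion can be folded into the query so that a single judge handles all pairwise comparisons.

```python
from dataclasses import dataclass
from typing import Callable

# Sub-tasks named in the abstract; the identifiers themselves are illustrative.
TASKS = ("image_fill", "image_extend", "object_removal", "text_rendering")


@dataclass(frozen=True)
class Comparison:
    task: str          # one of TASKS
    criterion: str     # hypothetical criterion name, e.g. "aesthetics"
    candidate_a: str   # stand-in for an image; a real system would pass pixels
    candidate_b: str


def build_query(c: Comparison) -> str:
    """Fold task and criterion into one prompt so one judge can serve all of them."""
    return (
        f"Task: {c.task}. Criterion: {c.criterion}. "
        "Is candidate A better than candidate B?"
    )


def pick_winner(c: Comparison, judge: Callable[[str, str, str], float]) -> str:
    """Return 'a' or 'b'. `judge` stands in for the VLM and returns P(A is better)."""
    p_a_wins = judge(build_query(c), c.candidate_a, c.candidate_b)
    return "a" if p_a_wins >= 0.5 else "b"


def toy_judge(query: str, a: str, b: str) -> float:
    """Placeholder judge that prefers the longer string. NOT a real reward model."""
    return 1.0 if len(a) >= len(b) else 0.0


if __name__ == "__main__":
    pair = Comparison("object_removal", "aesthetics", "longer candidate", "short")
    print(build_query(pair))
    print(pick_winner(pair, toy_judge))
```

Because the task and criterion travel inside the query rather than selecting a separate model, adding a sub-task in this sketch means adding a string, not training another judge.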
