DragFlow: Unleashing DiT Priors With Region Based Supervision For Drag Editing - Takara TLDR

Drag-based image editing has long suffered from distortions in the target
region, largely because the priors of earlier base models such as Stable
Diffusion are insufficient to project optimized latents back onto the natural
image manifold. With the shift from UNet-based DDPMs to more scalable DiTs
trained with flow matching (e.g., SD3.5, FLUX), generative priors have become significantly
stronger, enabling advances across diverse editing tasks. However, drag-based
editing has yet to benefit from these stronger priors. This work proposes the
first framework to effectively harness FLUX’s rich prior for drag-based
editing, dubbed DragFlow, achieving substantial gains over baselines. We first
show that directly applying point-based drag editing to DiTs performs poorly:
unlike the highly compressed features of UNets, DiT features are insufficiently
structured to provide reliable guidance for point-wise motion supervision. To
overcome this limitation, DragFlow introduces a region-based editing paradigm,
where affine transformations enable richer and more consistent feature
supervision. Additionally, we integrate pretrained open-domain personalization
adapters (e.g., IP-Adapter) to enhance subject consistency, while preserving
background fidelity through gradient mask-based hard constraints. Multimodal
large language models (MLLMs) are further employed to resolve task ambiguities.
For evaluation, we curate a novel Region-based Dragging benchmark (ReD Bench)
featuring region-level dragging instructions. Extensive experiments on
DragBench-DR and ReD Bench show that DragFlow surpasses both point-based and
region-based baselines, setting a new state-of-the-art in drag-based image
editing. Code and datasets will be publicly available upon publication.
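
To make the region-based idea concrete, here is a minimal NumPy sketch of two ingredients the abstract names: moving a region with an affine transformation, and zeroing gradients outside the editable area as a hard background constraint. It is an illustration under stated assumptions, not the authors' implementation: the function names `warp_region_mask` and `mask_gradient`, the (row, col) coordinate convention, the nearest-neighbour sampling, and the choice of "source plus target region" as the editable area are all hypothetical.

```python
import numpy as np


def warp_region_mask(mask, matrix, offset):
    """Warp a binary mask with the affine map p -> matrix @ p + offset.

    Points are (row, col). Hypothetical helper, nearest-neighbour sampling.
    """
    h, w = mask.shape
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    targets = np.stack([rows.ravel(), cols.ravel()]).astype(float)  # 2 x (h*w)
    # Inverse mapping: for each target pixel, find the source pixel it came from.
    shift = np.asarray(offset, dtype=float)[:, None]
    sources = np.linalg.inv(matrix) @ (targets - shift)
    src = np.rint(sources).astype(int)
    inside = (src[0] >= 0) & (src[0] < h) & (src[1] >= 0) & (src[1] < w)
    warped = np.zeros(h * w, dtype=bool)
    warped[inside] = mask[src[0, inside], src[1, inside]]
    return warped.reshape(h, w)


def mask_gradient(grad, editable):
    """Zero a (channels, h, w) gradient outside the editable region.

    Assumption: background pixels must receive exactly zero update.
    """
    return grad * editable[None, :, :]


# Drag a 2x2 region by 3 rows and 2 columns (pure translation).
source_mask = np.zeros((8, 8), dtype=bool)
source_mask[1:3, 1:3] = True
target_mask = warp_region_mask(source_mask, np.eye(2), offset=(3, 2))

# Assumed editable area: where the region was plus where it lands.
editable = source_mask | target_mask
grad = np.ones((4, 8, 8))  # stand-in for a latent gradient
masked_grad = mask_gradient(grad, editable)
```

Because the whole region is moved by one transform, every pixel inside it shares the same motion, which is the intuition behind supervising regions rather than isolated points. Swapping `np.eye(2)` for a rotation or scaling matrix would express other affine drags in the same way.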
