X-Planner, a planning system utilizing a multimodal large language model, decomposes complex text-guided image editing instructions into precise sub-instructions, ensuring localized, identity-preserving edits and achieving top performance on established benchmarks.
Recent diffusion-based image editing methods have significantly advanced
text-guided tasks but often struggle to interpret complex, indirect
instructions. Moreover, current models frequently suffer from poor identity
preservation and unintended edits, or rely heavily on manual masks. To address
these challenges, we introduce X-Planner, a Multimodal Large Language Model
(MLLM)-based planning system that effectively bridges user intent with editing
model capabilities. X-Planner employs chain-of-thought reasoning to
systematically decompose complex instructions into simpler, clear
sub-instructions. For each sub-instruction, X-Planner automatically generates
precise edit types and segmentation masks, eliminating manual intervention and
ensuring localized, identity-preserving edits. Additionally, we propose a novel
automated pipeline that generates large-scale data for training X-Planner.
Trained on this data, X-Planner achieves state-of-the-art results on both
existing benchmarks and our newly introduced complex editing benchmark.
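
To make the planning idea concrete, the following is a minimal, hypothetical sketch of the kind of structured plan such an MLLM-based planner might emit: each complex instruction is decomposed into simple sub-instructions, each paired with an edit type and a mask-grounding phrase. The names (`EditStep`, `plan_edits`) and the hard-coded decomposition are illustrative assumptions, not X-Planner's actual interface or output.

```python
# Illustrative sketch only: a toy stand-in for an MLLM planner that decomposes
# a complex editing instruction into localized, mask-grounded sub-instructions.
from dataclasses import dataclass
from typing import List


@dataclass
class EditStep:
    sub_instruction: str  # simple, direct edit instruction
    edit_type: str        # e.g. "global style" or "attribute change"
    mask_prompt: str      # phrase used to ground a segmentation mask


def plan_edits(instruction: str) -> List[EditStep]:
    """Decompose a complex instruction into sub-instructions.

    In the real system this decomposition would come from chain-of-thought
    reasoning in a multimodal LLM; here one example is hard-coded.
    """
    if "winter" in instruction.lower():
        return [
            EditStep("add snow covering the ground", "global style", "ground"),
            EditStep("make the trees bare and leafless", "attribute change", "trees"),
        ]
    return [EditStep(instruction, "generic edit", "entire image")]


if __name__ == "__main__":
    for step in plan_edits("Make this summer park scene look like winter"):
        print(step)
```

Each resulting sub-instruction and its mask prompt could then be handed to an off-the-shelf segmentation model and an editing model, keeping every edit local to the masked region.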