Paper Page - Step1X-Edit: A Practical Framework For General Image Editing

In recent years, image editing models have developed rapidly. The recent
release of multimodal models such as GPT-4o and Gemini2 Flash has introduced
highly promising image editing capabilities. These models can fulfill the vast
majority of user-driven editing requirements, marking a significant advance in
image manipulation. However, a large gap remains between open-source
algorithms and these closed-source models. In this paper, we therefore aim to
release a state-of-the-art image editing model, called Step1X-Edit, which
provides performance comparable to closed-source models such as GPT-4o and
Gemini2 Flash. More specifically, we adopt a multimodal LLM to process the
reference image and the user's editing instruction. A latent embedding is
extracted and integrated with a diffusion image decoder to obtain the target
image. To train the model, we build a data generation pipeline that produces a
high-quality dataset. For evaluation, we develop GEdit-Bench, a novel
benchmark rooted in real-world user instructions. Experimental results on
GEdit-Bench demonstrate that Step1X-Edit outperforms existing open-source
baselines by a substantial margin and approaches the performance of leading
proprietary models, thereby making a significant contribution to the field of
image editing.
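
The data flow described in the abstract (a multimodal LLM turns the reference image and instruction into a latent embedding, which then conditions a diffusion image decoder) can be illustrated with a toy sketch. This is not the Step1X-Edit implementation: every function name, the embedding size, and the update rule below are hypothetical stand-ins chosen only to show how the two stages connect, and the "image" is a plain list of floats.

```python
import hashlib
import random
from typing import List

EMBED_DIM = 8  # hypothetical embedding size, chosen only for this sketch


def encode_with_mllm(image: List[float], instruction: str) -> List[float]:
    """Toy stand-in for the multimodal LLM: (image, instruction) -> latent embedding."""
    digest = hashlib.sha256(instruction.encode("utf-8")).digest()
    text_part = [b / 255.0 for b in digest[:EMBED_DIM]]
    mean_pixel = sum(image) / len(image)
    # Mix a crude image statistic into the instruction-derived features.
    return [t + mean_pixel for t in text_part]


def denoise_step(latent: List[float], reference: List[float],
                 condition: List[float], step_size: float) -> List[float]:
    """Toy stand-in for one decoder step: pull the latent toward a conditioned target."""
    updated = []
    for i, x in enumerate(latent):
        target = reference[i] + condition[i % len(condition)]
        updated.append(x + step_size * (target - x))
    return updated


def edit_image(image: List[float], instruction: str,
               num_steps: int = 20, seed: int = 0) -> List[float]:
    """Run the two stages: encode the request, then iteratively refine from noise."""
    rng = random.Random(seed)
    condition = encode_with_mllm(image, instruction)
    latent = [rng.gauss(0.0, 1.0) for _ in image]  # start from Gaussian noise
    for _ in range(num_steps):
        latent = denoise_step(latent, image, condition, step_size=0.5)
    return latent


if __name__ == "__main__":
    reference = [0.1, 0.4, 0.7, 0.9]
    print(edit_image(reference, "make the sky darker"))
```

In a real system the encoder would be a pretrained multimodal LLM and the refinement loop a learned diffusion decoder; the sketch only mirrors the order of operations, with the embedding computed once and reused at every decoding step.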
