Adobe Photoshop is among the most recognizable pieces of software ever created, used by more than 90% of the world’s creative professionals, according to Photutorial.
So the fact that a new open-source AI model, Qwen-Image-Edit, released yesterday by the Qwen Team of AI researchers at Chinese e-commerce giant Alibaba, can now accomplish a huge number of Photoshop-like editing jobs with text inputs alone is a notable achievement.
Built on the 20-billion-parameter Qwen-Image foundation model released earlier this month, Qwen-Image-Edit extends the system’s unique strengths in text rendering to cover a wide spectrum of editing tasks, from subtle appearance changes to broader semantic transformations.
Simply upload a starting image (I tried one of myself from VentureBeat’s last annual Transform conference in San Francisco), type instructions describing what you want to change, and Qwen-Image-Edit will return a new image with those edits applied.
Input image example:

Output image example with prompt: “Make the man wearing a tuxedo.”

The model is available now across several platforms, including Qwen Chat, Hugging Face, ModelScope, GitHub, and the Alibaba Cloud application programming interface (API), the latter of which allows any third-party developer or enterprise to integrate the new model into their own applications and workflows.
I created my examples above on Qwen Chat, the Qwen Team’s rival to OpenAI’s ChatGPT. Aspiring users should note that generations are limited to about eight free jobs (inputs/outputs) per 12-hour period before the quota resets; paying users get access to more.

With support for both English and Chinese inputs, and a dual focus on both semantic meaning and visual fidelity, Qwen-Image-Edit aims to lower barriers to professional-grade visual content creation.
And because the model is released as open-source code under the Apache 2.0 license, enterprises can safely download it and run it for free on their own hardware or virtual clouds/machines, potentially yielding large cost savings over proprietary software like Photoshop.
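For teams that want to try self-hosting, a minimal sketch using Hugging Face’s diffusers library is below. It assumes a recent diffusers build that includes the QwenImageEditPipeline class and a CUDA GPU with enough memory for the 20-billion-parameter weights; consult the model card on Hugging Face for the authoritative setup.

```python
# Minimal self-hosting sketch -- assumes a recent `diffusers` build that
# ships QwenImageEditPipeline and a GPU with enough memory for the
# 20B-parameter weights. Details may differ from the official model card.
import torch
from PIL import Image
from diffusers import QwenImageEditPipeline

pipe = QwenImageEditPipeline.from_pretrained(
    "Qwen/Qwen-Image-Edit", torch_dtype=torch.bfloat16
).to("cuda")

image = Image.open("conference_photo.png").convert("RGB")  # your input image
result = pipe(
    image=image,
    prompt="Make the man wearing a tuxedo.",
    num_inference_steps=50,
)
result.images[0].save("edited.png")
```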
As Junyang Lin, a Qwen Team researcher wrote on X, “it can remove a strand of hair, very delicate image modification.”
The team’s announcement echoes this sentiment, presenting Qwen-Image-Edit not as an entirely new system, but as a natural extension of Qwen-Image that applies its unique text rendering and dual-encoding approach directly to editing tasks.
Dual encodings allow edits that preserve the style and content of the original image
Qwen-Image-Edit builds on the foundation established by Qwen-Image, which was introduced earlier this year as a large-scale model specializing in both image generation and text rendering.
Qwen-Image’s technical report highlighted its ability to handle complex tasks like paragraph-level text rendering, Chinese and English characters, and multi-line layouts with accuracy.
The report also emphasized a dual-encoding mechanism, feeding images simultaneously into Qwen2.5-VL for semantic control and a variational autoencoder (VAE) for reconstructive detail. This approach allows edits that remain faithful to both the intent of the prompt and the look of the original image.
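To make that idea concrete, here is an illustrative Python sketch of the dual-encoding flow. It is a simplification of the report’s description, not Qwen’s actual code; the encoder and generator callables are hypothetical stand-ins for the real components.

```python
# Illustrative simplification of the dual-encoding idea from the Qwen-Image
# technical report -- not the actual Qwen code. The three callables are
# hypothetical stand-ins for the real components.
from typing import Any, Callable

def edit_with_dual_encoding(
    image: Any,
    instruction: str,
    vl_encoder: Callable,   # Qwen2.5-VL path: what the scene means
    vae_encoder: Callable,  # VAE path: how the scene looks, pixel-level
    generator: Callable,    # diffusion backbone conditioned on both
) -> Any:
    semantic = vl_encoder(image, instruction)  # semantic control signal
    detail = vae_encoder(image)                # reconstructive detail signal
    # Conditioning on both streams lets an edit change meaning (semantic
    # path) while staying anchored to the original appearance (VAE path).
    return generator(semantic, detail, instruction)
```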
Those same architectural choices underpin Qwen-Image-Edit. By leveraging dual encodings, the model can adjust at two levels: semantic edits that change the meaning or structure of a scene, and appearance edits that introduce or remove elements while keeping the rest untouched.
Semantic editing includes creating new intellectual property, rotating objects 90 or 180 degrees to reveal different views, or transforming an input into another style such as Studio Ghibli-inspired art. These edits typically modify many pixels but preserve the underlying identity of objects.
Here’s an example of semantic editing from Shridhar Athinarayanan, an engineer at AI applications platform Replicate, who used a Replicate-hosted version of the model to reskin a photo of Manhattan as a toy Lego set.
Appearance editing focuses on precise, local changes. In these cases, most of the image remains unchanged while specific objects are altered. Demonstrations include adding a signboard that generates a reflection in water, removing stray hair strands from a portrait, and changing the color of a single letter in a text image.
One good example of appearance editing with Qwen-Image-Edit comes from AnswerAI co-founder and CEO Thomas Hill, who posted a side-by-side on X: one image showing his wife in her wedding dress beneath an archway, and another with the same archway covered in graffiti:
Combined with Qwen’s established strength in rendering Chinese and English text, the editing-focused system is positioned as a flexible tool for creators who need more than simple generative imagery.
The dual control over semantic scope and appearance fidelity means the same tool can serve very different needs, from creative IP development to production-level photo retouching.
Adding and removing text in images
Another standout capability is bilingual text editing. Qwen-Image-Edit allows users to add, remove, or modify text in both Chinese and English while preserving font, size, and style.
This expands on Qwen-Image’s reputation for strong text rendering, particularly in challenging scenarios like intricate Chinese characters.
In practice, this allows for accurate editing of posters, signs, T-shirts, or calligraphy artworks where small text details matter, as seen in another example from Replicate below.
One demonstration involved correcting errors in a piece of generated Chinese calligraphy through a step-by-step chained editing process.
Users could highlight incorrect regions, instruct the system to fix them, and then further refine details until the correct characters were rendered. This iterative approach shows how the model can be applied to high-stakes editing tasks where precision is essential.
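With the self-hosted pipeline shown earlier, this kind of chained editing amounts to feeding each output back in as the next input. A hedged sketch follows, with the same assumptions as before; the prompts are invented for illustration.

```python
# Chained-editing sketch: each pass feeds the previous output back in as
# the next input. Same assumptions as the self-hosting example above;
# the prompts are invented for illustration.
import torch
from PIL import Image
from diffusers import QwenImageEditPipeline

pipe = QwenImageEditPipeline.from_pretrained(
    "Qwen/Qwen-Image-Edit", torch_dtype=torch.bfloat16
).to("cuda")

prompts = [
    "Fix the stroke order of the third character.",
    "Thicken the brush strokes of the corrected character.",
]
current = Image.open("calligraphy.png").convert("RGB")
for step, prompt in enumerate(prompts, start=1):
    current = pipe(image=current, prompt=prompt).images[0]
    current.save(f"calligraphy_step_{step}.png")  # keep intermediates for review
```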
Applications and use cases
The Qwen team has highlighted a range of potential applications:
Creative design and IP expansion, such as generating mascot-based emoji packs.
Advertising and content creation, where logos, signage, and text-heavy visuals can be customized.
Virtual avatars and art, with style transfer supporting unique character representations.
Photography and personal use, including background adjustments, clothing changes, and object removal.
Cultural preservation, demonstrated through correcting classical calligraphy works.
By bridging fine-grained editing with broader creative transformations, Qwen-Image-Edit caters to professionals who need control while remaining approachable for casual experimentation.
Benchmarking and performance
According to the Qwen team, evaluations across public benchmarks indicate that Qwen-Image-Edit delivers state-of-the-art performance in image editing.
This follows from the broader Qwen-Image technical evaluations, where the base model achieved leading results in both general image generation and text rendering tasks.
While specific editing benchmark figures were not detailed in the release, Qwen-Image itself ranked highly in independent evaluations such as AI Arena, where human raters compared outputs across models from different providers.
API pricing and availability
Through Alibaba Cloud Model Studio, developers can access Qwen-Image-Edit as an API. Pricing is set at $0.045 per image, with a free quota of 100 images valid for 180 days after activation.
The service is initially available in the Singapore region, with a rate limit of five requests per second and up to two concurrent tasks per account.
To use the API, developers must obtain a Model Studio API key and can call the model via HTTP or through the DashScope SDK in Python or Java.
Images can be submitted as URLs or in Base64 format, with supported resolutions ranging from 512 to 4,096 pixels and file sizes up to 10 MB. Output images are hosted on Alibaba Cloud Object Storage with links valid for 24 hours, requiring users to download and save results promptly.
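For developers sketching an integration, a raw HTTP call might look roughly like the following. Note that the endpoint path, model identifier, and payload shape here are assumptions modeled on DashScope’s general API conventions, so check the Model Studio documentation for the authoritative format.

```python
# Rough HTTP sketch only -- the endpoint path, model name, and payload
# shape are assumptions based on DashScope's general conventions, not
# confirmed against the Qwen-Image-Edit docs. Requires a Model Studio key.
import os
import requests

API_KEY = os.environ["DASHSCOPE_API_KEY"]
URL = (  # assumed Singapore-region endpoint
    "https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/"
    "multimodal-generation/generation"
)

payload = {
    "model": "qwen-image-edit",  # assumed model identifier
    "input": {
        "messages": [{
            "role": "user",
            "content": [
                {"image": "https://example.com/input.png"},  # URL or Base64
                {"text": "Make the man wearing a tuxedo."},
            ],
        }]
    },
}
resp = requests.post(URL, json=payload,
                     headers={"Authorization": f"Bearer {API_KEY}"})
resp.raise_for_status()
print(resp.json())  # output links expire after 24 hours -- download promptly
```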
What’s next for Qwen?
Qwen positions Image-Edit as a step toward lowering barriers for visual content creation. By making precise, style-consistent editing more accessible, the model could support applications from design studios to casual users refining personal projects.
The system also signals a broader trend in AI development: moving beyond single-purpose generation toward tools that integrate editing, correction, and refinement.
With both semantic flexibility and appearance-level precision, Qwen-Image-Edit reflects this shift, blending the generative strengths of large models with the reliability required for professional editing.