Alibaba Unveils New Open Source AI That Creates Images With Perfect Text

Alibaba has released a new open-source image generation model called Qwen-Image that sets itself apart by accurately rendering complex and multilingual text within images, a task where many other AI tools still struggle. Developed by Alibaba’s Qwen Team, Qwen-Image is designed to handle everything from handwritten poetry and bilingual posters to e-commerce product labels and classroom diagrams, all while maintaining high-quality, readable text. The model supports both alphabetic scripts, like English, and logographic ones, like Chinese, making it especially useful in multilingual contexts.

Users can try out Qwen-Image via the Qwen Chat website by switching to the “Image Generation” mode. The model has also been released under the Apache 2.0 licence, meaning businesses and developers can use, modify, and distribute it, even for commercial purposes, as long as they include the licence and the required attribution notices.

Qwen-Image’s training data includes billions of image-text pairs sourced from natural scenes, human portraits, artistic posters, and synthetically generated text data. Interestingly, all the synthetic data used for training was generated in-house by Alibaba, and no AI-generated images from other models were included. This approach helped the model learn to handle rare or complex characters, especially in Chinese.

The model was trained in stages, starting with simple captioned images and gradually moving to more complex layouts and dense multilingual text. This curriculum-style training, according to Alibaba, helped Qwen-Image generalise better across various formats.
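To make the idea of staged training concrete, here is a minimal Python sketch of a curriculum schedule that picks a data tier based on the current training step. The stage names echo the article's description, but the step boundaries and the `Stage` and `stage_for_step` names are purely illustrative assumptions; Alibaba has not published its schedule in this form.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str        # hypothetical label for a data difficulty tier
    start_step: int  # first training step at which this tier becomes active


# Hypothetical stages and boundaries; the article gives no actual numbers.
CURRICULUM = [
    Stage("simple_captioned_images", 0),
    Stage("complex_layouts", 1000),
    Stage("dense_multilingual_text", 3000),
]


def stage_for_step(step, curriculum=CURRICULUM):
    """Return the latest stage whose start_step is <= step."""
    if step < 0:
        raise ValueError("step must be non-negative")
    active = curriculum[0]
    for stage in curriculum:
        if stage.start_step <= step:
            active = stage
    return active


if __name__ == "__main__":
    # Show which tier would feed the training loop at a few sample steps.
    for step in (0, 999, 1000, 2500, 3000):
        print(step, stage_for_step(step).name)
```

In a real training loop, the returned stage would select which dataset or sampling mix feeds the next batch, so the model sees easier examples before harder ones.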

Under the hood, Qwen-Image combines three main components:

- Qwen2.5-VL, a multimodal language model for understanding context

- A VAE encoder/decoder, optimised for high-resolution layouts

- MMDiT, a diffusion model with a special encoding system for spatial alignment

These elements work together to produce images that are not only visually appealing but also accurate in terms of text placement and formatting.
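The way three such components might be chained can be shown with a toy data-flow sketch in Python. Every function here is a hypothetical stand-in: nothing below is Qwen-Image code, the "embedding" and "denoising" are simple arithmetic, and the only point is the order of operations (prompt to conditioning, conditioning to latent, latent to pixels).

```python
from typing import List


def encode_prompt(prompt: str, dim: int = 4) -> List[float]:
    """Stand-in for the language model: maps a prompt to a conditioning vector."""
    # Toy deterministic embedding built from character codes (not a real encoder).
    vec = [0.0] * dim
    for i, ch in enumerate(prompt):
        vec[i % dim] += ord(ch) / 1000.0
    return vec


def denoise(latent: List[float], cond: List[float], steps: int = 10) -> List[float]:
    """Stand-in for the diffusion model: nudges the latent toward the conditioning."""
    for _ in range(steps):
        latent = [x + 0.5 * (c - x) for x, c in zip(latent, cond)]
    return latent


def decode_latent(latent: List[float], scale: int = 2) -> List[float]:
    """Stand-in for the VAE decoder: expands the compact latent into 'pixels'."""
    return [x for x in latent for _ in range(scale)]


def generate(prompt: str, steps: int = 10) -> List[float]:
    cond = encode_prompt(prompt)
    # A real sampler would start from random noise; zeros keep this toy deterministic.
    latent = [0.0] * len(cond)
    latent = denoise(latent, cond, steps)
    return decode_latent(latent)


if __name__ == "__main__":
    print(generate("hello"))
```

The design point the sketch illustrates is that the diffusion step works in a small latent space, and only the final decode expands the result to full output size.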

Alibaba says Qwen-Image has been tested against several industry benchmarks for text clarity, layout precision, and prompt-following ability. On the AI Arena public leaderboard, which uses human evaluations to rank AI image models, Qwen-Image reportedly ranks third overall at present and is the highest-ranked open-source model.

– Ends

Published By: Nandini Yadav

Published On: Aug 6, 2025
