On September 23, Alibaba Cloud released and open-sourced the new Qwen3-Omni and Qwen3-TTS models, along with Qwen-Image-Edit-2509, an image editing model positioned as comparable to Google’s Nano Banana image editing tool.
Qwen3-Omni is billed as the industry’s first native end-to-end multimodal AI model. It accepts text, image, audio, and video input and responds in real time with text and natural speech, addressing the long-standing problem that multimodal models have had to trade off one capability against another.
Qwen3-TTS-Flash is a new text-to-speech model that, according to the official announcement, redefines voice AI. It is reported to achieve state-of-the-art (SOTA) multilingual word error rate (WER) in Chinese, English, Italian, and French, offers 17 expressive voices across 10 languages, and supports more than 9 Chinese dialects, including Cantonese, Hokkien, and Sichuanese. The official claim is that it is well suited to apps, games, IVR systems, and any content that requires natural, human-like speech.
Qwen-Image-Edit-2509 is an image processing model that can handle single images or can integrate