On September 8, 2025, Alibaba’s Qwen team introduced Qwen3-ASR Flash, an automatic speech recognition (ASR) system covering 11 languages — as well as multiple dialects and accents — and a range of acoustic conditions, positioned as an all-in-one transcription service.
Unlike conventional systems that require separate models for different languages or conditions, Qwen3-ASR Flash consolidates capabilities into a single API-based model. It supports Mandarin, Cantonese, Sichuanese, Hokkien, Wu, and English (with British and American accents), alongside French, German, Spanish, Italian, Portuguese, Russian, Japanese, Korean, and Arabic.
The system is also trained to reject non-speech inputs such as background noise or silence, a feature that enhances transcription accuracy in real-world use cases.
A distinguishing aspect of Qwen3-ASR Flash is its ability to integrate contextual biasing directly into recognition. Users can provide background text in almost any format, from keyword lists to full paragraphs or entire documents, to steer the model toward domain-specific terms and phrasing. “Qwen3-ASR-Flash eliminates the need for preprocessing of contextual information,” Alibaba noted.
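Alibaba's announcement does not spell out the request format, so the snippet below is only a hypothetical sketch of how such a context-aware transcription call might look; the endpoint URL, field names, and response shape are assumptions for illustration, not Alibaba's documented API.

```python
# Hypothetical sketch only: the endpoint, field names, and auth header are
# assumptions for illustration, not Alibaba's documented Qwen3-ASR Flash API.
import base64
import requests

API_URL = "https://example.invalid/v1/asr/transcriptions"  # placeholder endpoint
API_KEY = "YOUR_API_KEY"

# Free-form background text used for contextual biasing: per the announcement,
# keyword lists, paragraphs, or entire documents can be passed without preprocessing.
context = (
    "University chemistry lecture covering catalysis, stoichiometry, "
    "and Le Chatelier's principle."
)

with open("lecture.wav", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode("ascii")

payload = {
    "model": "qwen3-asr-flash",
    "audio": audio_b64,   # base64-encoded audio (assumed encoding)
    "context": context,   # biasing text passed as-is
    "language": "auto",   # assumed option: let the model detect language/dialect
}

resp = requests.post(
    API_URL,
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=120,
)
resp.raise_for_status()
print(resp.json())  # expected to contain the transcript text
```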
The model is also designed to maintain accuracy under difficult conditions, including far-field recordings, noisy environments such as vehicles or crowded venues, and even music-heavy scenarios. In fact, one of its more unusual capabilities is the ability to transcribe singing voices with background music, a task that has challenged earlier systems.
Alibaba demonstrated the model’s range with examples from rap lyrics, match commentary, chemistry lectures, and multilingual speech. Benchmarks place its word error rate below 8%, including in music and noisy environments, surpassing Google’s Gemini-2.5-Pro, OpenAI’s GPT-4o-Transcribe, and ByteDance’s Doubao-ASR.
You can test it here: https://huggingface.co/spaces/Qwen/Qwen3-ASR-Demo
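Because the demo runs as a public Hugging Face Space, it can also be exercised programmatically with the `gradio_client` package. The sketch below assumes the Space exposes a transcription endpoint that accepts an audio file plus optional context text; the endpoint name and parameters are guesses, so check the output of `view_api()` before relying on them.

```python
# Sketch for calling the public demo Space via gradio_client.
# The api_name and parameters below are assumptions; run view_api()
# to see the Space's actual endpoints first.
from gradio_client import Client, handle_file

client = Client("Qwen/Qwen3-ASR-Demo")

# Print the Space's real endpoints and their expected arguments.
client.view_api()

# Illustrative call, assuming an endpoint that takes an audio file and
# free-form context text for biasing.
result = client.predict(
    handle_file("sample.wav"),
    "Background text with domain-specific terms.",
    api_name="/transcribe",  # assumed endpoint name
)
print(result)
```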
The launch has drawn attention on Reddit and X, where users praised the model’s performance. However, some noted that it is currently API-only, with no open-weight release, describing it as a “tough sell” compared to Whisper.
As one Reddit user observed: “Unless this model provides word-level timestamps, diarization, or confidence scores, it’s going to be a tough sell.”
Alibaba has not confirmed whether Qwen3-ASR will follow other Qwen models with an eventual open release. For now, the system reflects a state-of-the-art but closed deployment model, contrasting with open alternatives like OpenAI’s Whisper, Nvidia’s Parakeet, or Mistral AI’s Voxtral.