Alibaba’s Qwen team introduced the Qwen3-VL series on September 23, calling it the most advanced vision-language line in its portfolio. The release includes the open-sourcing of its flagship model, Qwen3-VL-235B-A22B, in both Instruct and Thinking versions. The focus is on moving visual AI from simple recognition towards deeper reasoning and execution.
The models are designed to combine text and visual understanding at scale, with native support for a 256,000-token context window, expandable to one million tokens. This allows processing of entire textbooks or hours of video while maintaining near-perfect recall.
Benchmarks cited by the company show the Instruct model matching or surpassing Gemini 2.5 Pro in visual perception, while the Thinking model outperforms it on multimodal math benchmarks such as MathVision.
Performance upgrades are attributed to three architectural changes. An interleaved MRoPE positional scheme distributes temporal, height and width position information evenly across the frequency dimensions, strengthening long-horizon video reasoning.
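The contrast between a conventional chunked layout and an interleaved one can be sketched in a few lines of Python. The function names and group sizes below are illustrative, not taken from the released implementation: rotary embeddings apply a rotation per frequency channel, and a multimodal RoPE divides those channels among temporal (t), height (h) and width (w) position components. A chunked split confines each component to one contiguous frequency band, while interleaving cycles the components so that each one covers high and low frequencies alike.

```python
def chunked_layout(sizes=(8, 4, 4)):
    """Contiguous split: 't' takes one frequency band, then 'h', then 'w'."""
    labels = []
    for name, count in zip(("t", "h", "w"), sizes):
        labels.extend([name] * count)
    return labels

def interleaved_layout(num_channels):
    """Interleaved split: cycle t, h, w so each component spans all frequencies."""
    return [("t", "h", "w")[i % 3] for i in range(num_channels)]

print(chunked_layout((2, 1, 1)))   # ['t', 't', 'h', 'w'] -- 't' clustered
print(interleaved_layout(6))       # ['t', 'h', 'w', 't', 'h', 'w'] -- spread out
```

In the chunked case the temporal component only ever sees its own band; in the interleaved case every component samples the whole spectrum, which is the claimed benefit for long videos.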
DeepStack technology injects visual features into multiple LLM layers, improving detail capture and text-image alignment. A new text-timestamp alignment method enhances video temporal reasoning, enabling more accurate event localisation.
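Text-timestamp alignment can be pictured as interleaving explicit time markers with each frame's visual tokens, so the language model can tie what it sees to when it happens. The marker format and helper below are hypothetical, purely to illustrate the idea rather than to reproduce Qwen's tokenisation:

```python
def build_video_sequence(frame_tokens, fps):
    """Prefix each frame's visual tokens with a timestamp marker (illustrative)."""
    seq = []
    for i, tokens in enumerate(frame_tokens):
        seq.append(f"<t={i / fps:.1f}s>")  # hypothetical timestamp token
        seq.extend(tokens)
    return seq

# Two frames of a 2 fps video, each reduced to placeholder visual tokens.
frames = [["v00", "v01"], ["v10", "v11"]]
print(build_video_sequence(frames, fps=2.0))
# ['<t=0.0s>', 'v00', 'v01', '<t=0.5s>', 'v10', 'v11']
```

With markers like these in the sequence, answering "when does the event occur?" reduces to attending to the nearest timestamp token, which is the kind of event localisation the release highlights.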
The system’s capabilities extend beyond perception. Qwen3-VL can act as a visual agent by navigating GUIs, converting sketches into code, or performing fine-grained 2D and 3D object grounding. Its OCR now spans 32 languages, with higher accuracy under challenging conditions and better handling of long, complex documents.
The company said the open release aims to serve as a foundation for community exploration, framing Qwen3-VL as both a research tool and a step towards embodied AI systems. The project positions the series as a competitive alternative to closed-source leaders while expanding open access to multimodal reasoning technology.
Recently, Alibaba unveiled Qwen3-Next, a new LLM architecture that combines hybrid attention with a sparse mixture-of-experts design for ultra-long-context efficiency. With higher throughput and stronger reasoning, Qwen3-Next powers two post-trained models and points towards the forthcoming Qwen3.5 generation.