Cerebras Systems has officially launched Qwen3‑235B, Alibaba's cutting-edge open-weight model, with full 131K-token context support, setting a new benchmark for performance in reasoning, code generation, and enterprise-scale AI applications. Now available via the Cerebras Inference Cloud, the model delivers capabilities that rival the most advanced frontier systems while operating at 30 times the speed and one-tenth the cost of today's leading closed-source models.
Real-Time AI Reasoning Reaches Breakthrough Speed
AI reasoning has historically been slow, with large models often taking a minute or more to respond to complex queries. Cerebras eliminates this bottleneck. Powered by the company's proprietary Wafer-Scale Engine 3 (WSE‑3), Qwen3‑235B achieves 1,500 tokens per second, a world record for frontier AI inference.
This level of performance transforms user experience—cutting latency from 60–120 seconds to just 0.6 seconds, even when processing advanced reasoning tasks or running multi-step workflows like retrieval-augmented generation (RAG). According to benchmark results from Artificial Analysis, no other provider—open or closed—currently matches this inference speed for a frontier-level model.
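The arithmetic behind those latency figures is straightforward. The sketch below reproduces them under assumed workload sizes; the 900-token reasoning trace and the 15 tokens-per-second baseline are illustrative assumptions, not benchmark data.

```python
# Back-of-the-envelope check of the latency numbers above.
# The reasoning length and baseline speed are illustrative assumptions.

CEREBRAS_TPS = 1_500    # reported Qwen3-235B output speed on the WSE-3
BASELINE_TPS = 15       # assumed throughput of a slow GPU-based deployment

reasoning_tokens = 900  # hypothetical chain-of-thought length for a hard query

print(f"Cerebras: {reasoning_tokens / CEREBRAS_TPS:.1f} s")  # ~0.6 s
print(f"Baseline: {reasoning_tokens / BASELINE_TPS:.0f} s")  # ~60 s
```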
A New Standard in Context: 131K Tokens for Real-World Applications
In tandem with this release, Cerebras has expanded its model context window from 32K to the full 131K tokens supported by Qwen3‑235B. This leap in context size enables the model to ingest and reason over massive volumes of data—spanning full codebases, multi-document repositories, and long-form technical material.
While 32K context allows for basic generation tasks, 131K opens the door to production-grade development. AI can now function as a deep code collaborator, synthesizing dozens of files and managing complex dependencies in real time—making it ideal for enterprise use cases in software engineering, document analysis, and scientific computing.
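Because the Cerebras Inference Cloud exposes an OpenAI-compatible API, a long-context request can be issued with the standard OpenAI Python client. The following is a minimal sketch, assuming the documented https://api.cerebras.ai/v1 endpoint; the model identifier and repository path are hypothetical placeholders, so consult the current Cerebras docs for exact names.

```python
# Minimal sketch: feed a multi-file codebase into the 131K-token window.
import os
from pathlib import Path

from openai import OpenAI

client = OpenAI(
    base_url="https://api.cerebras.ai/v1",  # OpenAI-compatible Cerebras endpoint
    api_key=os.environ["CEREBRAS_API_KEY"],
)

# Concatenate a repository's source files into a single prompt.
repo = Path("my_project")  # hypothetical repository path
codebase = "\n\n".join(
    f"### {p}\n{p.read_text()}" for p in sorted(repo.rglob("*.py"))
)

resp = client.chat.completions.create(
    model="qwen-3-235b-a22b",  # assumed identifier; check the Cerebras docs
    messages=[
        {"role": "system", "content": "You are a senior code reviewer."},
        {"role": "user", "content": f"Find cross-file bugs:\n{codebase}"},
    ],
)
print(resp.choices[0].message.content)
```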
Why Cerebras Is Different: Hardware Designed for AI from the Ground Up
Founded by a team of pioneering computer architects, AI researchers, and systems engineers, Cerebras Systems has taken a radically different approach to scaling generative AI. Rather than relying on GPU clusters, Cerebras designed a purpose-built AI supercomputer around its own chip: the Wafer-Scale Engine 3.
This chip is unlike anything else in the industry. Roughly the size of a dinner plate, it houses hundreds of thousands of AI-optimized cores alongside tens of gigabytes of on-chip memory. That design places compute and memory side by side, eliminating the latency, bandwidth constraints, and orchestration complexity of traditional multi-GPU solutions.
By clustering its CS-3 systems, Cerebras can create AI supercomputers capable of running trillion-parameter models with ease, all while avoiding the technical burden of distributed computing. This unified approach is the foundation of its record-setting inference speeds and the enabler of new high-context capabilities.
MoE Efficiency Enables Dramatic Cost Reduction
Qwen3‑235B is built on a mixture-of-experts (MoE) architecture: a model design that routes each input through only a small subset of internal expert subnetworks, so just a fraction of the model's 235 billion parameters are active per token, resulting in vastly improved computational efficiency.
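Here is a minimal sketch of the top-k routing idea, using toy dimensions rather than Qwen3's actual sizes or code: a small gating network scores the experts for each token, and only the highest-scoring experts are evaluated.

```python
# Toy top-k mixture-of-experts routing (illustrative only; not Qwen3's code).
# Each token is sent to top_k experts out of n_experts, so only a fraction
# of the layer's parameters do work per token.
import numpy as np

n_experts, top_k, d_model = 8, 2, 16  # toy sizes, not Qwen3's

rng = np.random.default_rng(0)
router_w = rng.normal(size=(d_model, n_experts))          # gating weights
experts = rng.normal(size=(n_experts, d_model, d_model))  # one FFN per expert

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route a single token vector x through its top-k experts."""
    logits = x @ router_w
    chosen = np.argsort(logits)[-top_k:]   # indices of the top-k experts
    weights = np.exp(logits[chosen])
    weights /= weights.sum()               # softmax over the chosen experts
    # Weighted sum of the chosen experts' outputs; the remaining
    # n_experts - top_k experts are never evaluated for this token.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, chosen))

token = rng.normal(size=d_model)
print(moe_layer(token).shape)  # (16,)
```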
Because of this, Cerebras can offer the model at a price point that significantly undercuts closed-source alternatives. Specifically, inference is available at $0.60 per million input tokens and $1.20 per million output tokens, representing more than a 90% reduction in cost compared to proprietary models from OpenAI, Anthropic, or Google.
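At those rates, monthly inference spend is simple to estimate. In the sketch below, the traffic volume and the closed-model comparison price are assumptions chosen only to illustrate the roughly 90% saving, not quotes from any specific vendor.

```python
# Illustrative cost comparison at the published Cerebras rates.
# The closed-model prices and monthly volume are assumed round figures.

IN_RATE, OUT_RATE = 0.60, 1.20       # USD per million tokens (per the article)
CLOSED_IN, CLOSED_OUT = 6.00, 12.00  # assumed 10x closed-model pricing

in_tok, out_tok = 50_000_000, 10_000_000  # hypothetical monthly token volume

cerebras = in_tok / 1e6 * IN_RATE + out_tok / 1e6 * OUT_RATE
closed = in_tok / 1e6 * CLOSED_IN + out_tok / 1e6 * CLOSED_OUT
print(f"Cerebras: ${cerebras:,.2f}/mo  vs  closed model: ${closed:,.2f}/mo")
# -> Cerebras: $42.00/mo  vs  closed model: $420.00/mo
```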
Seamless Integration with Cline in VS Code
To showcase Qwen3‑235B's speed in a real-world application, Cerebras has partnered with Cline, the leading agentic coding assistant for Microsoft Visual Studio Code, currently installed by over 1.8 million developers.
Cline users can already access Qwen3‑32B with 64K context as part of the free tier. With today’s announcement, support will expand to include Qwen3‑235B and its full 131K context, enabling a level of in-editor reasoning and code generation previously unattainable in real time.
Saoud Rizwan, CEO of Cline, explained, “With Cerebras’ inference, developers using Cline are getting a glimpse of the future, as Cline reasons through problems, reads codebases, and writes code in near real-time. Everything happens so fast that developers stay in flow, iterating at the speed of thought. This kind of fast inference isn’t just nice to have—it shows us what’s possible when AI truly keeps pace with developers.”
An Open Alternative to Closed Models
Qwen3‑235B’s performance is on par with some of the most sophisticated AI models on the market today, including Claude 4 Sonnet, Gemini 2.5 Flash, and DeepSeek R1, as validated by independent benchmarks. Yet Cerebras delivers it in an open-access format, offering transparency and portability that closed models lack.
This open model approach empowers enterprises to:
Customize and fine-tune on proprietary data
Deploy AI on private infrastructure or in hybrid cloud environments
Avoid vendor lock-in and data privacy risks
Combined with Cerebras's scalable cloud platform and optional on-premises deployment of CS-3 systems, organizations gain full control over how AI is applied to mission-critical tasks.
What This Means for the Industry
The launch of Qwen3‑235B marks a turning point. Cerebras has redefined the boundaries of what’s possible with large language models by combining frontier intelligence with unprecedented speed and cost efficiency.
It is now feasible to build AI tools that:
Respond in real time to developer queries (see the streaming sketch after this list)
Reason over full documentation or knowledge bases
Generate and debug production-level code live inside the IDE
Scale to enterprise use cases without GPU sprawl or six-figure monthly inference bills
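As an illustration of the real-time point, here is a minimal streaming sketch against the same assumed OpenAI-compatible endpoint and hypothetical model identifier used earlier; tokens print as they are generated rather than after the full completion.

```python
# Streaming sketch: print tokens as they arrive from the model.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.cerebras.ai/v1",  # assumed OpenAI-compatible endpoint
    api_key=os.environ["CEREBRAS_API_KEY"],
)

stream = client.chat.completions.create(
    model="qwen-3-235b-a22b",  # assumed identifier; check the Cerebras docs
    messages=[{"role": "user", "content": "Explain what a dangling pointer is."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```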
As demand for AI infrastructure grows, Cerebras demonstrates that you don't have to choose between performance, price, and openness. Its hardware-software co-design, from the Wafer-Scale Engine to the Cerebras Inference Cloud, points toward a new model of AI deployment: one that is faster, simpler, and far more accessible.
Final Thought
By releasing Qwen3‑235B with 131K context, ultra-fast inference, and affordable token pricing, Cerebras Systems has positioned itself as one of the few true challengers to the GPU-driven incumbents. For enterprises, researchers, and developers, this launch is more than a faster model; it is an inflection point that brings real-time, production-grade, open AI within reach.