DeepSeek, the China-based AI lab, released DeepSeek-V3.2-Exp, an experimental AI model, on September 29. The company claims the model achieves ‘significant efficiency improvements in both training and inference’.
It is built upon DeepSeek-V3.1-Terminus, which is itself an upgraded version of the DeepSeek-V3.1 model.
The model introduces DeepSeek Sparse Attention (DSA), a sparse attention mechanism designed to explore and validate optimisations for training and inference efficiency in long-context scenarios, according to the company.
Despite using a much simpler and faster attention method that processes far fewer tokens during long-context tasks, the model performs on par with V3.1-Terminus, DeepSeek said.
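DeepSeek’s announcement does not spell out how DSA selects which tokens to attend to, but the general idea behind sparse attention is that each query looks at only a small subset of keys rather than all of them. A minimal, illustrative NumPy sketch of one common variant (top-k key selection per query; the function name and sizes are hypothetical and not DeepSeek’s actual implementation):

```python
import numpy as np

def topk_sparse_attention(q, K, V, k=64):
    """For one query vector, attend only to the k highest-scoring keys
    instead of all n of them -- the core idea behind sparse attention.
    q: (d,), K: (n, d), V: (n, d). Returns a (d,) output vector."""
    scores = K @ q / np.sqrt(q.shape[0])         # score every key against the query
    top = np.argpartition(scores, -k)[-k:]       # keep indices of the k best keys
    w = np.exp(scores[top] - scores[top].max())  # softmax over the selected keys only
    w /= w.sum()
    return w @ V[top]                            # weighted sum of k values, not n

# Hypothetical long-context example: attend to 64 of 4,096 positions
rng = np.random.default_rng(0)
q, K, V = rng.normal(size=128), rng.normal(size=(4096, 128)), rng.normal(size=(4096, 128))
out = topk_sparse_attention(q, K, V, k=64)
```

Because the softmax and weighted sum run over k positions instead of the full sequence length, the per-query cost stops growing with context length once the selection step is cheap, which is what makes this family of methods attractive at 128k-token contexts.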
For context, the model scored 58 on the Artificial Analysis Intelligence Index, which aggregates an AI model’s performance across 10 benchmarks in diverse domains. Anthropic’s Claude 4.1 Opus scores 59, Google’s Gemini 2.5 Pro scores 60, and OpenAI’s GPT-5 (high) scores 68.
For more details on the architecture, refer to the technical report DeepSeek released alongside the model.
“The DeepSeek team cracked cheap long context for LLMs: a ~3.5x cheaper prefill and ~10x cheaper decode at 128k context at inference with the same quality,” said Deedy Das, partner at Menlo Ventures, reacting to the announcement on X.
The model is available on the DeepSeek app, web and API, and its weights are available on Hugging Face.
The company also announced that API prices have been cut by 50% or more. DeepSeek has reduced input costs from $0.07 to $0.028 per 1M tokens for cache hits and from $0.56 to $0.28 for cache misses, while output costs have dropped from $1.68 to $0.42 per 1M tokens.
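At the new rates, a quick back-of-the-envelope check of what a single call costs (the request sizes below are made-up illustrative numbers, not from DeepSeek):

```python
# USD per 1M tokens at the new DeepSeek-V3.2-Exp API prices quoted above
PRICES = {"input_cache_hit": 0.028, "input_cache_miss": 0.28, "output": 0.42}

def request_cost(hit_tokens: int, miss_tokens: int, output_tokens: int) -> float:
    """Total cost in USD for one API call."""
    return (hit_tokens * PRICES["input_cache_hit"]
            + miss_tokens * PRICES["input_cache_miss"]
            + output_tokens * PRICES["output"]) / 1_000_000

# Hypothetical long-context call: 100k cached + 28k uncached input, 4k output
print(f"${request_cost(100_000, 28_000, 4_000):.4f}")  # -> $0.0123
```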
“This experimental release represents our ongoing research into more efficient transformer architectures, particularly focusing on improving computational efficiency when processing extended text sequences,” DeepSeek said in a blog post announcing the release.