DeepSeek has released version V3.1 of its large language model, introducing a hybrid architecture that combines thinking and non-thinking modes in a single model. The thinking mode, DeepSeek-V3.1-Think, is designed to reach answers faster than the previous DeepSeek-R1-0528 model while maintaining comparable response quality. The update also improves tool use and multi-step task execution through additional post-training adjustments.
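As an illustration of how the hybrid design surfaces to developers, the sketch below routes the same prompt to either mode through an OpenAI-compatible client. The deepseek-chat (non-thinking) and deepseek-reasoner (thinking) model names follow DeepSeek's public API convention and should be verified against the current release; the API key is a placeholder.

```python
# Minimal sketch: selecting DeepSeek-V3.1's two inference modes through
# the OpenAI-compatible API. Model names follow DeepSeek's public API
# convention (deepseek-chat = non-thinking, deepseek-reasoner = thinking);
# verify them against the current release notes.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",  # placeholder
    base_url="https://api.deepseek.com",
)

def ask(prompt: str, thinking: bool = False) -> str:
    """Send the same prompt to either mode of the hybrid model."""
    response = client.chat.completions.create(
        model="deepseek-reasoner" if thinking else "deepseek-chat",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Quick factual lookup: the fast, non-thinking path is usually enough.
print(ask("What is the capital of France?"))

# Multi-step problem: switch to the thinking path for explicit reasoning.
print(ask("If a train leaves at 9:00 at 80 km/h, when does it cover 200 km?",
          thinking=True))
```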
The development of DeepSeek-V3.1 builds on the DeepSeek-V3-Base checkpoint and follows a two-phase context extension strategy. The first phase extended the context window to 32,000 tokens using 630 billion tokens of training data. The second phase extended the context further to 128,000 tokens with an additional 209 billion training tokens. This approach enables the model to handle significantly longer input sequences compared to earlier versions.
Training for V3.1 also adopted the UE8M0 FP8 scale data format for weights and activations. The format provides efficiency benefits and maintains compatibility with microscaling data formats, allowing large-scale models to be deployed more efficiently. In terms of size, the full DeepSeek-V3.1 model contains 671 billion total parameters, with approximately 37 billion activated per token, while supporting the extended 128,000-token context length.
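To make the format concrete: UE8M0 stores only an unsigned 8-bit exponent, so every scale factor is an exact power of two, which is what keeps it compatible with microscaling (MX) block formats. The numpy sketch below emulates one block-quantization step under that constraint; the block size and the FP8 E4M3 element range are assumptions for illustration, not details of DeepSeek's training kernels.

```python
# Rough numpy emulation of a UE8M0-scaled block quantization step.
# UE8M0 encodes only an 8-bit exponent, so the shared per-block scale is
# always an exact power of two. Block size and the FP8 E4M3 range are
# illustrative assumptions; this is not DeepSeek's production kernel.
import numpy as np

FP8_E4M3_MAX = 448.0   # largest finite value representable in FP8 E4M3
BLOCK = 32             # assumed number of elements sharing one scale

def quantize_block(x: np.ndarray) -> tuple[np.ndarray, int]:
    """Quantize one block with a shared power-of-two (UE8M0-style) scale."""
    amax = np.max(np.abs(x))
    # Smallest power-of-two scale that brings the block into FP8 range.
    exp = int(np.ceil(np.log2(amax / FP8_E4M3_MAX))) if amax > 0 else 0
    scale = 2.0 ** exp
    # Crude stand-in for FP8 storage: round and clip the scaled values
    # (real E4M3 rounding is per-value and nonuniform).
    q = np.clip(np.round(x / scale), -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return q, exp  # elements plus the 8-bit exponent to store

def dequantize_block(q: np.ndarray, exp: int) -> np.ndarray:
    return q * (2.0 ** exp)

x = np.random.randn(BLOCK).astype(np.float32)
q, exp = quantize_block(x)
print("max abs error:", np.max(np.abs(x - dequantize_block(q, exp))))
```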
DeepSeek V3.1 ranks near the top of open-source coding and reasoning benchmarks. In community tests, it scored 71.6% on the Aider benchmark, outperforming Claude 4 and approaching GPT-4, while running the full suite for about $1 in compute compared to several dozen dollars for other models.
Community discussions on Reddit and X reflect mixed opinions about DeepSeek V3.1. Some developers describe it as a cost-effective alternative to GPT or Claude, noting its strong results in coding and reasoning benchmarks for a fraction of the cost. User badgerbadgerbadgerWI wrote:
DeepSeek’s cost/performance ratio is insane. Running it locally for our code reviews now.
Meanwhile, AI engineer Prince Ramoliya shared:
Hybrid inference is brilliant. Having one model that can switch between deep thinking and quick responses feels like the future of practical AI.
The model is available through multiple platforms, including Hugging Face, OpenRouter, and Replicate, and is accompanied by official API documentation and release notes covering technical details and performance benchmarks. Developers can experiment with both standard response generation and reasoning-enhanced outputs, selecting the mode that fits the task.
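For developers starting from the Hugging Face checkpoint, the sketch below builds a prompt for each mode with the tokenizer's chat template. The thinking flag follows the example on the DeepSeek-V3.1 model card and should be verified against the repository, since templates can change; note that running inference on the full 671-billion-parameter checkpoint requires a large multi-GPU deployment, so this only demonstrates prompt construction.

```python
# Sketch: building prompts for each mode with the Hugging Face tokenizer.
# The `thinking` chat-template flag follows the DeepSeek-V3.1 model card;
# confirm it against the repository before relying on it.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V3.1")

messages = [{"role": "user", "content": "Summarize the attention mechanism."}]

# Non-thinking mode: a standard chat prompt.
fast_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, thinking=False
)

# Thinking mode: the template inserts the reasoning markers instead.
think_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, thinking=True
)
print(think_prompt)
```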
Compared to DeepSeek-V3, this version aims to balance efficiency with reasoning capability. By integrating tool use and structured post-training improvements, DeepSeek-V3.1 attempts to address challenges in multi-step reasoning tasks while keeping inference speed practical for production environments. The hybrid design reflects an effort to merge the benefits of explicit reasoning with the faster throughput of conventional autoregressive generation.
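As a sketch of what the tool-use improvements look like from the developer's side, the example below sends a function-calling request through the OpenAI-compatible API. The get_weather tool and its schema are hypothetical and stand in for any real tool definition; the tools and tool_calls fields follow the standard OpenAI chat-completions format.

```python
# Sketch of tool use via the OpenAI-compatible API. The `get_weather`
# tool and its schema are hypothetical; the tools/tool_calls fields
# follow the standard OpenAI chat-completions format.
import json
from openai import OpenAI

client = OpenAI(api_key="YOUR_DEEPSEEK_API_KEY",  # placeholder
                base_url="https://api.deepseek.com")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

# If the model decided to call the tool, the arguments arrive as JSON.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, json.loads(call.function.arguments))
```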