DeepSeek, the Chinese AI startup, has launched DeepSeek V3.1, a new hybrid reasoning model designed for agentic use cases and tool calling. It offers two modes, Think and Non-Think, and can automatically reason for longer when a query requires more time to solve. The thinking mode can be toggled with the "DeepThink" button.
The non-think mode is served as deepseek-chat, and the thinking mode as deepseek-reasoner. Both offer a 128K-token context length and activate 37B of the model's 671B total parameters. The underlying DeepSeek V3.1 Base was trained on an additional 840B tokens on top of V3. Notably, DeepSeek V3.1 performs very well at multi-step reasoning tasks.
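Since the two modes are exposed as separate model names, switching between them amounts to choosing the model in the request. The sketch below builds an OpenAI-style chat-completion payload; the `build_request` helper is a hypothetical illustration, not an official SDK function, though the model names "deepseek-chat" and "deepseek-reasoner" are the ones described above.

```python
# Minimal sketch: toggle DeepSeek V3.1's Think/Non-Think modes by picking
# the model name in an OpenAI-compatible chat-completion payload.
# build_request is a hypothetical helper for illustration only.

def build_request(prompt: str, think: bool = False) -> dict:
    """Build a chat payload; think=True routes to the reasoning model."""
    model = "deepseek-reasoner" if think else "deepseek-chat"
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

# Non-think mode uses deepseek-chat; think mode uses deepseek-reasoner.
print(build_request("Hello")["model"])                  # deepseek-chat
print(build_request("Prove it", think=True)["model"])   # deepseek-reasoner
```

The payload would then be sent to DeepSeek's OpenAI-compatible endpoint; everything except the model name stays the same between the two modes.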
For instance, on SWE-bench Verified — a benchmark that tests coding performance on real-world software engineering tasks — DeepSeek V3.1 scored 66.0%, well above the 44.6% of DeepSeek R1-0528. For reference, OpenAI's GPT-5 Thinking scored 74.9% and Anthropic's Claude Opus 4.1 achieved 74.5%.
On Humanity's Last Exam (HLE), DeepSeek V3.1 achieved 29.8% with tool calling, and on GPQA Diamond it scored 81%. Overall, the new DeepSeek V3.1 model is a clear improvement over the earlier R1-0528 model, though it still trails GPT-5 and the Claude 4 models. As for API pricing, DeepSeek V3.1 costs $0.56 per million input tokens and $1.68 per million output tokens.
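To put those prices in perspective, the arithmetic is straightforward: multiply each token count by its per-token rate. The sketch below assumes the listed rates of $0.56/$1.68 per million input/output tokens and ignores any cache-hit discounts the provider may offer.

```python
# Estimate DeepSeek V3.1 API cost at the listed rates:
# $0.56 per 1M input tokens, $1.68 per 1M output tokens.
# Cache-hit discounts, if any, are not modeled here.

INPUT_PRICE = 0.56 / 1_000_000   # USD per input token
OUTPUT_PRICE = 1.68 / 1_000_000  # USD per output token

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated request cost in USD."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# e.g. a 10K-token prompt with a 2K-token answer:
print(f"${estimate_cost(10_000, 2_000):.4f}")  # $0.0090
```

At these rates, even fairly long agentic sessions cost fractions of a cent per request, which is a large part of the model's appeal against GPT-5 and Claude.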