DeepSeek Speculation Swirls Online Over Chinese AI Start-up’s Much-anticipated R2 Model

The latest speculation about DeepSeek-R2 – the successor to the R1 reasoning model, which was released in January – surfaced over the weekend and included claims of the product’s imminent launch and of purported new benchmarks it set for cost-efficiency and performance.

That reflects heightened online interest in DeepSeek after it generated worldwide attention from late December 2024 to January by consecutively releasing two advanced open-source AI models, V3 and R1, which were built at a fraction of the cost and computing power that major tech companies typically require for large language model (LLM) projects. LLM refers to the technology underpinning generative AI services such as ChatGPT.

According to posts on Chinese stock-trading social-media platform Jiuyangongshe, R2 was said to have been developed with a so-called hybrid mixture-of-experts (MoE) architecture, with a total of 1.2 trillion parameters, making it 97.3 per cent cheaper to build than OpenAI’s GPT-4o.

MoE is a machine-learning approach that divides an AI model into separate sub-networks, or experts – each focused on a subset of the input data – to jointly perform a task. This is said to greatly reduce computation costs during pre-training and deliver faster performance at inference time.
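As a rough illustration of the routing idea behind MoE, the sketch below sends an input to only the two highest-scoring experts and blends their outputs. It is a toy example under stated assumptions, not DeepSeek's implementation: the names `make_expert` and `moe_forward`, the gate weights and the "experts" (simple scaling functions) are all hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale):
    """Hypothetical expert: scales every input value by a fixed factor."""
    return lambda x: [scale * v for v in x]


def moe_forward(x, gate_weights, experts, top_k=2):
    """Route x to the top_k experts chosen by a linear gate."""
    # One gate score per expert: dot product of the input with that expert's gate row.
    scores = [sum(w * v for w, v in zip(row, x)) for row in gate_weights]
    # Keep only the top_k experts; the others are never evaluated.
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    weights = softmax([scores[i] for i in chosen])
    output = [0.0] * len(x)
    for weight, index in zip(weights, chosen):
        expert_out = experts[index](x)
        output = [o + weight * v for o, v in zip(output, expert_out)]
    return output, chosen


# Illustrative values only (assumptions, not taken from any real model).
experts = [make_expert(s) for s in (0.5, 1.0, 2.0, 4.0)]
gate_weights = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [-1, -1, -1]]
output, chosen = moe_forward([1.0, 2.0, 3.0], gate_weights, experts, top_k=2)
print(chosen)  # indices of the experts that were actually run
```

Only two of the four experts run for this input, which is where the claimed savings in computation come from: the model can hold many parameters while using a fraction of them per input.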

In machine learning, parameters are the variables present in an AI system during training, which help establish how data prompts yield the desired output.
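To make the notion of a parameter count concrete, here is a minimal sketch that tallies the weights and biases of a tiny fully connected network. The function name `count_parameters` and the layer sizes are arbitrary assumptions chosen for illustration and say nothing about R2's actual design.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected network.

    Each layer with n_in inputs and n_out outputs has n_in * n_out weights
    plus n_out biases.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out
    return total


# Hypothetical toy network: 4 inputs, 8 hidden units, 2 outputs.
toy_total = count_parameters([4, 8, 2])
print(toy_total)  # 58
```

A model described as having 1.2 trillion parameters holds that many such learned values, roughly 20 billion times more than this toy network.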

