Distilled AI Runs On A Single GPU

The next big thing from DeepSeek isn’t here yet. That’s DeepSeek R2, which is in development and should bring notable performance improvements. But like OpenAI, Google, and other AI firms, the Chinese startup continues to upgrade the models it released publicly in recent months.

DeepSeek R1 is one of those models. It’s a reasoning AI that DeepSeek released in early 2025, rattling AI stocks. In case you forgot, DeepSeek managed to train a frontier AI model as good as OpenAI’s o1 without access to the latest Nvidia hardware used by US AI firms.

DeepSeek relied on software innovations to make up for its hardware limitations, and DeepSeek R1 became a hit AI app overnight. The company also launched its AI models as open-source, allowing users to install them on their own devices and run them locally without needing an internet connection.

Open-sourcing DeepSeek helped its AI models spread even faster. At the same time, access to an open-source version of DeepSeek R1 helps prevent user data from reaching Chinese servers and lets researchers bypass some of the built-in censorship found in web and mobile apps.

While I’ve advised caution when using AI models that involve heavy censorship or send user data to places like China, it’s ultimately your choice which models you want to use regularly.

If you’re a fan of the DeepSeek experience, you’ll be glad to know the Chinese startup just upgraded the R1 model and released a smaller, distilled version that only needs one GPU to run.

This week, DeepSeek released the updated R1 model on Hugging Face, a platform well known in the AI world for offering a variety of new tools, including unreleased chatbots that are still in testing.

While DeepSeek hasn’t shared many details about the new R1 model, we know it has 685 billion parameters. That’s a large model requiring substantial resources to run. As TechCrunch explains, the full-size R1 needs around a dozen 80GB GPUs to run locally.
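To see where a figure like “a dozen 80GB GPUs” comes from, here is a rough back-of-the-envelope sketch. It counts only the memory needed to hold the weights, and the two storage precisions (1 byte and 2 bytes per parameter) are assumptions for illustration; the article doesn’t say which precision the estimate is based on.

```python
import math

PARAMS = 685e9        # parameter count reported in the article
GPU_MEMORY_GB = 80    # per-GPU memory cited in the article

# Assumed storage precisions; the article does not say which one applies.
BYTES_PER_PARAM = {"fp8": 1, "bf16": 2}


def weights_gb(params: float, bytes_per_param: int) -> float:
    """Memory needed for the weights alone, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


def gpus_needed(params: float, bytes_per_param: int,
                gpu_gb: float = GPU_MEMORY_GB) -> int:
    """Smallest number of GPUs whose combined memory holds the weights."""
    return math.ceil(weights_gb(params, bytes_per_param) / gpu_gb)


for name, nbytes in BYTES_PER_PARAM.items():
    print(f"{name}: {weights_gb(PARAMS, nbytes):.0f} GB of weights, "
          f"at least {gpus_needed(PARAMS, nbytes)} GPUs")
```

Under these assumptions, the weights alone call for somewhere between 9 and 18 GPUs, so “around a dozen” sits inside that range. Real deployments also need memory for activations and caches, which this sketch ignores.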

The updated model is expected to deliver better performance and reduce hallucinations, according to a post on WeChat. A similar description is available on DeepSeek’s website, although the company didn’t promote this release as heavily as before.

“The model has demonstrated outstanding performance across various benchmark evaluations, including mathematics, programming, and general logic,” DeepSeek said, per Reuters.

The smaller version of R1 is even more exciting. The model name, DeepSeek-R1-0528-Qwen3-8B, reveals it’s a reasoning model released on May 28th, based on the Qwen3-8B model that Alibaba introduced in late April. It’s available on Hugging Face.

Alibaba is one of a growing number of Chinese AI firms launching high-end models that directly compete with ChatGPT, Claude, and other US-developed AIs.

DeepSeek created the distilled version of R1 by fine-tuning Qwen3-8B on text generated by the newly upgraded R1 model.
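For readers curious about what distillation looks like in practice, here is a minimal sketch of the data-preparation step, assuming the common recipe in which a large “teacher” model writes answers and a smaller “student” is later trained to reproduce them. The names `TrainingExample`, `build_distillation_set`, and `toy_teacher` are hypothetical, and the teacher here is a canned stand-in, not DeepSeek’s actual pipeline.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TrainingExample:
    prompt: str
    target: str  # teacher-written text the student learns to reproduce


def build_distillation_set(prompts: List[str],
                           teacher: Callable[[str], str]) -> List[TrainingExample]:
    """Pair each prompt with the teacher's generated answer."""
    return [TrainingExample(prompt=p, target=teacher(p)) for p in prompts]


def toy_teacher(prompt: str) -> str:
    """Hypothetical stand-in for the large model; returns canned text."""
    return f"<think>reasoning about: {prompt}</think> final answer"


dataset = build_distillation_set(["What is 12 * 12?", "Is 97 prime?"], toy_teacher)
for example in dataset:
    print(example.prompt, "->", example.target)
```

The resulting prompt and target pairs would then feed an ordinary supervised fine-tuning run on the smaller model, which is outside the scope of this sketch.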

As a reminder, DeepSeek stirred controversy when R1 debuted, with OpenAI accusing the startup of using ChatGPT data without permission to speed up R1’s training. OpenAI itself has also faced accusations of using data from sources without proper authorization for training its models.

What stands out about DeepSeek-R1-0528-Qwen3-8B is that it only requires a single GPU with 40GB to 80GB of memory to run, such as Nvidia’s H100. This makes it easier for AI hobbyists and developers to experiment with DeepSeek R1 locally without hefty hardware costs.
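A quick estimate shows why one card is enough. The sketch below assumes a nominal 8 billion parameters (taken from the “8B” in the model name) stored at 2 bytes each, and it counts only the weights; both figures are illustrative assumptions, not numbers from DeepSeek.

```python
PARAMS = 8e9  # nominal size, assumed from the "8B" in the model name


def weights_gb(params: float, bytes_per_param: int) -> float:
    """Memory needed for the weights alone, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


def headroom_gb(gpu_gb: float, params: float = PARAMS,
                bytes_per_param: int = 2) -> float:
    """GPU memory left over after loading the weights (assumes 2 bytes each)."""
    return gpu_gb - weights_gb(params, bytes_per_param)


for gpu_gb in (40, 80):
    print(f"{gpu_gb} GB GPU: {weights_gb(PARAMS, 2):.0f} GB of weights, "
          f"{headroom_gb(gpu_gb):.0f} GB left for everything else")
```

Whatever is left over goes to activations and the cache that grows with the length of the model’s reasoning, which is one plausible reason the quoted requirement is higher than the weights alone.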

Those hardware requirements are modest given how well the distilled DeepSeek R1 model performs.

Despite its smaller size, DeepSeek-R1-0528-Qwen3-8B outperformed Google’s Gemini 2.5 Flash on AIME 2025, a set of tough math problems.

The smaller DeepSeek R1 also nearly matches Microsoft’s Phi 4 reasoning model in HMMT math tests.

The only way to use the smaller R1 model, though, is by installing it on your own computer.
