Daniel A. Keller, CEO and President of InFlux Technologies Limited. Cofounder of Flux.
When OpenAI released ChatGPT in 2022, it was the most polished expression yet of AI chatbots built on large language models (LLMs). With an accessible interface and no special hardware required, it put the power of interactive AI directly into users' hands.
Barely five days after its launch, ChatGPT passed 1 million users. (For context, Facebook took 10 months to reach that milestone.) Of course, there were a few problems, like occasional lag and hallucinations, but version after version, ChatGPT continued to expand its frontiers.
There were also apprehensions about the development cost of GPT-4, estimated at somewhere between $48 million and $71 million. But it all seemed justifiable. Sixteen thousand H100 GPUs don't come cheap, and salaries have to be paid.
Or was it?
Rise Of The Deep
On January 20, 2025, the world woke up to news that would change the trajectory of AI technology. A little-known Chinese company had launched DeepSeek R1, an AI with capabilities comparable to OpenAI’s ChatGPT.
And the shocker?
Initial reports claimed the model was built with fewer, cheaper and older GPUs at a development cost of only $5.6 million. The news sent shock waves through the markets. By Monday, Nvidia, the biggest supplier of AI GPUs, had lost almost $600 billion in market value as investors reconsidered their positions. The Nasdaq fell, and companies like Microsoft and Alphabet dropped with it. Within a week, DeepSeek had overtaken ChatGPT as the most downloaded application on the Apple App Store.
But since then, DeepSeek has come under scrutiny, with the head of Google’s DeepMind calling its claims “exaggerated” and one critic suggesting it actually cost DeepSeek over $1 billion to create its AI model.
Nevertheless, DeepSeek's arrival has caused a shift. The investment rationale across the AI supply chain had been simple: More spending meant better AI outcomes.
Until now.
The Paradigm Shift
DeepSeek's story is exceptional for several reasons. First, as part of the United States' efforts to stem the flow of advanced AI technology to competing nations, the Biden administration restricted the export of GPUs to China, limiting the availability of advanced AI chips like the A100 and the H100. As a result, DeepSeek presumably had to rely on less sophisticated but more readily available GPUs like the H800.
DeepSeek's ability to turn this crippling limitation into a marvel of AI innovation raises a critical question: Are ingenuity and better software architecture a more sustainable path than ever more advanced, expensive GPUs?
GPU availability (particularly of advanced chips like the H100) is one of the rate-limiting factors for AI research and development; even in the U.S., Nvidia, the top producer of GPUs globally, continues to grapple with meeting demand. A breakthrough demonstrating that companies and research labs can get more out of the computing power they already have, while cutting costs, is a game-changer for the entire industry. But how exactly did DeepSeek achieve this?
Flipping The Game
Before DeepSeek's emergence, AI had largely been a game of who was bigger. Bigger financial investments translated into bigger LLMs, which in turn required more compute resources and, hopefully, delivered bigger innovative strides.
However, DeepSeek's approach was counterintuitive. Instead of piling on more compute and building ever-bigger models, the Chinese company focused on using available resources more efficiently. That meant enhancing its model's abilities through reinforcement learning, leveraging improved software architecture and optimizing its algorithms.
Rather than overwhelming the problem with sheer brute force, DeepSeek turned the game on its head. Early benchmarks suggested it was up to 20 times more efficient, requiring far less compute than its more prominent competitors.
Because it relied on reinforcement learning, DeepSeek-R1 also eliminated the need for large teams of human reviewers and supervised fine-tuning, keeping training costs to a minimum.
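To illustrate the general idea, here is a toy Python sketch of a rule-based reward: Outputs are scored automatically against a verifiable answer, so no human reviewer is needed. The function and the loop described in the comments are illustrative assumptions, not DeepSeek's actual training pipeline.

```python
# Toy sketch of a rule-based reward signal (illustrative only, not
# DeepSeek's training code). The point: when an answer can be checked
# automatically, reinforcement learning needs no human labelers.

def reward(model_answer: str, correct_answer: str) -> float:
    """Return 1.0 if the model's final answer matches the known answer."""
    return 1.0 if model_answer.strip() == correct_answer.strip() else 0.0

# A training loop would sample many candidate answers per prompt, score
# each with this reward, and update the model so higher-reward answers
# become more likely, with no human reviewer in the loop.
print(reward("42", "42"))   # 1.0
print(reward("41", "42"))   # 0.0
```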
Another important paradigm DeepSeek adopted was a mixture-of-experts (MoE) architecture. MoE leverages multiple expert sub-models and uses selective gating to activate only the most relevant parameters for each input. For context, the DeepSeek MoE framework comprises around 671 billion parameters, yet only a fraction of them (roughly 37 billion) is activated for any given token.
Picture a diverse team of seasoned experts across different disciplines. When needed, the gating mechanism dynamically selects the best combination of experts to solve the problem.
The result?
Dynamic routing and allocation cut out unnecessary computation, lowering the total compute the model requires. The approach also improves efficiency, scales smoothly and allows individual experts to be fine-tuned progressively for specific problem domains.
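To make the mechanism concrete, here is a minimal Python sketch of top-k gating over a handful of toy experts. The sizes, the gating rule and the expert design are illustrative assumptions, not DeepSeek's actual architecture; it only shows how a gate can route each input to a small subset of experts while the rest stay idle.

```python
# Minimal, illustrative mixture-of-experts (MoE) layer with top-k gating.
# Sizes and gating rule are toy assumptions, not DeepSeek's architecture.
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 8   # a production MoE model may have far more experts
TOP_K = 2         # only a small subset of experts runs per token
DIM = 16          # hidden dimension (toy size)

# Each "expert" here is just a small feed-forward weight matrix.
experts = [rng.normal(size=(DIM, DIM)) for _ in range(NUM_EXPERTS)]
gate_weights = rng.normal(size=(DIM, NUM_EXPERTS))

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route a single token vector x through only the top-k experts."""
    # 1. Gating: score every expert for this input.
    logits = x @ gate_weights
    # 2. Select the k highest-scoring experts (sparse activation).
    top_idx = np.argsort(logits)[-TOP_K:]
    weights = np.exp(logits[top_idx])
    weights /= weights.sum()
    # 3. Only the selected experts do any work; the others are skipped.
    out = np.zeros(DIM)
    for w, i in zip(weights, top_idx):
        out += w * (experts[i] @ x)
    return out

token = rng.normal(size=DIM)
print(moe_forward(token).shape)                  # (16,)
print(f"active experts: {TOP_K}/{NUM_EXPERTS}")  # 2/8 used for this token
```

The design choice the sketch highlights is the trade-off the article describes: Total parameter count can grow (more experts) without the per-token compute growing with it, because only the selected experts run.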
Implications For The Broader AI Industry
Compute-efficient AI solutions encourage democratization, opening the door to innovation from many more quarters. This could, in turn, make access to AI resources cheaper, breaking Big Tech's near-monopoly on AI innovation.
DeepSeek's open-source nature provides a level playing field for researchers to engage in deep R&D without breaking the bank. Its lower energy requirements and smaller carbon footprint could also drive more environmentally sustainable data center designs in the near future.
However, as revolutionary as DeepSeek's emergence has been, there are also drawbacks (on top of the questions about its cost claims).
First, while DeepSeek’s open-source nature encourages technology sharing and participation, it also means malicious actors can repurpose it, raising fresh concerns about heightened misinformation, deepfakes and other sinister possibilities.
Another concern involves data sovereignty and the possibility of the Chinese government mining users' data.
Rounding Off
While DeepSeek has demonstrated capabilities comparable to OpenAI's ChatGPT in many ways, its long-term effect on AI technology, compute demand and market dynamics remains to be seen.
Whatever the future holds, DeepSeek's successful deployment of a powerful open-source model has leveled the playing field for innovation in the AI industry. As its techniques filter into the mainstream, the ripple effects could shape the next iteration of artificial intelligence.