Advanced AI News

DeepSeek and China’s AI Innovation in US-China Tech Competition

By Advanced AI Bot, April 11, 2025


At first glance, the following statements may seem like they come from the harshest critics of China’s technological innovations:

“We often say that the gap between China and the United States in AI is one or two years. But the real gap between the U.S. and China AI is creativity and imitation. China will be a follower forever if this doesn’t change.”

“In the past 30 years, China has essentially not produced any innovation in the tide of IT development, merely following along as a free rider, without contributing to any real technological innovation.”

“Chinese companies are accustomed to taking other (foreign) companies’ innovation, developing applications based on those, and making a fortune from it. But this should not be taken for granted.”

“We have been used to waiting for Moore’s Law to come down from the sky, and then, boom, 18 months later, we have better hardware and software to use. Now, in China, the same is happening with scaling laws.”

But surprisingly, these words come from Liang Wenfeng, the founder of DeepSeek, a Chinese AI start-up that recently shocked the global AI community, particularly in Silicon Valley and on Wall Street.

DeepSeek’s success marks a significant boost for China’s AI innovation. It shows that even in the face of US chip restrictions, Chinese companies can adopt innovative solutions to drive cost-effective development. Their work challenges the notion that China will always be a follower in AI innovation.

Taken from an exclusive interview with Liang conducted by 36Kr, a Chinese media platform, in summer 2024 — when DeepSeek released its V2 model — these quotations strike at the heart of long-standing issues in China’s technological innovation system and the way that many Chinese companies approach business.

So, what makes Mr. Liang, a seemingly “nouveau riche” player in China’s AI industry, so undiplomatically frank in his criticism? What sets DeepSeek apart from other AI giants and start-ups in China? Before answering these questions, we need to explore what DeepSeek has actually done to achieve its breakthroughs in AI innovation. What do DeepSeek’s innovations mean for the future of AI development?

Two Myths about DeepSeek’s Success

DeepSeek sent shockwaves through the global AI industry in January 2025 when it announced that its V3 model rivals OpenAI’s GPT-4o and other leading large language models (LLMs), even though the system was trained at an extremely low cost: US$5.576 million using 2,048 Nvidia H800 chips. This figure pales in comparison to the pre-training costs of around US$40–60 million and the tens of thousands (sometimes even 100,000) of advanced AI chips, such as the Nvidia H100, used by OpenAI, Meta and other US-based tech giants.

This news drew a sharp reaction from Wall Street, temporarily sinking Nvidia’s stock price as investors feared reduced demand for high-end AI chips (GPUs). Two months later, the reasons for DeepSeek’s success are clearer.

Myth #1: DeepSeek’s cost was just $5.576 million. That $5.576 million accounted only for the cost of GPUs used for the final stage of training. SemiAnalysis, a research and analysis company, estimates that DeepSeek used up to 50,000 H-series chips in earlier training stages, putting its total AI investment at more than US$1.3 billion. Moreover, its parent company, High-Flyer, a hedge fund also owned by Liang, had stockpiled 10,000 Nvidia A100 chips before US sanctions took effect in October 2022, making it the only company outside China’s few top tech giants capable of training LLMs at the time. The move was driven not by business foresight but by curiosity about AI and artificial general intelligence (AGI), according to another interview Liang gave to 36Kr in May 2023. DeepSeek later purchased additional Nvidia chips, including H800s, H20s and even some H100s, through various channels.
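To put these figures side by side, here is a rough back-of-the-envelope sketch. The dollar amounts are the ones quoted in this article; the GPU-hour count and the US$2 hourly rental price are assumptions taken from DeepSeek’s own V3 technical report, not from the reporting above.

```python
# Figures quoted in this article (US dollars).
FINAL_RUN_COST = 5.576e6      # reported cost of the final V3 training run
GPT4_ESTIMATE = 63e6          # estimated training cost of OpenAI's GPT-4
TOTAL_INVESTMENT = 1.3e9      # SemiAnalysis estimate of total AI investment

# Assumptions from DeepSeek's V3 technical report, not from this article.
GPU_HOURS = 2.788e6           # H800 GPU-hours for the official training run
PRICE_PER_GPU_HOUR = 2.0      # assumed rental price per GPU-hour


def rental_cost(gpu_hours, price_per_hour):
    """Cost of renting GPUs for a run; excludes hardware, staff and research."""
    return gpu_hours * price_per_hour


cost = rental_cost(GPU_HOURS, PRICE_PER_GPU_HOUR)
print(f"final-run cost: ${cost / 1e6:.3f} million")
print(f"share of GPT-4 estimate: {FINAL_RUN_COST / GPT4_ESTIMATE:.1%}")
print(f"share of total investment: {FINAL_RUN_COST / TOTAL_INVESTMENT:.2%}")
```

Under these assumptions the headline number is simply GPU-hours times a rental price. It comes to about 9 percent of the GPT-4 estimate and well under 1 percent of the total investment estimated by SemiAnalysis.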

Myth #2: DeepSeek has overturned the trajectory of AI development. Like OpenAI and Google, DeepSeek follows the “deep learning + foundation models” approach, relying on massive data sets, computational power and advanced algorithms (specifically, transformer neural network architecture) to train models that it believes could eventually reach AGI.

All that being said, DeepSeek has made legitimate, important breakthroughs.

Its most notable achievement is its remarkable cost reduction: through impressive optimization and engineering in model architecture, training frameworks and algorithms, it can train LLMs comparable to the most advanced models from US companies at a fraction of the cost. Although the upfront investment is costly, DeepSeek’s optimizations for cost-effective AI model training are real: the reported training cost of its V3 model is roughly one-tenth of the estimated US$63 million spent training OpenAI’s GPT-4. Key optimizations that reduced reliance on expensive hardware include:

Mixture of experts (MoE) and multi-head latent attention (MLA): The optimization of MoE and MLA architectures was critical for the DeepSeek-V3 model to achieve efficient inference and cost-effective training. Think of a large AI model as a team of specialists, each trained to handle different tasks. Instead of using the entire team for every problem, DeepSeek’s MoE architecture activates only the specialists (or “experts”) needed for a specific task, which reduces unnecessary computation and pushes the MoE approach to a new level. MLA is regarded as the key innovation that lets DeepSeek-V3 significantly shrink the key-value cache, the temporary memory a model keeps about earlier tokens while it processes information; a smaller cache improves inference speed and computational efficiency.
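The “team of specialists” idea can be sketched in a few lines. This is a generic top-k gating example, not DeepSeek’s actual router: the toy experts, the random gate weights and the function names (`make_expert`, `moe_forward`) are all hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale):
    # Hypothetical "expert": a toy function standing in for a feed-forward network.
    return lambda x: [scale * v for v in x]


def moe_forward(x, gate_weights, experts, top_k=2):
    """Route input x to the top_k highest-scoring experts only."""
    # Gate score per expert: dot product of x with that expert's gate vector.
    scores = [sum(w * v for w, v in zip(row, x)) for row in gate_weights]
    probs = softmax(scores)
    # Indices of the top_k experts by gate probability.
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    output = [0.0] * len(x)
    for i in chosen:
        weight = probs[i] / norm        # renormalise over the chosen experts
        expert_out = experts[i](x)      # only the chosen experts are evaluated
        output = [o + weight * e for o, e in zip(output, expert_out)]
    return output, chosen


random.seed(0)
dim, n_experts = 4, 8
experts = [make_expert(scale=i + 1) for i in range(n_experts)]
gate_weights = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(n_experts)]
x = [0.5, -1.0, 0.25, 2.0]
y, chosen = moe_forward(x, gate_weights, experts, top_k=2)
print("experts used:", chosen, "of", n_experts)
```

Only two of the eight toy experts run for this input, so the compute per input stays roughly constant even as more experts (and therefore more parameters) are added.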

Parallel thread execution (PTX) programming: PTX is an intermediate instruction set architecture designed by Nvidia for its GPUs. By reconfiguring Nvidia’s H800 chips at the software level to make communication between multiprocessors more efficient, DeepSeek unlocked new levels of AI compute efficiency.

Multi-token prediction: A novel approach to model training, multi-token prediction allows the system to predict multiple upcoming tokens simultaneously, increasing data throughput by two to three times compared to standard next-token prediction. A token is a unit of text that an AI model processes, which can be a word, a part of a word, a single character or a phrase, depending on the language and context.
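The difference from next-token prediction is easiest to see in the training targets. The sketch below only builds those targets from a toy sentence; the helper name `multi_token_targets` and the word-level tokens are hypothetical, and the prediction model itself is left out.

```python
def multi_token_targets(tokens, depth):
    """For each position, list the next `depth` tokens the model must predict."""
    examples = []
    # Stop early enough that every position still has `depth` future tokens.
    for t in range(len(tokens) - depth):
        context = tokens[: t + 1]
        targets = tokens[t + 1 : t + 1 + depth]
        examples.append((context, targets))
    return examples


tokens = ["Deep", "Seek", "trains", "large", "models"]

# depth=1 is ordinary next-token prediction; depth=2 asks for two tokens at once.
for depth in (1, 2):
    print(f"depth={depth}")
    for context, targets in multi_token_targets(tokens, depth):
        print("  ", context, "->", targets)
```

With `depth=2`, each position supplies two training signals instead of one, which is the intuition behind the denser training signal described above.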

FP8 mixed-precision training: Reducing training costs by leveraging 8-bit floating-point precision (FP8) rather than the standard 16-bit (FP16) allows for faster computations with minimal loss in model accuracy. Bits are tiny units of computer memory, and 8-bit floating-point precision is a way of storing numbers using only 8 bits, which helps AI models do math faster and use less memory while keeping accuracy high. Using 16- or 32-bit formats means even more accuracy but also more computing power.
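The trade-off between bits and accuracy can be illustrated by rounding the same numbers to an 8-bit and a 16-bit format. This is a simplified software simulation, assuming the common E4M3 layout for FP8 (4 exponent bits, 3 mantissa bits, largest value 448); it ignores special values such as NaN and says nothing about how DeepSeek’s training framework actually applies FP8.

```python
import math


def round_to_format(x, mantissa_bits, min_exp, max_value):
    """Round x to the nearest value of a simplified binary float format."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), max_value)          # saturate instead of overflowing
    _, exp = math.frexp(mag)              # mag = m * 2**exp with 0.5 <= m < 1
    e = max(exp - 1, min_exp)             # clamp the exponent for tiny values
    step = 2.0 ** (e - mantissa_bits)     # gap between neighbouring values
    return sign * min(round(mag / step) * step, max_value)


def to_fp8_e4m3(x):
    # Assumed FP8 layout: 3 mantissa bits, smallest normal exponent -6, max 448.
    return round_to_format(x, mantissa_bits=3, min_exp=-6, max_value=448.0)


def to_fp16(x):
    # IEEE half precision: 10 mantissa bits, smallest normal exponent -14.
    return round_to_format(x, mantissa_bits=10, min_exp=-14, max_value=65504.0)


for value in [0.1, 3.14159, 1000.0]:
    print(f"{value:>10}: fp8={to_fp8_e4m3(value)}  fp16={to_fp16(value)}")
```

The 8-bit values are visibly coarser and saturate much earlier, but each number takes one byte instead of two, so memory traffic is halved. Mixed-precision schemes keep the sensitive parts of the computation in higher precision to limit the accuracy cost.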

Model/knowledge distillation: Model/knowledge distillation is a compression technique that transfers knowledge from a large “teacher” model to a smaller “student” one without significantly degrading performance. This approach is used to compress massive neural networks, improving efficiency. A neural network refers to a method in AI that teaches computers to process data in a way that is inspired by the human brain.
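One classic way to express the teacher-student idea is a loss that compares the two models’ softened output distributions. The sketch below shows that textbook soft-label formulation with made-up logits; it is not necessarily the procedure DeepSeek used, and the temperature value is an arbitrary assumption.

```python
import math


def softmax(logits, temperature=1.0):
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    # Conventional temperature**2 factor keeps gradient scale comparable.
    return kl * temperature ** 2


teacher = [4.0, 1.0, 0.5]          # made-up logits over three classes
student_far = [0.5, 1.0, 4.0]      # student that disagrees with the teacher
student_near = [3.5, 1.2, 0.4]     # student that mostly agrees

print("far student loss: ", distillation_loss(teacher, student_far))
print("near student loss:", distillation_loss(teacher, student_near))
```

Training the student to minimise this loss pulls its predictions toward the teacher’s, including the teacher’s relative confidence in the wrong answers, which is where much of the transferred “knowledge” lives.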

Finally, group relative policy optimization (GRPO) is the main innovation in the DeepSeek-R1 model. This is a reinforcement learning (RL) algorithm that enhances reasoning capabilities. Traditional RL methods such as Proximal Policy Optimization rely on a critic, a separate evaluation model that judges an AI’s responses. GRPO instead evaluates groups of responses relative to one another, improving response quality without that extra model.
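The “relative to one another” step can be sketched as follows: sample several responses to the same prompt, score each one, and normalise the scores within the group. This is a minimal illustration of that one step only, assuming made-up rewards and a population standard deviation; the policy update that uses these advantages is omitted.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Score each response against the mean and spread of its own group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)      # assumption: population std deviation
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to one prompt
# (for example, 1.0 if the final answer is correct, 0.0 otherwise).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Responses that beat the group average get a positive advantage and are reinforced, while those below it are discouraged. The group itself serves as the baseline, so no separate critic model has to be trained.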

Implications for the Future of AI

While DeepSeek’s innovations are not purely original, but instead based on the optimization of existing technologies, they do represent remarkable progress in AI development: DeepSeek’s ability to optimize the cost efficiency of LLM training makes it a game-changer. These optimizations significantly lower the threshold for AI model training, making advanced AI technology more accessible for businesses, start-ups and developers worldwide.

Further, because its technology is open-source, DeepSeek is making these innovations freely available, further democratizing AI development and encouraging innovation. This shift could lead to a more inclusive era of AI development featuring cost-effective and scalable machine learning, fewer monopolies by tech giants and greater participation from businesses globally.

The Impact on China’s AI Innovation

DeepSeek’s breakthrough was a shot in the arm for China’s AI innovation, encouraging developers, start-ups and investors to double down on creative and cost-effective solutions for AI development and applications in various sectors. DeepSeek’s homegrown young engineers have demonstrated exceptional skill in optimizing existing technologies in the face of US restrictions on advanced AI chips. This outcome reinforces the saying that “necessity is the mother of invention,” and ironically calls into question the effectiveness of US sanctions in limiting China’s AI progress.

Before DeepSeek, there was considerable pessimism in China regarding its ability to lead on AI. While China still lags behind the United States in its approach to AI development, DeepSeek demonstrates that fostering an environment of curiosity-driven innovation — rather than simply chasing profit — can lead to original technological breakthroughs.

As Liang said in an interview with 36Kr in summer 2024, “We did not intend to become the catfish that caused the catfish effect in the first place; it happened by accident.”

That being said, two key points must be considered when assessing DeepSeek’s success. First, as noted above, its achievements are based on optimizing existing AI approaches rather than developing entirely new paradigms. Second, DeepSeek’s freestyle, curiosity-driven innovation is unusual in China. It stands out because of Liang’s passion and geek-like working style (his day job consists of writing code, reading papers and participating in group discussions), which is reminiscent of the early days of Bill Gates and Steve Jobs.

Liang’s goal of developing AGI, in line with global leaders such as OpenAI and Sam Altman, sets DeepSeek apart from other AI companies in China. He believes that the most important thing for Chinese companies is to participate in the global wave of innovation and technological progress rather than focusing solely on short-term financial gain. This belief reflects a hope for more genuine, original innovation in China.

Liang’s comments about China as a follower in innovation echo the words of economist Zhang Weiying, who argued in 2017 that the country’s rapid economic growth in recent decades was built on technology and products made by advanced Western countries over the past 500 years, a period when China did not produce any real innovations. He argued that China’s future of innovation will depend on market-driven entrepreneurship. By contrast, the nationalist scholar Zhang Weiwei claims that China is leading the Fourth Industrial Revolution and setting a global benchmark in innovation in many sectors. He argues that China’s model of development can compete with Western models, and sometimes even surpasses them.

The future of China’s innovation will depend on which of these perspectives Chinese policy makers ultimately choose to embrace.


