China’s open-source AI scene is heating up again. After DeepSeek’s rapid rise earlier this year, a new challenger is making waves in the form of Kimi K2 from Moonshot AI.
Although it launched with less fanfare, Kimi K2 is now drawing serious attention from AI insiders and outperforming some of the biggest names in the game.
It is climbing the ranks fast, beating expectations on benchmarks and sparking comparisons to DeepSeek’s breakout moment. Some even believe it is strong enough to have made OpenAI rethink its release schedule.
“China’s Kimi K2 is having its mini DeepSeek moment: it is now #14 on OpenRouter today, ahead of Grok 4 and GPT-4.1,” Deedy Das of Menlo Ventures wrote in a post on X.
He added that this is a non-reasoning model, yet it scores highest on major EQ and creative writing benchmarks. “Best model smell since (Claude) 3.5 Sonnet,” he said.
Based on current API pricing, Kimi K2 is roughly 80-90% cheaper than Claude Sonnet 4 on a per-token basis.
The model is now available in preview on GroqCloud at 185 tokens per second.
Kimi K2 uses a sparse mixture-of-experts (MoE) design with one trillion total parameters, of which 32 billion are active per query. Of its 384 specialised expert subnetworks, only a small subset is activated dynamically for each token, which lowers compute needs while preserving capacity. It also supports a 128,000-token context window.
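The core idea behind sparse MoE routing can be illustrated with a toy sketch: a gating network scores every expert, only the top-k experts actually run, and their outputs are mixed. The dimensions and random "experts" below are invented for illustration and are nothing like Kimi K2's real scale.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions -- far smaller than Kimi K2's 384 experts.
N_EXPERTS = 8      # total expert subnetworks
TOP_K = 2          # experts actually activated per token
D_MODEL = 16       # hidden size

# Each "expert" here is just a random linear layer.
experts = [rng.standard_normal((D_MODEL, D_MODEL)) for _ in range(N_EXPERTS)]
gate_w = rng.standard_normal((D_MODEL, N_EXPERTS))

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route one token vector through only its top-k experts."""
    logits = x @ gate_w                    # one gating score per expert
    top = np.argsort(logits)[-TOP_K:]      # indices of the k best-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()               # softmax over the selected k only
    # Only TOP_K of the N_EXPERTS matrices are touched -- the compute saving.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(D_MODEL)
out = moe_forward(token)
print(out.shape)  # (16,)
```

The capacity-versus-compute trade-off is visible directly: all eight expert matrices exist (capacity), but each token pays for only two matrix multiplies (compute).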
Shortly after the model dropped, OpenAI CEO Sam Altman announced a delay in the release of the company’s open-source model.
“Kimi mogged OpenAI, and I genuinely think the real reason they delayed the open-source model release is Kimi K2,” AI enthusiast Ashutosh Shrivastava wrote on X. He added that OpenAI “never saw this coming”. Kimi K2 outperforms DeepSeek V3 and goes head-to-head with Claude Opus 4 and GPT-4.1.
This comes against the backdrop of OpenAI naming another Chinese AI startup, Zhipu, as a potential threat to its dominance.
Kimi K2 delivered top-tier results in coding and math benchmarks. On SWE-bench Verified, it scored 65.8%, outperforming GPT-4.1 at 54.6% and coming close to Claude Sonnet 4. On LiveCodeBench, it achieved 53.7%, ahead of DeepSeek V3 (46.9%) and GPT-4.1 (44.7%).
On the MATH-500 benchmark, it scored 97.4%, compared to GPT-4.1’s 92.4%. Kimi K2 also performs strongly across AIME, GPQA, OJBench, and tool-use evaluations.
Artificial Analysis said that while Moonshot AI’s Kimi K2 is the leading open-weight non-reasoning model in its Intelligence Index, it outputs roughly three times more tokens than other non-reasoning models, blurring the line between reasoning and non-reasoning.
As a non-reasoning model, it excels in creative tasks. It is now the Short-Story Creative Writing champion, scoring 8.56 and surpassing the previous leader, o3-pro, which scored 8.44.
“Kimi-K2-Instruct now ranks #1 on EQ-Bench 3, a benchmark for emotional intelligence in LLMs. It leads GPT-4o, Claude, and Gemini across empathy, insight, and creative writing,” Jan (@jandotai) wrote on X on July 14, 2025.
Agentic Capabilities
Kimi K2 is also built for agentic work. According to the company, unlike traditional LLMs, Kimi K2 can plan and execute multi-step tasks autonomously. It can call external APIs, generate and debug code, and create plots, webpages and more, all without manual prompting at each step.
There are two versions of the model. While the Base variant is designed for research and fine-tuning, the Instruct variant is intended for use in chatbots and agents.
In a blog post, the company shared that Kimi K2’s agentic abilities are driven by two core components: large-scale tool-use training and general reinforcement learning (RL).
In order to teach the model how to use tools effectively, Moonshot AI built a large-scale synthetic data pipeline inspired by ACEBench. This system simulates real-world tool-use tasks across hundreds of domains and thousands of tools, combining both real and synthetic examples.
“Our approach systematically evolves hundreds of domains containing thousands of tools, including both real MCP (Model Context Protocol) tools and synthetic ones, then generates hundreds of agents with diverse tool sets,” the company said.
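At toy scale, a pipeline like this amounts to crossing domains, tools, and task templates into training records. The domain and tool names below are invented for illustration and have nothing to do with Moonshot AI's actual data.

```python
import itertools
import random

random.seed(7)

# Invented domain/tool inventory -- purely illustrative.
DOMAINS = {
    "travel": ["search_flights", "book_hotel"],
    "finance": ["get_stock_price", "convert_currency"],
    "weather": ["get_forecast"],
}
TASK_TEMPLATES = [
    "Use {tool} to help a user in the {domain} domain.",
    "Plan a multi-step job in {domain} that calls {tool}.",
]

def generate_examples(n: int):
    """Cross domains x tools x templates into synthetic tool-use records."""
    combos = [
        (domain, tool, tmpl)
        for domain, tools in DOMAINS.items()
        for tool, tmpl in itertools.product(tools, TASK_TEMPLATES)
    ]
    sampled = random.sample(combos, k=min(n, len(combos)))
    return [
        {"domain": d, "tool": t, "prompt": tmpl.format(tool=t, domain=d)}
        for d, t, tmpl in sampled
    ]

examples = generate_examples(4)
for ex in examples:
    print(ex["prompt"])
```

The real pipeline evolves hundreds of domains and thousands of tools, and mixes real MCP tools with synthetic ones; the combinatorial structure, though, is the same idea.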
It Comes with Flaws
Despite the good benchmark figures, Ethan Mollick, a professor at Wharton, described Kimi K2 as “a really weird model” that still needs much more testing. He recounted an experiment where he gave it a slightly altered version of the novel The Great Gatsby.
Like Claude, the model spotted the two intentional changes, but then “made up a ton of hallucinated nonsense that sounded plausible but was just plain wrong”.
He added that the DeepSeek moment was largely fueled by pent-up consumer demand for high-quality free AI, especially among students looking for help with homework.
According to him, Kimi K2, despite its strong performance, hasn’t seen the same immediate public impact. One possible reason he observed is that for most consumers and students, “DeepSeek is good enough”.
“Feels like unlike DeepSeek, the general public hasn’t felt the effect/impacts of Kimi K2 yet – most non-technical people have probably never even heard of it. Wonder why it is being overlooked when DeepSeek got so much attention,” wrote a user on X.
Meanwhile, DeepSeek’s upcoming model, R2, is still unreleased, and it may be delayed further. A recent report suggests that US export restrictions on NVIDIA’s H20 chips, which are essential for training and deploying the model, could pose serious challenges in China.
Kimi K2 may not have the same hype DeepSeek had, but its performance is hard to ignore. With strong benchmarks and growing visibility, it is clear that China’s open-source push is far from over.