Advanced AI News

UAE Releases ‘Fastest Inference Model’ K2 Think, Built on Alibaba’s Qwen and Running on the World’s Largest Chip

By Advanced AI Editor | September 10, 2025


On September 10, Smart Things reported that this morning, the Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) in Abu Dhabi, in collaboration with AI startup G42, launched the new low-cost inference model K2 Think. The related paper has been published on the arXiv preprint platform, and the model was open-sourced on Hugging Face and GitHub yesterday afternoon.

K2 Think has 32 billion parameters and is built on Alibaba’s open-source model Qwen 2.5, yet it outperforms flagship inference models from OpenAI and DeepSeek that have 20 times its parameter count.

In the complex mathematical task benchmark tests, researchers calculated K2 Think’s average scores in AIME24, AIME25, HMMT25, and OMNI-Math-HARD, surpassing many open-source models including GPT-OSS, DeepSeek V3.1, and Qwen3 235B-A22B.

In the technical report, the researchers describe six major technological innovations behind K2 Think. They enhanced the foundation model’s reasoning capabilities through supervised fine-tuning, improved accuracy with reinforcement learning with verifiable rewards (RLVR), applied test-time techniques to strengthen the model further, and implemented two speed optimizations for K2-Think’s deployment, namely speculative decoding and Cerebras wafer-scale chips, while training entirely on publicly available open-source datasets.

Notably, researchers deployed K2-Think on the Cerebras wafer-scale chip WSE system, which can deliver about 2000 tokens per second, compared to the nominal 200 tokens per second observed in conventional deployment environments like NVIDIA H100/H200 GPUs, representing a tenfold performance improvement.

K2-Think is supported by two powerful backers: the Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), an AI research institution established by the UAE, and G42, an Abu Dhabi-backed tech group that secured a $1.5 billion investment from Microsoft in 2024 and is building AI infrastructure in the UAE known as “Stargate,” jointly funded by companies including OpenAI and SoftBank.

The model’s weights, training data, deployment code, and testing optimization code have all been open-sourced on Hugging Face and GitHub.

Hugging Face link:

GitHub link:

K2 Think homepage:

Technical report:

https://arxiv.org/abs/2509.07604

1. Mathematical Performance Surpasses OpenAI and DeepSeek’s Open Models, Aiming to Provide Specialized Services for Mathematics and Science

Eric Xing, President and Chief AI Researcher at MBZUAI, revealed in an interview with WIRED that K2 Think was developed using thousands of GPUs, with the final training process involving 200 to 300 chips.

K2 Think is not a complete large language model; it is specifically designed for inference, capable of answering complex questions through simulated reasoning rather than quickly synthesizing information for output. Xing mentioned that they plan to integrate K2 Think into a complete large model in the coming months.

In the field of complex mathematics, K2 Think achieved an average score of 67.99 in four benchmark tests: AIME 2024, AIME 2025, HMMT 2025, and Omni-MATH-HARD, surpassing larger-scale models like DeepSeek V3.1 671B and GPT-OSS 120B.

In terms of programming capabilities, K2-Think scored 63.97 on the open-source code capability benchmark LiveCodeBench, outperforming similarly sized models like GPT-OSS 20B and Qwen3-30B-A3B.

In the SciCode benchmark test, which assesses a large model’s ability to convert complex scientific problems into executable code, K2-Think achieved a score of 39.2, ranking second, just 0.1 points behind the first-place model Qwen3 235B-A22B.

In terms of scientific reasoning, the model scored 71.08 in the GPQA-Diamond benchmark test, outperforming most open-source models except for OpenReasoning-Nemotron-32B and GPT-OSS 120B.

Hector Liu, Director of the Institute of Foundation Models at MBZUAI, noted that K2-Think’s uniqueness lies in viewing it as a system: the goal is not to build a ChatGPT-style chatbot but to provide services for specific applications in fields like mathematics and science.

2. Six System-Level Innovations, Entire Training Process Using Open-Source Datasets

The K2-Think technical report describes six major technological innovations: chain-of-thought supervised fine-tuning, reinforcement learning with verifiable rewards (RLVR), agentic planning before reasoning, test-time scaling, speculative decoding, and inference-optimized hardware, with all training performed on publicly available open-source datasets.

Building on these system-level innovations, K2-Think gained logical depth through long chain-of-thought supervised fine-tuning, improved accuracy on difficult problems through verifiable-reward reinforcement learning, learned to decompose complex challenges before reasoning through agent-style planning, and gained further adaptability through test-time scaling, ultimately achieving performance comparable to models with much larger parameter counts. The result is powerful chain-of-thought reasoning combined with near-instantaneous response times.

During the supervised fine-tuning phase, K2-Think’s foundation model was fine-tuned on chain-of-thought supervision. Researchers used the existing AM-Thinking-v1-Distilled dataset, which consists of CoT reasoning traces and instruction/response pairs, with prompts drawn from mathematical reasoning, code generation, scientific reasoning, instruction following, and general chat tasks. They found that the SFT model outperformed the foundation model across various sampling budgets.
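As a concrete illustration, CoT supervision data of this shape might be flattened into plain-text training examples as below; the record fields and the `<think>` delimiter are illustrative assumptions, not details from the K2-Think release:

```python
# Sketch: packing instruction/response pairs with chain-of-thought traces
# into plain-text training strings for supervised fine-tuning.
# Field names and the <think>...</think> delimiter are assumptions.

def format_sft_example(record: dict) -> str:
    """Concatenate prompt, CoT trace, and final answer into one training string."""
    return (
        f"User: {record['prompt']}\n"
        f"Assistant: <think>{record['cot_trace']}</think>\n"
        f"{record['answer']}"
    )

records = [
    {
        "prompt": "What is 17 * 24?",
        "cot_trace": "17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408",
        "answer": "408",
    },
]

corpus = [format_sft_example(r) for r in records]
print(corpus[0])
```

A real pipeline would then tokenize these strings and train with the standard next-token objective, masking the loss on the user turn if desired.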

In the reinforcement learning phase, RLVR directly optimizes the correctness of the model’s outputs, avoiding much of the complexity and cost of preference-based reinforcement learning from human feedback (RLHF). Researchers used the Guru dataset, which spans six domains (mathematics, programming, science, logic, simulation, and tables) with nearly 92,000 verifiable questions.
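The essence of a verifiable reward is a programmatic correctness check in place of a learned preference model. A minimal sketch for math-style answers follows; the `\boxed{...}` answer convention is an assumption borrowed from common math-CoT practice, not taken from the K2-Think report:

```python
import re

def verifiable_reward(model_output: str, ground_truth: str) -> float:
    """Return 1.0 if the final boxed answer matches the reference, else 0.0.
    Assumes answers are marked \\boxed{...}, a common math-CoT convention."""
    match = re.search(r"\\boxed\{([^}]*)\}", model_output)
    if match is None:
        return 0.0  # no parseable answer: no reward
    return 1.0 if match.group(1).strip() == ground_truth.strip() else 0.0

print(verifiable_reward("so the result is \\boxed{408}", "408"))  # 1.0
print(verifiable_reward("I think it is 408", "408"))              # 0.0
```

Because the reward is computed mechanically, it can score tens of thousands of rollouts per training step with no human annotation in the loop.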

In the test-time phase, to further enhance model performance, researchers developed a framework that provides structured input to the post-trained reasoning model, combining agentic planning before reasoning, or “plan first, think later,” with test-time scaling via Best-of-N sampling.

▲ Information flow from input to final response

From input to final response, the model reconstructs prompts to outline the overall plan and highlight relevant concepts. This enhanced prompt is then used by the K2-Think model to generate multiple responses, which are finally compared pairwise to select the best generated result as the final output of the reasoning system.
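That plan-then-sample-then-compare flow can be sketched as a small pipeline; the planner, sampler, and pairwise judge below are deterministic stand-ins for what would be model calls in the real system:

```python
import random

def make_plan(prompt: str) -> str:
    """Stand-in for the planning step: restate the prompt and key concepts."""
    return f"Plan: outline relevant concepts for: {prompt}"

def sample_response(prompt: str, plan: str, seed: int) -> str:
    """Stand-in for one K2-Think sample; a real system would call the model
    with the plan-enhanced prompt. The embedded score is a toy proxy."""
    rng = random.Random(seed)
    return f"candidate-{seed} with score {rng.random():.3f}"

def pairwise_better(a: str, b: str) -> str:
    """Stand-in pairwise judge; a real system would ask a model which of the
    two responses is better. Here we compare the embedded toy score."""
    score = lambda s: float(s.rsplit(" ", 1)[-1])
    return a if score(a) >= score(b) else b

def best_of_n(prompt: str, n: int = 3) -> str:
    plan = make_plan(prompt)
    candidates = [sample_response(prompt, plan, seed) for seed in range(n)]
    best = candidates[0]
    for cand in candidates[1:]:        # pairwise comparisons over candidates
        best = pairwise_better(best, cand)
    return best

print(best_of_n("Evaluate the integral of x^2 from 0 to 1"))
```

Best-of-N trades extra sampling compute at test time for accuracy, which is exactly the "test-time scaling" lever the report describes.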

The fourth phase is deployment. A typical complex reasoning task, such as a challenging mathematical proof or a multi-step coding problem, generates a response of roughly 32,000 tokens. On an NVIDIA H100 this completes in a little under 3 minutes; on the WSE, the same 32,000-token generation takes only 16 seconds.
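Speculative decoding, one of the two deployment optimizations mentioned earlier, can be illustrated with toy deterministic "models" over integer tokens: a cheap draft model proposes a short run of tokens, and the target model accepts the longest prefix it agrees with, so several tokens can be committed per expensive verification pass:

```python
# Toy illustration of speculative decoding. draft_model and target_model are
# stand-ins for a small draft LLM and the full target LLM; both predict the
# next "token" (here just an integer) from the context.

def target_model(ctx):
    """Expensive, authoritative model: next token is last + 1."""
    return ctx[-1] + 1

def draft_model(ctx):
    """Cheap draft model: usually agrees, but diverges after multiples of 5."""
    last = ctx[-1]
    return last + 2 if last % 5 == 0 else last + 1

def speculative_step(context, k=4):
    """Draft proposes k tokens; target accepts the agreeing prefix, then
    emits one token of its own, so each step yields 1..k+1 tokens."""
    proposed, ctx = [], list(context)
    for _ in range(k):                  # cheap draft pass
        tok = draft_model(ctx)
        proposed.append(tok)
        ctx.append(tok)
    accepted, ctx = [], list(context)
    for tok in proposed:                # single expensive verification pass
        if target_model(ctx) == tok:    # target agrees with the draft
            accepted.append(tok)
            ctx.append(tok)
        else:                           # first disagreement: stop accepting
            break
    accepted.append(target_model(ctx))  # target's own next token
    return accepted

print(speculative_step([1]))  # draft fully accepted: [2, 3, 4, 5, 6]
print(speculative_step([4]))  # draft rejected after one token: [5, 6]
```

The output distribution matches what the target model alone would produce; the speedup comes from committing multiple tokens per target-model invocation whenever the draft is right.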

This is because a GPU must transfer weights from high-bandwidth memory to its cores for every token generated, whereas the WSE stores all model weights in its massive on-chip memory, fully utilizing an on-chip memory bandwidth of 25 PB/s, over 3,000 times the 0.008 PB/s offered by the latest NVIDIA B200 GPUs.
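A quick arithmetic check confirms these figures are internally consistent:

```python
# Consistency check of the deployment numbers quoted in the text.
tokens = 32_000

wse_tps, gpu_tps = 2000, 200        # tokens per second on WSE vs. H100
print(tokens / wse_tps)             # 16.0 s on the WSE, matching the report
print(tokens / gpu_tps)             # 160.0 s on H100, i.e. under 3 minutes
print(wse_tps / gpu_tps)            # 10.0, the tenfold throughput gain

wse_bw, b200_bw = 25, 0.008         # memory bandwidth in PB/s
print(round(wse_bw / b200_bw))      # ~3125, i.e. "over 3000 times higher"
```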

Conclusion: Small Parameter Models Can Match Larger Parameter Models After Fine-Tuning

K2-Think’s performance demonstrates that a 32-billion-parameter model, after fine-tuning, can generate longer reasoning chains and, with relatively modest test-time compute, achieve capabilities comparable to those of models with significantly larger parameter counts.

Richard Morton, General Manager of the Institute of Foundation Models at MBZUAI, believes that fundamental reasoning underlies all human thought processes. Applying K2-Think can shorten the time researchers spend reasoning through specific tasks and conducting clinical trials, thereby extending advanced AI technology to regions where AI infrastructure is scarce.


