Advanced AI News

Agentica Project’s Open Source DeepCoder Model Outperforms OpenAI’s O1 on Coding Benchmarks

By Advanced AI Editor | June 23, 2025 | 3 Mins Read


The Agentica Project and Together AI have released DeepCoder-14B-Preview, an open source AI coding model based on DeepSeek-R1-Distill-Qwen-14B. The model achieves a 60.6% pass rate on LiveCodeBench, outperforming OpenAI’s o1 model and matching the performance of o3-mini.

DeepCoder-14B-Preview is fine-tuned from the DeepSeek model on a dataset of 24K coding problems using reinforcement learning (RL). The developers modified the verl distributed RL framework to improve end-to-end training efficiency by 2x. They released all artifacts associated with creating the model: code, data, training logs, and their improvements to verl. They evaluated the model on several coding benchmarks, including LiveCodeBench, Codeforces, and HumanEval, and on the math benchmark AIME2024. DeepCoder showed strong performance on all of them, with scores “comparable” to or even better than closed source reasoning models such as o1 and o3-mini. According to the project team:

Our goal is to democratize RL training for LLMs…By fully sharing our dataset, code, and training recipe, we empower the community to reproduce our work and make RL training accessible to all. We believe advancing RL scaling is a collective, community-driven endeavor, and we welcome open-source contributions and sponsorships. Let’s work together to push the frontiers of RL for LLM reasoning—and beyond!

The DeepCoder team published details about their training process and the problems they overcame. The first was a lack of “high-quality, verifiable” training data for coding problems: several popular datasets were “noisy or contained unverifiable problems,” or were simply too easy for models to solve. To create a training dataset, the team developed an automated pipeline that keeps only problems with a verifiable solution and at least five unit tests.
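The filtering rule described above can be illustrated with a minimal sketch. This is not the team's pipeline: the `Problem` record, its field names, and the `run_solution` helper are hypothetical, and the sketch assumes each problem ships a Python reference solution that reads stdin and writes stdout.

```python
import subprocess
import sys
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

MIN_TESTS = 5  # the "at least five unit tests" threshold from the article


@dataclass
class Problem:
    """Hypothetical record for one coding problem (field names are assumptions)."""
    prompt: str
    reference_solution: Optional[str]
    tests: List[Tuple[str, str]] = field(default_factory=list)  # (stdin, expected stdout)


def run_solution(source: str, stdin: str, timeout_s: float = 5.0) -> Optional[str]:
    """Run a Python solution in a subprocess; return its stdout, or None on failure."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", source],
            input=stdin, capture_output=True, text=True, timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return None
    return proc.stdout if proc.returncode == 0 else None


def is_verifiable(problem: Problem,
                  run: Callable[[str, str], Optional[str]] = run_solution) -> bool:
    """Keep a problem only if it has enough tests and its solution passes all of them."""
    if problem.reference_solution is None or len(problem.tests) < MIN_TESTS:
        return False
    for stdin, expected in problem.tests:
        out = run(problem.reference_solution, stdin)
        if out is None or out.strip() != expected.strip():
            return False
    return True


def filter_dataset(problems: List[Problem],
                   run: Callable[[str, str], Optional[str]] = run_solution) -> List[Problem]:
    """Return the subset of problems that pass the verifiability check."""
    return [p for p in problems if is_verifiable(p, run)]
```

The `run` parameter is injectable so that the check can be exercised without spawning processes; a production pipeline would also need proper sandboxing, which this sketch omits.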

They also addressed an RL training bottleneck in “sampling,” i.e. running inference on the model being trained. The solution was to pipeline the process: run training and inference in parallel, and use the inference output for the next batch of training. This reduced the training iteration time by 1.4x.
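The pipelining idea can be sketched in a few lines, under stated assumptions. The `sample` and `train` functions below are hypothetical stand-ins for inference and a policy update, and the loop is only an illustration of overlapping the two stages, not the modified verl code. Note the trade-off it makes visible: every batch after the first is generated by a policy that is one update old.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple


def sample(policy_version: int, batch_id: int) -> Dict[str, int]:
    """Stand-in for inference: generate rollouts with the given policy version."""
    return {"batch_id": batch_id, "sampled_with": policy_version}


def train(policy_version: int, rollouts: Dict[str, int]) -> int:
    """Stand-in for one RL update; returns the new policy version."""
    return policy_version + 1


def pipelined_loop(num_iters: int) -> List[Tuple[int, int, int]]:
    """Overlap sampling of batch i+1 with training on batch i."""
    log = []
    policy = 0
    with ThreadPoolExecutor(max_workers=1) as sampler:
        # Prime the pipeline with the first batch.
        pending = sampler.submit(sample, policy, 0)
        for step in range(num_iters):
            rollouts = pending.result()  # wait for the batch sampled earlier
            if step + 1 < num_iters:
                # Start sampling the next batch with the pre-update policy,
                # so that inference runs while the trainer is busy.
                pending = sampler.submit(sample, policy, step + 1)
            policy = train(policy, rollouts)
            log.append((rollouts["batch_id"], rollouts["sampled_with"], policy))
    return log


if __name__ == "__main__":
    for batch_id, sampled_with, new_version in pipelined_loop(3):
        print(f"batch {batch_id}: sampled with v{sampled_with}, policy now v{new_version}")
```

In a real system the sampler and trainer would run on separate GPU groups rather than threads; the thread pool here only models the overlap.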

[Figure: Coding model performance vs. model parameters. LiveCodeBench Pass@1 accuracy vs. model size. Image source: Together AI Blog]
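For readers unfamiliar with the Pass@1 metric, the sketch below shows the widely used unbiased pass@k estimator introduced with the HumanEval benchmark, averaged over problems. Whether the LiveCodeBench harness aggregates results in exactly this way is an assumption; the article does not say. The function names and the per-problem `(n_samples, n_correct)` input format are illustrative.

```python
from math import comb
from typing import List, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of pass@k given n samples, c of which are correct."""
    if n - c < k:
        # Fewer than k incorrect samples: any draw of k contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(results: List[Tuple[int, int]]) -> float:
    """Average pass@1 over problems; results holds (n_samples, n_correct) pairs."""
    return sum(pass_at_k(n, c, 1) for n, c in results) / len(results)
```

For k = 1 the estimator reduces to the fraction of correct samples per problem, c / n, so a benchmark-level Pass@1 is the mean of those fractions.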

In a Reddit discussion about the model, one user wrote:

I just gave the q4 quant of the 14b version on ollama a try and I have to say that I’m very impressed. It’s definitely the best model I’ve tried in this size. I’d need more testing to conclude if it’s really as good as o3-mini low (particularly as I only have ever tested o3-mini medium), but it definitely feels like it’s beyond 4o in my initial testing on my day-to-day tasks.

Andrew Ng’s newsletter The Batch praised DeepCoder, saying:

Applying reinforcement learning to coding works, but it has two big issues: (i) Training examples of verifiable code are relatively scarce and (ii) computing reward signals for code is time-consuming, since it requires evaluating many test cases. DeepCoder-14B-Preview’s optimizations reduced this complexity, shrinking RL training from months to weeks. Those optimizations are built into Verl-pipeline, an open source RL library from Together.AI and Agentica, giving developers a powerful tool for model training.


Kudos to the DeepCoder team for open sourcing their reasoning recipe! A handful of companies have developed the know-how to execute RL well, but many teams still have trouble implementing successfully. Open recipes for RL training methods and data curation techniques are important to move the field forward.

The DeepCoder-14B-Preview training code is available on GitHub. Model files can be downloaded from Hugging Face.


