Agentica Project's Open Source DeepCoder Model Outperforms OpenAI's o1 on Coding Benchmarks

The Agentica Project and Together AI have released DeepCoder-14B-Preview, an open source AI coding model based on DeepSeek-R1-Distill-Qwen-14B. The model achieves a 60.6% Pass@1 rate on LiveCodeBench, outperforming OpenAI’s o1 model and matching the performance of o3-mini.

DeepCoder-14B-Preview was fine-tuned from the DeepSeek model with reinforcement learning (RL) on a dataset of 24K coding problems. The developers modified the verl distributed RL framework to double end-to-end training efficiency, and they released all artifacts associated with creating the model: code, data, training logs, and their improvements to verl.

They evaluated the model on several coding benchmarks, including LiveCodeBench, Codeforces, and HumanEval, as well as on the math benchmark AIME2024. DeepCoder performed strongly on all of them, with scores “comparable” to, or even better than, those of closed source reasoning models such as o1 and o3-mini. According to the project team:

Our goal is to democratize RL training for LLMs…By fully sharing our dataset, code, and training recipe, we empower the community to reproduce our work and make RL training accessible to all. We believe advancing RL scaling is a collective, community-driven endeavor, and we welcome open-source contributions and sponsorships. Let’s work together to push the frontiers of RL for LLM reasoning—and beyond!

The DeepCoder team published details about their training process and several problems they overcame. The first was a lack of “high-quality, verifiable” training data for coding problems: several popular datasets were “noisy or contained unverifiable problems,” or were simply too easy for models to solve. To build its training dataset, the team developed an automated pipeline that keeps only problems with a verifiable solution and at least five unit tests.
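The filtering rule can be sketched in a few lines. This is a minimal illustration, not the team's pipeline: the `Problem` record, the convention that a reference solution defines a `solve()` function, and the `solution_passes` and `filter_problems` helpers are all hypothetical. A problem survives only if it has at least five unit tests and its reference solution passes every one of them.

```python
from dataclasses import dataclass, field

MIN_TESTS = 5  # minimum number of unit tests, per the DeepCoder team's criterion


@dataclass
class Problem:
    # Hypothetical record layout: a reference solution that defines solve(),
    # plus unit tests as (argument tuple, expected return value) pairs.
    name: str
    solution: str
    tests: list = field(default_factory=list)


def solution_passes(problem: Problem) -> bool:
    """Return True if the reference solution passes every unit test."""
    namespace: dict = {}
    try:
        # exec() runs untrusted code; a real pipeline would sandbox this step.
        exec(problem.solution, namespace)
        solve = namespace["solve"]
        return all(solve(*args) == expected for args, expected in problem.tests)
    except Exception:
        # Syntax errors, a missing solve(), or runtime errors make the problem unverifiable.
        return False


def filter_problems(problems: list) -> list:
    """Keep problems with enough tests whose reference solution verifiably passes them."""
    return [p for p in problems if len(p.tests) >= MIN_TESTS and solution_passes(p)]


good = Problem("double", "def solve(x):\n    return 2 * x", [((i,), 2 * i) for i in range(5)])
few_tests = Problem("square", "def solve(x):\n    return x * x", [((3,), 9)])
wrong = Problem("inc", "def solve(x):\n    return x", [((i,), i + 1) for i in range(6)])
print([p.name for p in filter_problems([good, few_tests, wrong])])  # ['double']
```

Here `square` is dropped for having too few tests and `inc` because its reference solution fails its own tests.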

They also addressed an RL training bottleneck in “sampling,” i.e., running inference on the model being trained. Their solution was to pipeline the process: run training and inference in parallel, and use the inference output for the next batch of training. This reduced training iteration time by 1.4x.
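The pipelining idea can be illustrated with a producer thread and a bounded queue. This is a simplified sketch under stated assumptions, not verl's implementation: `generate_batch` and `train_step` are hypothetical stand-ins that only sleep, a thread stands in for separate inference workers, and weight synchronization between trainer and sampler is omitted.

```python
import queue
import threading
import time


def generate_batch(step: int) -> list:
    """Hypothetical stand-in for sampling rollouts from the current policy."""
    time.sleep(0.05)  # simulate inference latency
    return [f"rollout-{step}-{i}" for i in range(4)]


def train_step(batch: list) -> None:
    """Hypothetical stand-in for one gradient update on a batch of rollouts."""
    time.sleep(0.05)  # simulate training latency


def pipelined_training(num_steps: int) -> list:
    """Overlap sampling of the next batch with training on the current one."""
    batches: queue.Queue = queue.Queue(maxsize=1)  # at most one finished batch waits
    trained = []

    def sampler() -> None:
        for step in range(num_steps):
            batches.put((step, generate_batch(step)))
        batches.put(None)  # sentinel: no more batches

    worker = threading.Thread(target=sampler)
    worker.start()
    while (item := batches.get()) is not None:
        step, batch = item
        train_step(batch)  # runs while the sampler generates the next batch
        trained.append(step)
    worker.join()
    return trained


start = time.perf_counter()
steps = pipelined_training(6)
print(steps, f"{time.perf_counter() - start:.2f}s elapsed")
```

With both stand-ins sleeping 50 ms, six sequential steps would take about 0.6 s, while the overlapped loop needs roughly seven 50 ms intervals. The bounded queue keeps the sampler from running far ahead of the trainer, which in a real system limits how stale the sampled data can become relative to the current weights.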

LiveCodeBench Pass@1 Accuracy vs Model Size. Image Source: Together AI Blog

In a Reddit discussion about the model, one user wrote:

I just gave the q4 quant of the 14b version on ollama a try and I have to say that I’m very impressed. It’s definitely the best model I’ve tried in this size. I’d need more testing to conclude if it’s really as good as o3-mini low (particularly as I only have ever tested o3-mini medium), but it definitely feels like it’s beyond 4o in my initial testing on my day-to-day tasks.

Andrew Ng’s newsletter, The Batch, praised DeepCoder, saying:

Applying reinforcement learning to coding works, but it has two big issues: (i) Training examples of verifiable code are relatively scarce and (ii) computing reward signals for code is time-consuming, since it requires evaluating many test cases. DeepCoder-14B-Preview’s optimizations reduced this complexity, shrinking RL training from months to weeks. Those optimizations are built into Verl-pipeline, an open source RL library from Together.AI and Agentica, giving developers a powerful tool for model training.

Kudos to the DeepCoder team for open sourcing their reasoning recipe! A handful of companies have developed the know-how to execute RL well, but many teams still have trouble implementing successfully. Open recipes for RL training methods and data curation techniques are important to move the field forward.
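The reward cost Ng describes comes from executing every candidate program against its test cases. The article does not specify DeepCoder's reward function, so the sketch below assumes stdin/stdout-style problems and a simple all-or-nothing reward; `run_test` and `binary_reward` are hypothetical names. Each test runs in its own Python subprocess with a timeout, and tests are dispatched in parallel to hide some of the latency.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_test(program: str, stdin: str, expected: str, timeout: float = 5.0) -> bool:
    """Run the candidate program on one test input in a separate Python process."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", program],
            input=stdin,
            capture_output=True,
            text=True,
            timeout=timeout,  # runaway programs are killed and count as failures
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0 and result.stdout.strip() == expected.strip()


def binary_reward(program: str, tests: list, workers: int = 8) -> float:
    """Assumed all-or-nothing reward: 1.0 only if every test passes, else 0.0."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda t: run_test(program, *t), tests)
        return 1.0 if all(results) else 0.0


tests = [("2 3\n", "5"), ("10 -4\n", "6")]  # (stdin, expected stdout) pairs
correct = "a, b = map(int, input().split())\nprint(a + b)"
buggy = "a, b = map(int, input().split())\nprint(a - b)"
print(binary_reward(correct, tests), binary_reward(buggy, tests))  # 1.0 0.0
```

Because every reward requires launching processes and waiting on them, the cost grows with the number of tests per problem and the number of sampled programs per step, which is why this stage becomes a bottleneck in RL training for code.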

The DeepCoder-14B-Preview training code is available on GitHub. Model files can be downloaded from Hugging Face.
