Paper page - OpenCodeReasoning-II: A Simple Test Time Scaling Approach via Self-Critique

Recent advancements in reasoning-based Large Language Models (LLMs), particularly their potential through test-time scaling, have created significant opportunities for distillation in code generation and critique. However, progress in both areas fundamentally depends on large-scale, high-quality datasets. In this work, we introduce OpenCodeReasoning-II, a dataset consisting of 2.5M question-solution-critique triples (approx. 35K unique programming questions), making it nearly twice the size of the previous largest publicly available code reasoning dataset. We employ a two-stage supervised fine-tuning strategy: the first stage fine-tunes for code generation, while the second stage jointly trains models for both code generation and critique. Our resulting fine-tuned Qwen2.5-Instruct models match or exceed the code generation performance of the best prior open-weight distilled models. Notably, integrating our code generation and critique models leads to significant improvements in competitive coding performance. Furthermore, we present an extension of the LiveCodeBench benchmark that supports the C++ programming language, thereby facilitating more comprehensive LLM evaluation with this benchmark.
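The abstract does not spell out how the generation and critique models are combined at inference time. One plausible reading of "test-time scaling via self-critique" is to sample several candidate solutions and let the critic pick among them. The sketch below illustrates that idea only; the function name `select_with_critique`, the callables `generate` and `critique`, the boolean verdict, and the fallback rule are all assumptions for illustration, not the paper's actual procedure.

```python
from typing import Callable, List, Optional


def select_with_critique(
    question: str,
    generate: Callable[[str], str],        # hypothetical: wraps the code generation model
    critique: Callable[[str, str], bool],  # hypothetical: True if the critic judges the solution correct
    num_samples: int = 4,
) -> Optional[str]:
    """Sample candidates and return the first one the critic accepts.

    Assumption: if the critic rejects every candidate, fall back to the
    first sample rather than returning nothing.
    """
    candidates: List[str] = [generate(question) for _ in range(num_samples)]
    for candidate in candidates:
        if critique(question, candidate):
            return candidate
    # No candidate was accepted (or none were sampled).
    return candidates[0] if candidates else None
```

In this sketch, raising `num_samples` is the test-time scaling knob: more samples give the critic more chances to find a solution it accepts, at the cost of extra generation and critique calls.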
