Recent advances in Large Language Models (LLMs) have shown that their
reasoning capabilities can be significantly improved through Reinforcement
Learning with Verifiable Reward (RLVR), particularly in domains like
mathematics and programming, where ground-truth correctness can be
automatically evaluated. However, extending this success to other
reasoning-intensive domains remains challenging due to the scarcity of
high-quality, verifiable datasets and the high cost of human supervision. In
this work, we introduce the Loong Project: an open-source framework for
scalable synthetic data generation and verification across a diverse range of
reasoning-intensive domains. The framework consists of two key components: (1)
LoongBench, a curated seed dataset containing 8,729 human-vetted examples
across 12 domains (e.g., Advanced Mathematics, Chemistry, Logic), each paired
with executable code and rich metadata; and (2) LoongEnv, a modular synthetic
data generation environment that supports multiple prompting strategies to
produce new question-answer-code triples. Together, these components form an
agent-environment loop that enables reinforcement learning, where an LLM-based
agent is rewarded for generating Chain-of-Thought (CoT) solutions that align
with code-executed answers. Empirically, we evaluate a broad suite of
both open-source and proprietary LLMs on LoongBench to assess domain
coverage and reveal performance bottlenecks. In addition, we conduct a comprehensive
analysis of synthetic data generated by LoongEnv, examining correctness,
difficulty, and diversity. Code and documentation are available at
https://github.com/camel-ai/loong.
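
To make the reward signal concrete, the sketch below illustrates the kind of verifiable reward the agent-environment loop relies on: the executable reference code attached to a seed example is run to obtain a ground-truth answer, and the agent's final answer, extracted from its Chain-of-Thought, earns a reward only if the two agree. This is a minimal sketch under stated assumptions; the function and variable names (run_reference_code, extract_final_answer, a `result` variable in the seed code, an "Answer: ..." line in the CoT) are illustrative and not the actual Loong API.

```python
# Minimal sketch of a verifiable-reward check (illustrative only; not the Loong API).
# Assumptions: each seed example carries executable reference code that stores its
# ground-truth answer in a variable named `result`, and the agent's CoT ends with
# a line of the form "Answer: <value>".
import math
import re


def run_reference_code(code: str) -> str:
    """Execute the example's reference code and return its `result` variable."""
    namespace: dict = {}
    exec(code, namespace)  # trusted, human-vetted seed code in this sketch
    return str(namespace["result"])


def extract_final_answer(cot: str) -> str | None:
    """Pull the final answer from a Chain-of-Thought ending in 'Answer: ...'."""
    match = re.search(r"Answer:\s*(.+)", cot)
    return match.group(1).strip() if match else None


def verifiable_reward(cot: str, reference_code: str) -> float:
    """Return 1.0 if the agent's answer matches the code-executed answer, else 0.0."""
    predicted = extract_final_answer(cot)
    if predicted is None:
        return 0.0
    target = run_reference_code(reference_code)
    try:
        # Numeric comparison with tolerance when both sides parse as floats.
        return float(math.isclose(float(predicted), float(target), rel_tol=1e-6))
    except ValueError:
        # Otherwise fall back to exact string match.
        return float(predicted == target)


if __name__ == "__main__":
    seed_code = "result = sum(k * k for k in range(1, 11))"  # ground truth: 385
    agent_cot = "The sum of squares from 1 to 10 is 385.\nAnswer: 385"
    print(verifiable_reward(agent_cot, seed_code))  # -> 1.0
```

In a training loop, this binary reward would be attached to each sampled CoT and fed to a standard RL objective; the essential design choice is that correctness is decided by executing code rather than by human labels or a learned reward model.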