One Missing Piece for Open-Source Reasoning Models: A Dataset to Mitigate Cold-Starting Short CoT LLMs in RL

The Long CoT Collection, a long chain-of-thought dataset generated by short CoT LLMs, strengthens general reasoning skills and provides a strong foundation for reinforcement learning, with quality comparable to, or slightly below, that of R1.

With the release of R1, a publicly available large reasoning model (LRM),
researchers commonly build new LRMs by training language models on R1's long
chain-of-thought (CoT) outputs. While prior work shows that LRMs'
capabilities can be reproduced through direct distillation, continued
reliance on existing models (e.g., R1) remains a critical limitation in
advancing the field. As a first step toward independent LRM development, this
paper explores whether a long CoT dataset can be constructed with LLMs
that are not trained for inference-time scaling. To this end, we present the
Long CoT Collection, a dataset of 100K CoT rationales annotated using existing
short CoT LLMs. We develop a pipeline that instills o1's novel reasoning
strategies in short CoT LLMs, enabling them to think longer and making the
thought budget controllable to better manage the overthinking problem. Our
extensive analyses validate that our dataset achieves quality comparable to,
or slightly below, that of R1. Furthermore, our experiments demonstrate that
training on our dataset not only strengthens general reasoning skills, but
also provides a strong foundation for reinforcement learning: models
initialized on our data achieve 2-3x larger gains from RLVR (reinforcement
learning with verifiable rewards).
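RLVR (reinforcement learning with verifiable rewards) scores each model output with an automatic check against a known answer instead of a learned reward model. The sketch below illustrates that general idea with a binary exact-match reward. It is not the reward used in the paper: the `\boxed{...}` answer format and the function names `extract_final_answer` and `verifiable_reward` are hypothetical assumptions chosen for illustration.

```python
import re
from typing import Optional

# Hypothetical answer format (assumption): the final answer appears as \boxed{...}.
# Nested braces such as \boxed{\frac{1}{2}} are NOT handled by this simple pattern.
_BOXED = re.compile(r"\\boxed\{([^{}]*)\}")


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the stripped contents of the last boxed answer, or None if absent."""
    matches = _BOXED.findall(completion)
    return matches[-1].strip() if matches else None


def verifiable_reward(completion: str, reference: str) -> float:
    """Binary RLVR-style reward: 1.0 on an exact answer match, otherwise 0.0."""
    answer = extract_final_answer(completion)
    if answer is None:
        # No parsable final answer earns no reward.
        return 0.0
    return 1.0 if answer == reference.strip() else 0.0
```

For example, `verifiable_reward(r"Thus the answer is \boxed{42}.", "42")` gives `1.0`, while a completion without a boxed answer gives `0.0`. Real verifiers typically normalize answers (e.g., numeric or symbolic equivalence) rather than relying on exact string matching.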
