Paper page - Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs

Ring-lite uses a MoE architecture and reinforcement learning to efficiently match SOTA reasoning models while activating fewer parameters and addressing challenges specific to MoE training.

We present Ring-lite, a Mixture-of-Experts (MoE)-based large language model
optimized via reinforcement learning (RL) to achieve efficient and robust
reasoning capabilities. Built upon the publicly available Ling-lite model, a
16.8 billion parameter model with 2.75 billion activated parameters, our
approach matches the performance of state-of-the-art (SOTA) small-scale
reasoning models on challenging benchmarks (e.g., AIME, LiveCodeBench,
GPQA-Diamond) while activating only one-third of the parameters required by
comparable models. To accomplish this, we introduce a joint training pipeline
integrating distillation with RL, revealing undocumented challenges in MoE RL
training. First, we identify optimization instability during RL training, and
we propose Constrained Contextual Computation Policy Optimization (C3PO), a
novel approach that enhances training stability and improves computational
throughput through algorithm-system co-design. Second, we empirically
demonstrate that selecting distillation checkpoints for RL training based on
entropy loss, rather than validation metrics, yields superior
performance-efficiency trade-offs in the subsequent RL stage. Finally, we
develop a two-stage training paradigm to harmonize multi-domain data
integration, addressing the domain conflicts that arise when training on mixed
datasets. We will release the model, dataset, and code.
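
The entropy-based checkpoint selection mentioned in the abstract can be illustrated with a minimal sketch. The abstract states only that distillation checkpoints are chosen by entropy loss rather than by validation metrics; it does not give the selection rule. The code below therefore assumes one simple rule (pick the checkpoint whose mean token entropy is closest to a chosen target), and the function names, checkpoint names, entropy values, and target are all hypothetical.

```python
import math
from typing import Dict, Sequence


def token_entropy(logits: Sequence[float]) -> float:
    """Shannon entropy (in nats) of the softmax distribution over one token's logits."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_entropy(logits_per_token: Sequence[Sequence[float]]) -> float:
    """Average token-level entropy over a sequence of per-token logit vectors."""
    return sum(token_entropy(l) for l in logits_per_token) / len(logits_per_token)


def select_checkpoint(entropy_by_ckpt: Dict[str, float], target: float) -> str:
    """ASSUMED rule, not from the paper: return the checkpoint whose
    mean entropy is closest to `target`."""
    return min(entropy_by_ckpt, key=lambda name: abs(entropy_by_ckpt[name] - target))


if __name__ == "__main__":
    # Hypothetical checkpoints and entropy values, for illustration only.
    measured = {"ckpt_epoch1": 0.9, "ckpt_epoch2": 0.5, "ckpt_epoch3": 0.2}
    print(select_checkpoint(measured, target=0.45))
```

In practice the per-token logits would come from running each distilled checkpoint on a held-out prompt set; the point of the sketch is only that the selection signal is the policy's entropy, not a benchmark score.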
