Paper Page - AdaR1: From Long-CoT To Hybrid-CoT Via Bi-Level Adaptive Reasoning Optimization

Recent long-thought reasoning models achieve strong performance on complex reasoning tasks but often incur substantial inference overhead, making efficiency a critical concern. Our empirical analysis reveals that the benefit of Long-CoT varies across problems: some problems require elaborate reasoning, while others show no improvement or even degraded accuracy. This motivates adaptive reasoning strategies that tailor reasoning depth to the input. However, prior work primarily reduces redundancy within long reasoning paths, which limits exploration of more efficient strategies beyond the Long-CoT paradigm. To address this, we propose a novel two-stage framework for adaptive and efficient reasoning. First, we construct a hybrid reasoning model by merging long and short CoT models to enable diverse reasoning styles. Second, we apply bi-level preference training to guide the model to select a suitable reasoning style (group-level) and to prefer concise, correct reasoning within each style group (instance-level). Experiments demonstrate that our method significantly reduces inference costs compared to baseline approaches while maintaining performance. Notably, on five mathematical datasets, the average reasoning length is reduced by more than 50%, highlighting the potential of adaptive strategies to optimize reasoning efficiency in large language models. Code is coming soon at https://github.com/StarDewXXX/AdaR1
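The two stages described above can be made concrete with a small sketch. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the merging rule or how preference pairs are built, so the linear interpolation in `merge_weights`, the accuracy comparison in `preferred_style`, the "shortest correct versus longest other" rule in `instance_level_pair`, and every name below are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Sample:
    style: str     # "long" or "short" reasoning style
    text: str      # the sampled reasoning trace
    correct: bool  # whether the final answer was right


def merge_weights(long_w: Dict[str, float], short_w: Dict[str, float],
                  alpha: float = 0.5) -> Dict[str, float]:
    """Stage 1 (assumed rule): linearly interpolate matching parameters."""
    return {k: alpha * long_w[k] + (1 - alpha) * short_w[k] for k in long_w}


def accuracy(samples: List[Sample]) -> float:
    """Fraction of correct samples; 0.0 for an empty group."""
    return sum(s.correct for s in samples) / len(samples) if samples else 0.0


def preferred_style(long_s: List[Sample], short_s: List[Sample],
                    margin: float = 0.0) -> str:
    """Group-level choice (assumed): prefer long only if it is clearly more accurate."""
    return "long" if accuracy(long_s) > accuracy(short_s) + margin else "short"


def instance_level_pair(samples: List[Sample]) -> Optional[Tuple[str, str]]:
    """Instance-level pair (assumed): shortest correct trace vs. longest other trace."""
    correct = [s for s in samples if s.correct]
    if not correct or len(samples) < 2:
        return None
    chosen = min(correct, key=lambda s: len(s.text))
    others = [s for s in samples if s is not chosen]
    rejected = max(others, key=lambda s: len(s.text))
    return chosen.text, rejected.text


if __name__ == "__main__":
    long_s = [Sample("long", "a" * 30, True), Sample("long", "a" * 10, True),
              Sample("long", "a" * 50, False)]
    short_s = [Sample("short", "b" * 5, False), Sample("short", "b" * 4, False)]
    style = preferred_style(long_s, short_s)
    print(style, instance_level_pair(long_s if style == "long" else short_s))
```

In this toy setup, the group-level step decides which style a problem should favor, and the instance-level step then yields a (chosen, rejected) pair inside that style that could feed a preference-optimization objective.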
