QwenLong-L1 is a framework that extends large reasoning models to long-context reasoning via reinforcement learning, achieving leading performance on document question-answering benchmarks.
Recent large reasoning models (LRMs) have demonstrated strong reasoning capabilities through reinforcement learning (RL), but these improvements have primarily been observed on short-context reasoning tasks. In contrast,
extending LRMs to effectively process and reason on long-context inputs via RL
remains a critical unsolved challenge. To bridge this gap, we first formalize
the paradigm of long-context reasoning RL and identify its key challenges: suboptimal training efficiency and an unstable optimization process. To address
these issues, we propose QwenLong-L1, a framework that adapts short-context
LRMs to long-context scenarios via progressive context scaling. Specifically,
we employ a warm-up supervised fine-tuning (SFT) stage to establish a robust initial policy, followed by a curriculum-guided phased RL technique that stabilizes policy evolution and is enhanced with a difficulty-aware retrospective sampling strategy to incentivize policy exploration.
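To make the three-stage recipe concrete, the following is a minimal, runnable sketch of such a training loop. All names (`Example`, `sft_warmup`, `rl_update`, `retrospective_sample`), the phase lengths, and the difficulty proxy (lower past reward means a higher sampling weight) are illustrative assumptions, not the paper's actual implementation.

```python
import random
from dataclasses import dataclass

@dataclass
class Example:
    context_len: int    # token length of the long-context input (assumed field)
    avg_reward: float   # reward the policy has been earning on this example (assumed field)

def sft_warmup(policy: dict) -> dict:
    # Stand-in for the warm-up SFT stage that yields a robust initial policy.
    policy["stage"] = "sft-warmup"
    return policy

def rl_update(policy: dict, batch: list) -> dict:
    # Stand-in for one RL policy update on the sampled batch.
    policy["rl_steps"] = policy.get("rl_steps", 0) + 1
    return policy

def retrospective_sample(pool: list, k: int) -> list:
    # Difficulty-aware sampling: examples with lower past reward (harder ones)
    # are drawn with higher probability.
    weights = [1.0 - ex.avg_reward + 1e-3 for ex in pool]
    return random.choices(pool, weights=weights, k=k)

def train(data: list, phases=(20_000, 60_000), steps_per_phase=100, batch_size=8) -> dict:
    policy = sft_warmup({})
    replay, prev_len = [], 0
    for max_len in phases:          # curriculum: progressively longer contexts per phase
        replay += [ex for ex in data if prev_len < ex.context_len <= max_len]
        prev_len = max_len
        for _ in range(steps_per_phase):
            policy = rl_update(policy, retrospective_sample(replay, batch_size))
    return policy

if __name__ == "__main__":
    toy_data = [Example(random.randint(1_000, 60_000), random.random()) for _ in range(200)]
    print(train(toy_data))
```

The stubs stand in for full SFT and RL updates; the point of the sketch is the schedule itself: phases admit progressively longer contexts, while the retrospective pool keeps earlier (and harder) examples in play.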
Experiments on seven long-context document question-answering benchmarks show that QwenLong-L1-32B outperforms flagship LRMs such as OpenAI-o3-mini and Qwen3-235B-A22B and performs on par with Claude-3.7-Sonnet-Thinking, placing it among the leading state-of-the-art LRMs. This work advances the development of practical
long-context LRMs capable of robust reasoning across information-intensive
environments.