R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning

arXiv:2505.17005v1 Announce Type: cross
Abstract: Large Language Models (LLMs) are powerful but prone to hallucinations due to static knowledge. Retrieval-Augmented Generation (RAG) helps by injecting external information, but current methods are often costly, generalize poorly, or ignore the internal knowledge of the model. In this paper, we introduce R1-Searcher++, a novel framework designed to train LLMs to adaptively leverage both internal and external knowledge sources. R1-Searcher++ employs a two-stage training strategy: an initial SFT Cold-start phase for preliminary format learning, followed by RL for Dynamic Knowledge Acquisition. The RL stage uses outcome-supervision to encourage exploration, incorporates a reward mechanism for internal knowledge utilization, and integrates a memorization mechanism to continuously assimilate retrieved information, thereby enriching the model’s internal knowledge. By leveraging both its internal knowledge and an external search engine, the model continuously improves its capabilities, enabling efficient retrieval-augmented reasoning. Our experiments demonstrate that R1-Searcher++ outperforms previous RAG and reasoning methods and achieves efficient retrieval. The code is available at https://github.com/RUCAIBox/R1-Searcher-plus.
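
The abstract does not spell out the reward itself, so the following is only an illustrative sketch of what an outcome-supervised reward with a bonus for relying on internal knowledge might look like. It is not the paper's formulation: the `Rollout` type, the exact-match check, and the `internal_bonus` and `max_searches` parameters are all assumptions made here for illustration.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    """Hypothetical record of one sampled response (not from the paper)."""
    answer: str        # final answer produced by the model
    num_searches: int  # number of external retrieval calls it made


def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivial differences do not matter.
    return " ".join(text.lower().split())


def outcome_reward(rollout: Rollout, gold: str,
                   internal_bonus: float = 0.5,
                   max_searches: int = 4) -> float:
    """Assumed reward: only the final outcome is scored.

    A wrong answer earns 0. A correct answer earns 1, plus a bonus that
    grows as fewer external searches were needed, which is one simple way
    to favor internal knowledge when it suffices.
    """
    if normalize(rollout.answer) != normalize(gold):
        return 0.0
    used = min(rollout.num_searches, max_searches)
    saved_fraction = 1.0 - used / max_searches
    return 1.0 + internal_bonus * saved_fraction
```

Under these assumptions, a correct answer produced with no searches scores 1.5, the same answer after four or more searches scores 1.0, and an incorrect answer scores 0.0 regardless of how many searches were made.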
