Scaling Deep Research via Reinforcement Learning in Real-world Environments

By Advanced AI Editor, April 16, 2025

[Submitted on 4 Apr 2025 (v1), last revised 15 Apr 2025 (this version, v3)]

Paper: DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments, by Yuxiang Zheng and 6 other authors

Abstract: Large Language Models (LLMs) equipped with web search capabilities have demonstrated impressive potential for deep research tasks. However, current approaches predominantly rely on either manually engineered prompts (prompt engineering-based) with brittle performance or reinforcement learning within controlled Retrieval-Augmented Generation (RAG) environments (RAG-based) that fail to capture the complexities of real-world interaction. In this paper, we introduce DeepResearcher, the first comprehensive framework for end-to-end training of LLM-based deep research agents through scaling reinforcement learning (RL) in real-world environments with authentic web search interactions. Unlike RAG-based approaches that assume all necessary information exists within a fixed corpus, our method trains agents to navigate the noisy, unstructured, and dynamic nature of the open web. We implement a specialized multi-agent architecture where browsing agents extract relevant information from various webpage structures and overcome significant technical challenges. Extensive experiments on open-domain research tasks demonstrate that DeepResearcher achieves substantial improvements of up to 28.9 points over prompt engineering-based baselines and up to 7.2 points over RAG-based RL agents. Our qualitative analysis reveals emergent cognitive behaviors from end-to-end RL training, including the ability to formulate plans, cross-validate information from multiple sources, engage in self-reflection to redirect research, and maintain honesty when unable to find definitive answers. Our results highlight that end-to-end training in real-world web environments is not merely an implementation detail but a fundamental requirement for developing robust research capabilities aligned with real-world applications. We release DeepResearcher at this https URL.
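
To make the abstract's description more concrete, here is a minimal Python sketch of a single research rollout: a main loop issues a search, a browsing step filters each page down to relevant snippets, and the episode ends with either an answer or an explicit admission that nothing definitive was found. This is not the authors' implementation. The names FAKE_WEB, search, browse, research, and reward are hypothetical, the "web" is a small in-memory dictionary standing in for live search, and the string-match reward is an assumption made only so the rollout can be scored; the abstract does not specify the reward, and the RL policy update itself is omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical stand-in for the open web; a real system would query a live
# search engine and fetch actual pages.
FAKE_WEB: Dict[str, str] = {
    "https://example.org/a": "Intro text. The capital of France is Paris. Footer links.",
    "https://example.org/b": "Unrelated navigation menu and advertisements.",
}


def search(query: str) -> List[str]:
    """Stub search tool: returns every known URL regardless of the query."""
    return list(FAKE_WEB)


def browse(url: str, terms: List[str]) -> List[str]:
    """Stub browsing agent: keep only sentences that mention a question term."""
    sentences = [s.strip() for s in FAKE_WEB[url].split(".") if s.strip()]
    return [s for s in sentences if any(t.lower() in s.lower() for t in terms)]


@dataclass
class Trajectory:
    steps: List[str] = field(default_factory=list)  # log of tool calls
    answer: str = ""


def research(question: str, terms: List[str], max_turns: int = 3) -> Trajectory:
    """Run one rollout: search, browse each hit, stop once evidence is found."""
    traj = Trajectory()
    evidence: List[str] = []
    for _ in range(max_turns):
        traj.steps.append(f"search: {question}")
        for url in search(question):
            found = browse(url, terms)
            traj.steps.append(f"browse: {url} -> {len(found)} snippet(s)")
            evidence.extend(found)
        if evidence:
            break
    # Admit failure rather than guess when no evidence was collected.
    traj.answer = evidence[0] if evidence else "no definitive answer found"
    return traj


def reward(traj: Trajectory, gold: str) -> float:
    """Toy outcome reward (an assumption): 1.0 if gold appears in the answer."""
    return 1.0 if gold.lower() in traj.answer.lower() else 0.0
```

In an RL setting, many such rollouts would be sampled per question and their rewards used to update the policy that chooses the queries and answers; that part is left out here because the abstract gives no details about it.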

Submission history

From: Yuxiang Zheng
[v1] Fri, 4 Apr 2025 04:41:28 UTC (959 KB)
[v2] Mon, 7 Apr 2025 10:45:47 UTC (958 KB)
[v3] Tue, 15 Apr 2025 02:57:20 UTC (959 KB)
