AgentFly: Fine-tuning LLM Agents Without Fine-tuning LLMs - Takara TLDR

In this paper, we introduce a novel learning paradigm for adaptive Large
Language Model (LLM) agents that eliminates the need for fine-tuning the
underlying LLMs. Existing approaches are often either rigid, relying on static,
handcrafted reflection workflows, or computationally intensive, requiring
gradient updates of LLM model parameters. In contrast, our method enables
low-cost continual adaptation via memory-based online reinforcement learning.
We formalise this as a Memory-augmented Markov Decision Process (M-MDP),
equipped with a neural case-selection policy to guide action decisions. Past
experiences are stored in an episodic memory, either differentiable or
non-parametric. The policy is continually updated based on environmental
feedback through a memory rewriting mechanism, whereas policy improvement is
achieved through efficient memory reading (retrieval). We instantiate our agent
model in the deep research setting as AgentFly, which ranks first on the
GAIA validation set ($87.88\%$ Pass@$3$) and scores $79.40\%$ on the test set. It reaches
$66.6\%$ F1 and $80.4\%$ PM on the DeepResearcher dataset, outperforming the
state-of-the-art training-based method, while case-based memory adds $4.7\%$ to
$9.6\%$ absolute points on out-of-distribution tasks. Our approach offers a
scalable and efficient pathway for developing generalist LLM agents capable of
continuous, real-time learning without gradient updates, advancing machine
learning towards open-ended skill acquisition and deep research scenarios. The
code is available at https://github.com/Agent-on-the-Fly/AgentFly.
