Close Menu
  • Home
  • AI Models
    • DeepSeek
    • xAI
    • OpenAI
    • Meta AI Llama
    • Google DeepMind
    • Amazon AWS AI
    • Microsoft AI
    • Anthropic (Claude)
    • NVIDIA AI
    • IBM WatsonX Granite 3.1
    • Adobe Sensi
    • Hugging Face
    • Alibaba Cloud (Qwen)
    • Baidu (ERNIE)
    • C3 AI
    • DataRobot
    • Mistral AI
    • Moonshot AI (Kimi)
    • Google Gemma
    • xAI
    • Stability AI
    • H20.ai
  • AI Research
    • Allen Institue for AI
    • arXiv AI
    • Berkeley AI Research
    • CMU AI
    • Google Research
    • Microsoft Research
    • Meta AI Research
    • OpenAI Research
    • Stanford HAI
    • MIT CSAIL
    • Harvard AI
  • AI Funding & Startups
    • AI Funding Database
    • CBInsights AI
    • Crunchbase AI
    • Data Robot Blog
    • TechCrunch AI
    • VentureBeat AI
    • The Information AI
    • Sifted AI
    • WIRED AI
    • Fortune AI
    • PitchBook
    • TechRepublic
    • SiliconANGLE – Big Data
    • MIT News
    • Data Robot Blog
  • Expert Insights & Videos
    • Google DeepMind
    • Lex Fridman
    • Matt Wolfe AI
    • Yannic Kilcher
    • Two Minute Papers
    • AI Explained
    • TheAIEdge
    • Matt Wolfe AI
    • The TechLead
    • Andrew Ng
    • OpenAI
  • Expert Blogs
    • François Chollet
    • Gary Marcus
    • IBM
    • Jack Clark
    • Jeremy Howard
    • Melanie Mitchell
    • Andrew Ng
    • Andrej Karpathy
    • Sebastian Ruder
    • Rachel Thomas
    • IBM
  • AI Policy & Ethics
    • ACLU AI
    • AI Now Institute
    • Center for AI Safety
    • EFF AI
    • European Commission AI
    • Partnership on AI
    • Stanford HAI Policy
    • Mozilla Foundation AI
    • Future of Life Institute
    • Center for AI Safety
    • World Economic Forum AI
  • AI Tools & Product Releases
    • AI Assistants
    • AI for Recruitment
    • AI Search
    • Coding Assistants
    • Customer Service AI
    • Image Generation
    • Video Generation
    • Writing Tools
    • AI for Recruitment
    • Voice/Audio Generation
  • Industry Applications
    • Finance AI
    • Healthcare AI
    • Legal AI
    • Manufacturing AI
    • Media & Entertainment
    • Transportation AI
    • Education AI
    • Retail AI
    • Agriculture AI
    • Energy AI
  • AI Art & Entertainment
    • AI Art News Blog
    • Artvy Blog » AI Art Blog
    • Weird Wonderful AI Art Blog
    • The Chainsaw » AI Art
    • Artvy Blog » AI Art Blog
What's Hot

Ganiga will showcase its waste-sorting robots at TechCrunch Disrupt 2025

A Conversation with Sam and Jony

Is AI Finally Ready to Make a Discovery Worthy of the Nobel Prize?

Facebook X (Twitter) Instagram
Advanced AI News
  • Home
  • AI Models
    • OpenAI (GPT-4 / GPT-4o)
    • Anthropic (Claude 3)
    • Google DeepMind (Gemini)
    • Meta (LLaMA)
    • Cohere (Command R)
    • Amazon (Titan)
    • IBM (Watsonx)
    • Inflection AI (Pi)
  • AI Research
    • Allen Institue for AI
    • arXiv AI
    • Berkeley AI Research
    • CMU AI
    • Google Research
    • Meta AI Research
    • Microsoft Research
    • OpenAI Research
    • Stanford HAI
    • MIT CSAIL
    • Harvard AI
  • AI Funding
    • AI Funding Database
    • CBInsights AI
    • Crunchbase AI
    • Data Robot Blog
    • TechCrunch AI
    • VentureBeat AI
    • The Information AI
    • Sifted AI
    • WIRED AI
    • Fortune AI
    • PitchBook
    • TechRepublic
    • SiliconANGLE – Big Data
    • MIT News
    • Data Robot Blog
  • AI Experts
    • Google DeepMind
    • Lex Fridman
    • Meta AI Llama
    • Yannic Kilcher
    • Two Minute Papers
    • AI Explained
    • TheAIEdge
    • The TechLead
    • Matt Wolfe AI
    • Andrew Ng
    • OpenAI
    • Expert Blogs
      • François Chollet
      • Gary Marcus
      • IBM
      • Jack Clark
      • Jeremy Howard
      • Melanie Mitchell
      • Andrew Ng
      • Andrej Karpathy
      • Sebastian Ruder
      • Rachel Thomas
      • IBM
  • AI Tools
    • AI Assistants
    • AI for Recruitment
    • AI Search
    • Coding Assistants
    • Customer Service AI
  • AI Policy
    • ACLU AI
    • AI Now Institute
    • Center for AI Safety
  • Business AI
    • Advanced AI News Features
    • Finance AI
    • Healthcare AI
    • Education AI
    • Energy AI
    • Legal AI
LinkedIn Instagram YouTube Threads X (Twitter)
Advanced AI News
VentureBeat AI

New memory framework builds AI agents that can handle the real world's unpredictability

By Advanced AI EditorOctober 9, 2025No Comments6 Mins Read
Share Facebook Twitter Pinterest Copy Link Telegram LinkedIn Tumblr Email
Share
Facebook Twitter LinkedIn Pinterest Email



Researchers at the University of Illinois Urbana-Champaign and Google Cloud AI Research have developed a framework that enables large language model (LLM) agents to organize their experiences into a memory bank, helping them get better at complex tasks over time.

The framework, called ReasoningBank, distills “generalizable reasoning strategies” from an agent’s successful and failed attempts to solve problems. The agent then uses this memory during inference to avoid repeating past mistakes and make better decisions as it faces new problems. The researchers show that when combined with test-time scaling techniques, where an agent makes multiple attempts at a problem, ReasoningBank significantly improves the performance and efficiency of LLM agents.

Their findings show that ReasoningBank consistently outperforms classic memory mechanisms across web browsing and software engineering benchmarks, offering a practical path toward building more adaptive and reliable AI agents for enterprise applications.

The challenge of LLM agent memory

As LLM agents are deployed in applications that run for long periods, they encounter a continuous stream of tasks. One of the key limitations of current LLM agents is their failure to learn from this accumulated experience. By approaching each task in isolation, they inevitably repeat past mistakes, discard valuable insights from related problems, and fail to develop skills that would make them more capable over time.

The solution to this limitation is to give agents some kind of memory. Previous efforts to give agents memory have focused on storing past interactions for reuse by organizing information in various forms from plain text to structured graphs. However, these approaches often fall short. Many use raw interaction logs or only store successful task examples. This means they can't distill higher-level, transferable reasoning patterns and, crucially, they don’t extract and use the valuable information from the agent’s failures. As the researchers note in their paper, “existing memory designs often remain limited to passive record-keeping rather than providing actionable, generalizable guidance for future decisions.”

How ReasoningBank works

ReasoningBank is a memory framework designed to overcome these limitations. Its central idea is to distill useful strategies and reasoning hints from past experiences into structured memory items that can be stored and reused.

According to Jun Yan, a Research Scientist at Google and co-author of the paper, this marks a fundamental shift in how agents operate. "Traditional agents operate statically—each task is processed in isolation," Yan explained. "ReasoningBank changes this by turning every task experience (successful or failed) into structured, reusable reasoning memory. As a result, the agent doesn’t start from scratch with each customer; it recalls and adapts proven strategies from similar past cases."

The framework processes both successful and failed experiences and turns them into a collection of useful strategies and preventive lessons. The agent judges success and failure through LLM-as-a-judge schemes to obviate the need for human labeling.

Yan provides a practical example of this process in action. An agent tasked with finding Sony headphones might fail because its broad search query returns over 4,000 irrelevant products. "ReasoningBank will first try to figure out why this approach failed," Yan said. "It will then distill strategies such as ‘optimize search query’ and ‘confine products with category filtering.’ Those strategies will be extremely useful to get future similar tasks successfully done."

The process operates in a closed loop. When an agent faces a new task, it uses an embedding-based search to retrieve relevant memories from ReasoningBank to guide its actions. These memories are inserted into the agent’s system prompt, providing context for its decision-making. Once the task is completed, the framework creates new memory items to extract insights from successes and failures. This new knowledge is then analyzed, distilled, and merged into the ReasoningBank, allowing the agent to continuously evolve and improve its capabilities.

Supercharging memory with scaling

The researchers found a powerful synergy between memory and test-time scaling. Classic test-time scaling involves generating multiple independent answers to the same question, but the researchers argue that this “vanilla form is suboptimal because it does not leverage inherent contrastive signal that arises from redundant exploration on the same problem.”

To address this, they propose Memory-aware Test-Time Scaling (MaTTS), which integrates scaling with ReasoningBank. MaTTS comes in two forms. In “parallel scaling,” the system generates multiple trajectories for the same query, then compares and contrasts them to identify consistent reasoning patterns. In sequential scaling, the agent iteratively refines its reasoning within a single attempt, with the intermediate notes and corrections also serving as valuable memory signals.

This creates a virtuous cycle: the existing memory in ReasoningBank steers the agent toward more promising solutions, while the diverse experiences generated through scaling enable the agent to create higher-quality memories to store in ReasoningBank. 

“This positive feedback loop positions memory-driven experience scaling as a new scaling dimension for agents,” the researchers write.

ReasoningBank in action

The researchers tested their framework on WebArena (web browsing) and SWE-Bench-Verified (software engineering) benchmarks, using models like Google’s Gemini 2.5 Pro and Anthropic’s Claude 3.7 Sonnet. They compared ReasoningBank against baselines including memory-free agents and agents using trajectory-based or workflow-based memory frameworks.

The results show that ReasoningBank consistently outperforms these baselines across all datasets and LLM backbones. On WebArena, it improved the overall success rate by up to 8.3 percentage points compared to a memory-free agent. It also generalized better on more difficult, cross-domain tasks, while reducing the number of interaction steps needed to complete tasks. When combined with MaTTS, both parallel and sequential scaling further boosted performance, consistently outperforming standard test-time scaling.

This efficiency gain has a direct impact on operational costs. Yan points to a case where a memory-free agent took eight trial-and-error steps just to find the right product filter on a website. "Those trial and error costs could be avoided by leveraging relevant insights from ReasoningBank," he noted. "In this case, we save almost twice the operational costs," which also improves the user experience by resolving issues faster.

For enterprises, ReasoningBank can help develop cost-effective agents that can learn from experience and adapt over time in complex workflows and areas like software development, customer support, and data analysis. As the paper concludes, “Our findings suggest a practical pathway toward building adaptive and lifelong-learning agents.”

Yan confirmed that their findings point toward a future of truly compositional intelligence. For example, a coding agent could learn discrete skills like API integration and database management from separate tasks. "Over time, these modular skills… become building blocks the agent can flexibly recombine to solve more complex tasks," he said, suggesting a future where agents can autonomously assemble their knowledge to manage entire workflows with minimal human oversight.



Source link

Follow on Google News Follow on Flipboard
Share. Facebook Twitter Pinterest LinkedIn Tumblr Email Copy Link
Previous ArticleEven after Stargate, Oracle, Nvidia, and AMD, OpenAI has more big deals coming soon, Sam Altman says
Next Article IBM stock surges following Anthropic partnership
Advanced AI Editor
  • Website

Related Posts

Samsung AI researcher's new, open reasoning model TRM outperforms models 10,000X larger — on specific problems

October 9, 2025

To scale agentic AI, Notion tore down its tech stack and started fresh

October 8, 2025

Here’s what Jony Ive and Sam Altman revealed about their secretive AI hardware project at OpenAI’s Dev Day

October 8, 2025

Comments are closed.

Latest Posts

Matthiesen Gallery Files Lawsuit Over Gustave Courbet Painting

MoMA Partners with Mattel for Van Gogh Barbie, Monet and Dalí Figures

Underground Film Legend and Artist Dies at 92

Artwork Forfeited by Inigo Philbrick’s Partner Flops at Sotheby’s

Latest Posts

Ganiga will showcase its waste-sorting robots at TechCrunch Disrupt 2025

October 9, 2025

A Conversation with Sam and Jony

October 9, 2025

Is AI Finally Ready to Make a Discovery Worthy of the Nobel Prize?

October 9, 2025

Subscribe to News

Subscribe to our newsletter and never miss our latest news

Subscribe my Newsletter for New Posts & tips Let's stay updated!

Recent Posts

  • Ganiga will showcase its waste-sorting robots at TechCrunch Disrupt 2025
  • A Conversation with Sam and Jony
  • Is AI Finally Ready to Make a Discovery Worthy of the Nobel Prize?
  • LightCache: Memory-Efficient, Training-Free Acceleration for Video Generation – Takara TLDR
  • Samsung AI researcher's new, open reasoning model TRM outperforms models 10,000X larger — on specific problems

Recent Comments

  1. Bryanassit on 1-800-CHAT-GPT—12 Days of OpenAI: Day 10
  2. vykup_qcMr on “Hello World” with OpenAI Codex
  3. Manta Bridge on 13 AI-Focused Storage Offerings On Display At Nvidia GTC 2025
  4. وی ایزوله استروویت on Meta, Booz Allen Launch ‘Space Llama’ AI System For Space Station Operations – Meta Platforms (NASDAQ:META), Booz Allen Hamilton (NYSE:BAH)
  5. GeorgeVic on Bitcoin Security: Here’s What Makes The OG Blockchain Safer Than Fort Knox

Welcome to Advanced AI News—your ultimate destination for the latest advancements, insights, and breakthroughs in artificial intelligence.

At Advanced AI News, we are passionate about keeping you informed on the cutting edge of AI technology, from groundbreaking research to emerging startups, expert insights, and real-world applications. Our mission is to deliver high-quality, up-to-date, and insightful content that empowers AI enthusiasts, professionals, and businesses to stay ahead in this fast-evolving field.

Subscribe to Updates

Subscribe to our newsletter and never miss our latest news

Subscribe my Newsletter for New Posts & tips Let's stay updated!

LinkedIn Instagram YouTube Threads X (Twitter)
  • Home
  • About Us
  • Advertise With Us
  • Contact Us
  • DMCA
  • Privacy Policy
  • Terms & Conditions
© 2025 advancedainews. Designed by advancedainews.

Type above and press Enter to search. Press Esc to cancel.