VentureBeat AI

Former DeepSeeker and collaborators release new method for training reliable AI agents: RAGEN

By Advanced AI Bot | April 23, 2025 | 7 min read

2025 was, by many expert accounts, supposed to be the year of AI agents: task-specific AI implementations powered by leading large language models (LLMs) and multimodal models like those offered by OpenAI, Anthropic, Google, and DeepSeek.

But so far, most AI agents remain stuck as experimental pilots in a kind of corporate purgatory, according to a recent poll conducted by VentureBeat on the social network X.

Help may be on the way: a collaborative team from Northwestern University, Microsoft, Stanford, and the University of Washington, including Zihan Wang, a former DeepSeek researcher now completing a computer science PhD at Northwestern, has introduced RAGEN, a new system for training and evaluating AI agents that they hope will make agents more reliable and less brittle in real-world, enterprise-grade use.

Unlike static tasks such as math solving or code generation, RAGEN focuses on multi-turn, interactive settings where agents must adapt, remember, and reason in the face of uncertainty.

Built on a custom RL framework called StarPO (State-Thinking-Actions-Reward Policy Optimization), the system explores how LLMs can learn through experience rather than memorization. The focus is on entire decision-making trajectories, not just one-step responses.

StarPO operates in two interleaved phases: a rollout stage where the LLM generates complete interaction sequences guided by reasoning, and an update stage where the model is optimized using normalized cumulative rewards. This structure supports a more stable and interpretable learning loop compared to standard policy optimization approaches.
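
As a rough sketch in code, the two phases might look like the following. Here `generate_trajectory` and `policy_gradient_step` are hypothetical stand-ins for RAGEN's actual machinery, not its real API:

```python
# Minimal sketch of the StarPO rollout/update loop. The helper functions
# (generate_trajectory, policy_gradient_step) are hypothetical stand-ins,
# not RAGEN's actual API.
import numpy as np

def train_starpo(policy, env, iterations=1000, rollouts_per_iter=64):
    for _ in range(iterations):
        # Rollout stage: the LLM produces complete interaction sequences,
        # interleaving reasoning text with actions until each episode ends.
        trajectories = [generate_trajectory(policy, env)
                        for _ in range(rollouts_per_iter)]

        # Update stage: optimize over whole trajectories using normalized
        # cumulative rewards rather than per-step supervision.
        returns = np.array([sum(t.rewards) for t in trajectories])
        advantages = (returns - returns.mean()) / (returns.std() + 1e-8)
        for traj, adv in zip(trajectories, advantages):
            policy_gradient_step(policy, traj, adv)
```

The key design choice is that the unit of optimization is the whole trajectory, so the reasoning text that precedes each action is part of what gets reinforced.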

The authors implemented and tested the framework using fine-tuned variants of Alibaba’s Qwen models, including Qwen 1.5 and Qwen 2.5. These models served as the base LLMs for all experiments and were chosen for their open weights and robust instruction-following capabilities. This decision enabled reproducibility and consistent baseline comparisons across symbolic tasks.

Here’s how they did it and what they found:

The Echo Trap: how reinforcement learning rewards lead to LLM reasoning loss

Wang summarized the core challenge in a widely shared X thread: "Why does your RL training always collapse?"

According to the team, LLM agents initially generate symbolic, well-reasoned responses. But over time, RL systems tend to reward shortcuts, leading to repetitive behaviors that degrade overall performance—a pattern they call the “Echo Trap.”

This regression is driven by feedback loops where certain phrases or strategies earn high rewards early on, encouraging overuse and stifling exploration.

Wang notes that the symptoms are measurable: reward variance cliffs, gradient spikes, and disappearing reasoning traces.
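
Those symptoms lend themselves to simple instrumentation. The sketch below is one illustration of how a practitioner might watch for them during training; the thresholds are arbitrary examples, not values from the paper:

```python
# Illustrative monitors for the "Echo Trap" symptoms described above.
# All thresholds are made-up examples, not values from the paper.
from collections import deque
import numpy as np

class CollapseMonitor:
    def __init__(self, window=50):
        self.rewards = deque(maxlen=window)
        self.trace_lengths = deque(maxlen=window)

    def log(self, episode_reward, reasoning_tokens, grad_norm):
        self.rewards.append(episode_reward)
        self.trace_lengths.append(reasoning_tokens)
        warnings = []
        # Reward variance cliff: variance collapses as outputs turn repetitive.
        if len(self.rewards) == self.rewards.maxlen and np.var(list(self.rewards)) < 1e-3:
            warnings.append("reward variance cliff")
        # Gradient spike: an unusually large update step.
        if grad_norm > 10.0:
            warnings.append("gradient spike")
        # Disappearing reasoning traces: thought segments shrink toward zero.
        if len(self.trace_lengths) > 10 and np.mean(list(self.trace_lengths)) < 5:
            warnings.append("reasoning traces vanishing")
        return warnings
```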

RAGEN test environments aren’t exactly enterprise-grade

To study these behaviors in a controlled setting, RAGEN evaluates agents across three symbolic environments:

Bandit: A single-turn, stochastic task that tests symbolic risk-reward reasoning.

Sokoban: A multi-turn, deterministic puzzle involving irreversible decisions.

Frozen Lake: A stochastic, multi-turn task requiring adaptive planning.

Each environment is designed to minimize real-world priors and focus solely on decision-making strategies developed during training.

In the Bandit environment, for instance, agents are told that Dragon and Phoenix arms represent different reward distributions.

Rather than being told the probabilities directly, they must reason symbolically—e.g., interpreting Dragon as “strength” and Phoenix as “hope”—to predict outcomes. This kind of setup pressures the model to generate explainable, analogical reasoning.
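
A toy version makes the setup concrete. In this sketch the payout distributions are invented for illustration; only the spirit of the symbolic-arms design comes from the paper:

```python
# Toy two-armed bandit in the spirit of RAGEN's Bandit task. The agent sees
# only the symbolic arm names; these payout distributions are made up here.
import random

class SymbolicBandit:
    ARMS = {
        "Dragon":  lambda: random.gauss(1.0, 2.0),  # "strength": higher mean, riskier
        "Phoenix": lambda: random.gauss(0.7, 0.3),  # "hope": lower mean, steadier
    }

    def step(self, arm_name: str) -> float:
        # Single-turn episode: choose an arm, receive a stochastic reward.
        return self.ARMS[arm_name]()

env = SymbolicBandit()
print(env.step("Dragon"))  # reward sampled from the hidden distribution
```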

Stabilizing reinforcement learning with StarPO-S

To address training collapse, the researchers introduced StarPO-S, a stabilized version of the original framework. StarPO-S incorporates three key interventions:

Uncertainty-based rollout filtering: Prioritizing rollouts where the agent shows outcome uncertainty.

KL penalty removal: Allowing the model to deviate more freely from its original policy and explore new behaviors.

Asymmetric PPO clipping: Amplifying high-reward trajectories more than low-reward ones to boost learning.

These changes delay or eliminate training collapse and improve performance across all three tasks. As Wang put it: “StarPO-S… works across all 3 tasks. Relieves collapse. Better reward.”
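
Two of those interventions are straightforward to picture in code. Below is a simplified illustration, not the paper's implementation, of uncertainty-based filtering and asymmetric clipping; the variance cutoff, the clip thresholds, and the assumed `total_reward` attribute are all example choices:

```python
import numpy as np
import torch

def filter_uncertain(prompt_groups, min_std=0.1):
    """Keep prompt groups whose repeated rollouts disagree on outcome.
    Near-zero reward variance carries little learning signal, so drop it.
    (Illustrative cutoff; assumes each trajectory has a total_reward.)"""
    return [g for g in prompt_groups
            if np.std([t.total_reward for t in g]) > min_std]

def asymmetric_ppo_loss(ratio, advantage, clip_low=0.2, clip_high=0.4):
    """PPO-style clipped objective with a wider ceiling than floor, so
    high-reward trajectories are amplified more than low-reward ones are
    suppressed. The third intervention, KL penalty removal, amounts to
    simply not adding a KL term to this loss."""
    clipped = torch.clamp(ratio, 1.0 - clip_low, 1.0 + clip_high)
    return -torch.min(ratio * advantage, clipped * advantage).mean()
```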

What makes for a good agentic AI model?

The success of RL training hinges not just on architecture, but on the quality of the data generated by the agents themselves. The team identified three dimensions that significantly impact training:

Task diversity: Exposing the model to a wide range of initial scenarios improves generalization.

Interaction granularity: Allowing multiple actions per turn enables more meaningful planning.

Rollout freshness: Keeping training data aligned with the current model policy avoids outdated learning signals.

Together, these factors make the training process more stable and effective.
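
The freshness criterion in particular can be as simple as stamping each rollout with the policy version that generated it and discarding stale ones, as in this illustrative sketch:

```python
from dataclasses import dataclass, field

@dataclass
class Rollout:
    policy_version: int  # which policy update produced this rollout
    steps: list = field(default_factory=list)

def fresh_rollouts(buffer, current_version, max_lag=2):
    # Keep only rollouts generated within `max_lag` policy updates of now;
    # the lag tolerance is an illustrative choice.
    return [r for r in buffer if current_version - r.policy_version <= max_lag]
```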

An interactive demo site published by the researchers on GitHub makes this explicit, visualizing agent rollouts as full dialogue turns, including not just actions but the step-by-step thought process that preceded them.

For example, in solving a math problem, an agent may first ‘think’ about isolating a variable, then submit an answer like ‘x = 5’. These intermediate thoughts are visible and traceable, which adds transparency into how agents arrive at decisions.

When reasoning runs out

While explicit reasoning improves performance in simple, single-turn tasks like Bandit, it tends to decay during multi-turn training. Despite the use of structured prompts and <think> tokens, reasoning traces often shrink or vanish unless directly rewarded.

This points to a limitation in how rewards are typically designed: focusing on task completion may neglect the quality of the process behind it. The team experimented with format-based penalties to encourage better-structured reasoning, but acknowledges that more refined reward shaping is likely needed.
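
One way to express such a format-based penalty in code; the <think> marker and penalty weight here are assumptions for illustration, not the paper's reward function:

```python
import re

def shaped_reward(task_reward: float, response: str, penalty: float = 0.2) -> float:
    # Penalize responses that skip a well-formed reasoning segment, so the
    # process is rewarded alongside task completion. The marker convention
    # and weight are illustrative.
    has_reasoning = re.search(r"<think>.+?</think>", response, re.DOTALL) is not None
    return task_reward if has_reasoning else task_reward - penalty
```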

RAGEN, along with its StarPO and StarPO-S frameworks, is now available as an open-source project at https://github.com/RAGEN-AI/RAGEN. However, no explicit license is listed in the GitHub repository at the time of writing, which may limit use or redistribution by others.

The system provides a valuable foundation for those interested in developing AI agents that do more than complete tasks—they think, plan, and evolve.

As AI continues to move toward autonomy, projects like RAGEN help illuminate what it takes to train models that learn not just from data, but from the consequences of their own actions.

Outstanding Questions for Real-World Adoption

While the RAGEN paper offers a detailed technical roadmap, several practical questions remain for those looking to apply these methods in enterprise settings. For example, how transferable is RAGEN’s approach beyond stylized, symbolic tasks? Would businesses need to design entirely new environments and reward functions to use this system in workflows like invoice processing or customer support?

Another critical area is scalability. Even with the enhancements provided by StarPO-S, the paper acknowledges that training still eventually collapses over longer horizons. This raises the question: is there a theoretical or practical path to sustaining reasoning over open-ended or continuously evolving task sequences?

Licensing is a further open question: as noted above, no explicit license is listed in the RAGEN GitHub repository or documentation at the time of writing, leaving usage rights unclear.

To explore these and other questions—including how non-technical decision-makers should interpret RAGEN’s implications—I reached out to co-author Wang for further insight. At the time of writing, a response is pending. Should any comments arrive, they will be included in a follow-up to this article or integrated as an update.

RAGEN stands out not just as a technical contribution but as a conceptual step toward more autonomous, reasoning-capable AI agents. Whether it becomes part of the enterprise AI stack remains to be seen, but its insights into agent learning dynamics are already helping redefine the frontier of LLM training.
