Advanced AI News
VentureBeat AI

Patronus AI debuts Percival to help enterprises monitor failing AI agents at scale

By Advanced AI Editor, May 14, 2025

Patronus AI launched a new monitoring platform today that automatically identifies failures in AI agent systems, targeting enterprise concerns about reliability as these applications grow more complex.

The San Francisco-based AI safety startup’s new product, Percival, positions itself as the first solution capable of automatically identifying various failure patterns in AI agent systems and suggesting optimizations to address them.

“Percival is the industry’s first solution that automatically detects a variety of failure patterns in agentic systems and then systematically suggests fixes and optimizations to address them,” said Anand Kannappan, CEO and co-founder of Patronus AI, in an exclusive interview with VentureBeat.

AI agent reliability crisis: Why companies are losing control of autonomous systems

Enterprise adoption of AI agents—software that can independently plan and execute complex multi-step tasks—has accelerated in recent months, creating new management challenges as companies try to ensure these systems operate reliably at scale.

Unlike conventional machine learning models, these agent-based systems often involve lengthy sequences of operations where errors in early stages can have significant downstream consequences.

“A few weeks ago, we published a model that quantifies how likely agents can fail, and what kind of impact that might have on the brand, on customer churn and things like that,” Kannappan said. “There’s a constant compounding error probability with agents that we’re seeing.”

This issue becomes particularly acute in multi-agent environments where different AI systems interact with one another, making traditional testing approaches increasingly inadequate.
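The compounding effect Kannappan describes can be illustrated with simple arithmetic. The sketch below is not Patronus AI's published model; it assumes, purely for illustration, that every step of an agent run succeeds independently with the same probability. The function names `chain_success_probability` and `max_steps_for_target` are hypothetical.

```python
def chain_success_probability(step_success: float, steps: int) -> float:
    """Probability that every step in a run succeeds.

    Assumption (illustrative only): steps fail independently and share
    one per-step success probability, so the run succeeds with p ** n.
    """
    if not 0.0 <= step_success <= 1.0:
        raise ValueError("step_success must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return step_success ** steps


def max_steps_for_target(step_success: float, target: float) -> int:
    """Largest number of steps whose end-to-end success stays >= target."""
    if not 0.0 < step_success < 1.0:
        raise ValueError("step_success must be strictly between 0 and 1")
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in (0, 1]")
    steps = 0
    # Add steps while the next step would still keep the run above target.
    while step_success ** (steps + 1) >= target:
        steps += 1
    return steps


if __name__ == "__main__":
    for n in (1, 10, 50, 100):
        p = chain_success_probability(0.99, n)
        print(f"{n:>3} steps at 99% per-step success: {p:.3f} end-to-end")
    print("Steps allowed for a 90% end-to-end target:",
          max_steps_for_target(0.99, 0.90))
```

Under these assumptions, a 100-step run with 99% per-step reliability completes without error only about 37% of the time. Real agent failures are rarely independent, so the numbers are a rough intuition rather than a forecast.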

Episodic memory innovation: How Percival’s AI agent architecture revolutionizes error detection

Percival differentiates itself from other evaluation tools through its agent-based architecture and what the company calls “episodic memory” — the ability to learn from previous errors and adapt to specific workflows.

The software can detect more than 20 different failure modes across four categories: reasoning errors, system execution errors, planning and coordination errors, and domain-specific errors.

“Unlike an LLM as a judge, Percival itself is an agent and so it can keep track of all the events that have happened throughout the trajectory,” explained Darshan Deshpande, a researcher at Patronus AI. “It can correlate them and find these errors across contexts.”

For enterprises, the most immediate benefit appears to be reduced debugging time. According to Patronus, early customers have cut the time spent analyzing agent workflows from about an hour to between 1 and 1.5 minutes.
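To make the four failure categories concrete, here is a minimal sketch of how findings from an agent trace might be recorded and tallied. It does not reflect Percival's actual schema or API, which the article does not describe. The `FailureCategory`, `Finding`, `summarize`, and `earliest_finding` names, and the sample findings, are all hypothetical; only the four category labels come from the article.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional


class FailureCategory(Enum):
    # The four categories named in the article; the enum itself is hypothetical.
    REASONING = "reasoning"
    SYSTEM_EXECUTION = "system execution"
    PLANNING_COORDINATION = "planning and coordination"
    DOMAIN_SPECIFIC = "domain-specific"


@dataclass(frozen=True)
class Finding:
    step: int                  # index of the trace step where the issue appeared
    category: FailureCategory
    note: str                  # free-text description of the issue


def summarize(findings: Iterable[Finding]) -> dict[FailureCategory, int]:
    """Count findings per category, including categories with zero hits."""
    counts = Counter(f.category for f in findings)
    return {category: counts.get(category, 0) for category in FailureCategory}


def earliest_finding(findings: Iterable[Finding]) -> Optional[Finding]:
    """Return the finding at the lowest step index, or None if there are none.

    Early errors matter most because later steps build on them.
    """
    return min(findings, key=lambda f: f.step, default=None)


# Made-up example data, for illustration only.
sample = [
    Finding(step=42, category=FailureCategory.SYSTEM_EXECUTION, note="tool call timed out"),
    Finding(step=7, category=FailureCategory.REASONING, note="misread the task goal"),
    Finding(step=58, category=FailureCategory.REASONING, note="contradicted an earlier result"),
]

summary = summarize(sample)
first = earliest_finding(sample)
```

Sorting findings by step index is one simple way to surface the earliest error in a long trajectory, since that is usually the first place to look when debugging.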

TRAIL benchmark reveals critical gaps in AI oversight capabilities

Alongside the product launch, Patronus is releasing a benchmark called TRAIL (Trace Reasoning and Agentic Issue Localization) to evaluate how well systems can detect issues in AI agent workflows.

Research using this benchmark revealed that even sophisticated AI models struggle with effective trace analysis, with the best-performing system scoring only 11% on the benchmark.

The findings underscore the challenging nature of monitoring complex AI systems and may help explain why large enterprises are investing in specialized tools for AI oversight.

Enterprise AI leaders embrace Percival for mission-critical agent applications

Early adopters include Emergence AI, which has raised approximately $100 million in funding and is developing systems where AI agents can create and manage other agents.

“Emergence’s recent breakthrough—agents creating agents—marks a pivotal moment not only in the evolution of adaptive, self-generating systems, but also in how such systems are governed and scaled responsibly,” said Satya Nitta, co-founder and CEO of Emergence AI, in a statement sent to VentureBeat.

Nova, another early customer, is using the technology for a platform that helps large enterprises migrate legacy code through AI-powered SAP integrations.

These customers typify the challenge Percival aims to solve. According to Kannappan, some companies are now managing agent systems with “more than 100 steps in a single agent directory,” creating complexity that far exceeds what human operators can efficiently monitor.

AI oversight market poised for explosive growth as autonomous systems proliferate

The launch comes amid rising enterprise concerns about AI reliability and governance. As companies deploy increasingly autonomous systems, the need for oversight tools has grown proportionally.

“What’s challenging is that systems are becoming increasingly autonomous,” Kannappan noted, adding that “billions of lines of code are being generated per day using AI,” creating an environment where manual oversight becomes practically impossible.

The market for AI monitoring and reliability tools is expected to expand significantly as enterprises move from experimental deployments to mission-critical AI applications.

Percival integrates with multiple AI frameworks, including Hugging Face smolagents, Pydantic AI, the OpenAI Agents SDK, and LangChain, making it compatible with various development environments.

While Patronus AI did not disclose pricing or revenue projections, the company’s focus on enterprise-grade oversight suggests it is positioning itself for the high-margin enterprise AI safety market that analysts predict will grow substantially as AI adoption accelerates.
