VentureBeat AI

New approach to agent reliability, AgentSpec, forces agents to follow rules

By Advanced AI Editor | March 30, 2025 | 4 min read

AI agents have safety and reliability problems. Although agents would allow enterprises to automate more steps in their workflows, they can take unintended actions while executing a task, are not very flexible and are difficult to control.

Organizations have already raised the alarm about unreliable agents, worried that once deployed, agents might forget to follow instructions. 

OpenAI even admitted that ensuring agent reliability would involve working with outside developers, so it opened up its Agents SDK to help solve this issue. 

However, researchers at Singapore Management University (SMU) have developed a new approach to the agent reliability problem.

AgentSpec is a domain-specific framework that lets users “define structured rules that incorporate triggers, predicates and enforcement mechanisms.” The researchers said AgentSpec will make agents work only within the parameters that users want.

Guiding LLM-based agents with a new approach

AgentSpec is not a new large language model (LLM) but rather an approach to guide LLM-based AI agents. The researchers believe AgentSpec can be used for agents in enterprise settings and self-driving applications.   

The first AgentSpec tests were integrated with the LangChain framework, but the researchers said they designed it to be framework-agnostic, meaning it can also run in the AutoGen and Apollo ecosystems.

Experiments using AgentSpec showed it prevented “over 90% of unsafe code executions, ensures full compliance in autonomous driving law-violation scenarios, eliminates hazardous actions in embodied agent tasks and operates with millisecond-level overhead.” AgentSpec rules generated by an LLM, OpenAI’s o1, also performed strongly, enforcing rules on 87% of risky code executions and preventing “law-breaking in 5 out of 8 scenarios.”

Current methods are a little lacking

AgentSpec is not the only method for helping developers give agents more control and reliability. Other approaches include ToolEmu and GuardAgent. The startup Galileo launched Agentic Evaluations, a way to ensure agents work as intended.

The open-source platform H2O.ai uses predictive models to improve the accuracy of agents used by companies in finance, healthcare, telecommunications and government. 

The AgentSpec researchers said current approaches to mitigating risks, like ToolEmu, effectively identify risks, but they noted that “these methods lack interpretability and offer no mechanism for safety enforcement, making them susceptible to adversarial manipulation.”

Using AgentSpec

AgentSpec works as a runtime enforcement layer for agents. It intercepts the agent’s behavior while executing tasks and adds safety rules set by humans or generated by prompts.

Since AgentSpec is a custom domain-specific language, users must define the safety rules themselves. A rule has three components: the first is the trigger, which lays out when to activate the rule; the second is check, which adds the conditions that must hold; and the third is enforce, which specifies the action to take if the rule is violated.
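
To make that structure concrete, here is a minimal Python sketch of how a trigger/check/enforce rule could be modeled. It is illustrative only: AgentSpec defines its own DSL, and the names below (Rule, stop_agent, the execute_shell trigger) are assumptions for this example, not AgentSpec's actual syntax.

```python
from dataclasses import dataclass
from typing import Any, Callable

# NOTE: hypothetical sketch of an AgentSpec-style rule; not AgentSpec's real DSL.
@dataclass
class Rule:
    trigger: str                      # which agent event activates the rule
    check: Callable[[dict], bool]     # predicate that must hold over the event's context
    enforce: Callable[[dict], Any]    # action to take when the check fails

# Example rule: block shell commands that delete files recursively.
def is_safe_shell(ctx: dict) -> bool:
    return "rm -rf" not in ctx.get("command", "")

def stop_agent(ctx: dict) -> None:
    raise RuntimeError(f"Blocked unsafe command: {ctx['command']!r}")

no_recursive_delete = Rule(
    trigger="execute_shell",   # when: the agent invokes a shell tool
    check=is_safe_shell,       # check: the command contains no recursive delete
    enforce=stop_agent,        # enforce: halt the agent if the check fails
)
```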

AgentSpec is built on LangChain, though, as previously stated, the researchers said AgentSpec can also be integrated into other frameworks like AutoGen or the autonomous vehicle software stack Apollo. 

These frameworks orchestrate the steps agents need to take by taking in the user input, creating an execution plan, observing the result, and then deciding if the action was completed and, if not, planning the next step. AgentSpec adds rule enforcement into this flow. 

“Before an action is executed, AgentSpec evaluates predefined constraints to ensure compliance, modifying the agent’s behavior when necessary. Specifically, AgentSpec hooks into three key decision points: before an action is executed (AgentAction), after an action produces an observation (AgentStep), and when the agent completes its task (AgentFinish). These points provide a structured way to intervene without altering the core logic of the agent,” the paper states. 
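
As a rough illustration of that interception pattern, the sketch below wraps a simplified agent loop with the three hook points the paper names. The loop, class and rule format are assumptions for illustration; they mirror the pre-action, post-observation and on-finish structure rather than AgentSpec's or LangChain's actual internals.

```python
from typing import Any, Callable

class EnforcementLayer:
    """Hypothetical runtime layer evaluating rules at the three decision
    points the paper names (AgentAction, AgentStep, AgentFinish); the rule
    format and loop below are illustrative, not AgentSpec's implementation."""

    def __init__(self, rules: list[dict]):
        # Each rule: {"trigger": event name, "check": predicate, "enforce": handler}
        self.rules = rules

    def _apply(self, event: str, ctx: dict) -> None:
        for rule in self.rules:
            if rule["trigger"] == event and not rule["check"](ctx):
                rule["enforce"](ctx)  # intervene: may raise, log, or rewrite ctx

    def on_action(self, ctx: dict) -> None:  # before an action is executed
        self._apply("agent_action", ctx)

    def on_step(self, ctx: dict) -> None:    # after an action produces an observation
        self._apply("agent_step", ctx)

    def on_finish(self, ctx: dict) -> None:  # when the agent completes its task
        self._apply("agent_finish", ctx)

# A simplified agent loop showing where the hooks sit.
def run_agent(plan: list[Callable[[], Any]], layer: EnforcementLayer) -> None:
    for step in plan:
        layer.on_action({"action": step.__name__})
        observation = step()
        layer.on_step({"observation": observation})
    layer.on_finish({"status": "done"})
```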

More reliable agents

Approaches like AgentSpec underscore the need for reliable agents in enterprise use. As organizations begin to plan their agentic strategies, technical decision-makers are also looking at ways to ensure reliability.

For many, agents will eventually do tasks for users autonomously and proactively. The idea of ambient agents, where AI agents and apps continuously run in the background and trigger themselves to execute actions, would require agents that do not stray from their path and accidentally introduce unsafe actions.

If ambient agents are where agentic AI will go in the future, expect more methods like AgentSpec to proliferate as companies seek to make AI agents continuously reliable. 
