Advanced AI News
VentureBeat AI

Has this stealth startup finally cracked the code on enterprise AI agent reliability? Meet AUI's Apollo-1

By Advanced AI Editor | October 7, 2025



For more than a decade, conversational AI has promised human-like assistants that can do more than chat. Yet even as large language models (LLMs) like ChatGPT, Gemini, and Claude learn to reason, explain, and code, one critical category of interaction remains largely unsolved — reliably completing tasks for people outside of chat.

Even the best AI models score only around 30% on Terminal-Bench Hard, a third-party benchmark that evaluates AI agents on a variety of terminal-based tasks, far below the reliability most enterprises and users demand. Task-specific benchmarks fare little better: on τ-Bench Airline, which measures how reliably AI agents find and book flights on behalf of a user, the top-performing model (Claude 3.7 Sonnet) passes only 56% of the time, meaning the agent fails nearly half the time.

New York City-based Augmented Intelligence (AUI) Inc., co-founded by Ohad Elhelo and Ori Cohen, believes it has finally come up with a solution: boosting AI agent reliability to the point where enterprises can trust that agents will do as instructed.

The company’s new foundation model, Apollo-1, currently in preview with early testers but nearing general release, is built on a principle it calls stateful neuro-symbolic reasoning.

It's a hybrid architecture, championed even by LLM skeptics like Gary Marcus, designed to guarantee consistent, policy-compliant outcomes in every customer interaction.

“Conversational AI is essentially two halves,” said Elhelo in a recent interview with VentureBeat. “The first half — open-ended dialogue — is handled beautifully by LLMs. They’re designed for creative or exploratory use cases. The other half is task-oriented dialogue, where there’s always a specific goal behind the conversation. That half has remained unsolved because it requires certainty.”

AUI defines certainty as the difference between an agent that “probably” performs a task and one that “almost always” does.

On τ-Bench Airline, for example, Apollo-1 posts a 92.5% pass rate, far ahead of every current competitor, according to benchmarks shared with VentureBeat and posted on AUI's website.

Elhelo offered simple examples: a bank that must enforce ID verification for refunds over $200, or an airline that must always offer a business-class upgrade before economy.

“Those aren’t preferences,” he said. “They’re requirements. And no purely generative approach can deliver that kind of behavioral certainty.”

AUI's work on improving reliability was previously covered by the subscription news outlet The Information, but it has not received widespread coverage in publicly accessible media until now.

From Pattern Matching to Predictable Action

The team argues that transformer models, by design, can’t meet that bar. Large language models generate plausible text, not guaranteed behavior. “When you tell an LLM to always offer insurance before payment, it might — usually,” Elhelo said. “Configure Apollo-1 with that rule, and it will — every time.”

That distinction, he said, stems from the architecture itself. Transformers predict the next token in a sequence. Apollo-1, by contrast, predicts the next action in a conversation, operating on what AUI calls a typed symbolic state.

Cohen explained the idea in more technical terms. “Neuro-symbolic means we’re merging the two dominant paradigms,” he said. “The symbolic layer gives you structure — it knows what an intent, an entity, and a parameter are — while the neural layer gives you language fluency. The neuro-symbolic reasoner sits between them. It’s a different kind of brain for dialogue.”

Where transformers treat every output as text generation, Apollo-1 runs a closed reasoning loop: an encoder translates natural language into a symbolic state, a state machine maintains that state, a decision engine determines the next action, a planner executes it, and a decoder turns the result back into language. “The process is iterative,” Cohen said. “It loops until the task is done. That’s how you get determinism instead of probability.”

A Foundation Model for Task Execution

Unlike traditional chatbots or bespoke automation systems, Apollo-1 is meant to serve as a foundation model for task-oriented dialogue — a single, domain-agnostic system that can be configured for banking, travel, retail, or insurance through what AUI calls a System Prompt.

“The System Prompt isn’t a configuration file,” Elhelo said. “It’s a behavioral contract. You define exactly how your agent must behave in situations of interest, and Apollo-1 guarantees those behaviors will execute.”

Organizations can use the prompt to encode symbolic slots — intents, parameters, and policies — as well as tool boundaries and state-dependent rules.

A food delivery app, for example, might enforce “if allergy mentioned, always inform the restaurant,” while a telecom provider might define “after three failed payment attempts, suspend service.” In both cases, the behavior executes deterministically, not statistically.
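AUI has not disclosed the System Prompt format, but one way to picture how state-dependent rules like these could execute deterministically rather than statistically is as predicates over the typed state, checked exhaustively on every turn. The rule encoding below is invented for this sketch, not AUI's actual API.

```python
# Hypothetical illustration: policies as (condition, mandated action) pairs
# evaluated against the symbolic state on every turn. Because evaluation is
# exhaustive, a matching rule can never be skipped the way a generative
# model might "forget" an instruction.

RULES = [
    (lambda s: s.get("allergy_mentioned"), "notify_restaurant"),
    (lambda s: s.get("failed_payments", 0) >= 3, "suspend_service"),
]

def mandated_actions(state: dict) -> list[str]:
    """Return every action whose condition holds in the current state."""
    return [action for condition, action in RULES if condition(state)]
```

For instance, `mandated_actions({"allergy_mentioned": True})` always yields `["notify_restaurant"]`; there is no probability involved.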

Eight Years in the Making

AUI’s path to Apollo-1 began in 2017, when the team started encoding millions of real task-oriented conversations handled by a 60,000-person human agent workforce.

That work led to a symbolic language capable of separating procedural knowledge — steps, constraints, and flows — from descriptive knowledge like entities and attributes.

“The insight was that task-oriented dialogue has universal procedural patterns,” said Elhelo. “Food delivery, claims processing, and order management all share similar structures. Once you model that explicitly, you can compute over it deterministically.”
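The separation Elhelo describes, in which a shared procedural skeleton is parameterized by domain-specific descriptive knowledge, can be sketched as follows. The domain schemas and step names are invented examples, not AUI's symbolic language.

```python
# Hypothetical sketch: one procedural pattern (collect -> confirm -> execute)
# reused across domains; only the descriptive schema (entities) changes.

DOMAINS = {
    "food_delivery": {"entities": ["restaurant", "items", "address"]},
    "claims_processing": {"entities": ["policy_id", "incident", "amount"]},
}

def next_step(domain: str, filled: set[str], confirmed: bool) -> str:
    """Deterministic next step: same procedure, different descriptive schema."""
    missing = [e for e in DOMAINS[domain]["entities"] if e not in filled]
    if missing:
        return f"collect:{missing[0]}"
    if not confirmed:
        return "confirm"
    return "execute"
```

Because the procedural pattern is explicit, adding a new domain means supplying a new schema, not retraining the flow.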

From there, the company built the neuro-symbolic reasoner — a system that uses the symbolic state to decide what happens next rather than guessing through token prediction.

Benchmarks suggest the architecture makes a measurable difference.

In AUI’s own evaluations, Apollo-1 achieved over 90 percent task completion on the τ-Bench Airline benchmark, compared with 60 percent for Claude 4.

It completed 83 percent of live booking chats on Google Flights versus 22 percent for Gemini 2.5 Flash, and 91 percent of retail scenarios on Amazon versus 17 percent for Rufus.

“These aren’t incremental improvements,” said Cohen. “They’re order-of-magnitude reliability differences.”

A Complement, Not a Competitor

AUI isn’t pitching Apollo-1 as a replacement for large language models, but as their necessary counterpart. In Elhelo’s words: “Transformers optimize for creative probability. Apollo-1 optimizes for behavioral certainty. Together, they form the complete spectrum of conversational AI.”

The model is already running in limited pilots with undisclosed Fortune 500 companies across sectors including finance, travel, and retail.

AUI has also confirmed a strategic partnership with Google and plans for general availability in November 2025, when it will open APIs, release full documentation, and add voice and image capabilities. Interested customers and partners can sign up via a form on AUI's website to receive more information as it becomes available.

Until then, the company is keeping details under wraps. When asked about what comes next, Elhelo smiled. “Let’s just say we’re preparing an announcement,” he said. “Soon.”

Toward Conversations That Act

For all its technical sophistication, Apollo-1’s pitch is simple: make AI that businesses can trust to act — not just talk. “We’re on a mission to democratize access to AI that works,” Cohen said near the end of the interview.

Whether Apollo-1 becomes the new standard for task-oriented dialogue remains to be seen. But if AUI’s architecture performs as promised, the long-standing divide between chatbots that sound human and agents that reliably do human work may finally start to close.


