Advanced AI News
VentureBeat AI

Salesforce builds ‘flight simulator’ for AI agents as 95% of enterprise pilots fail to reach production

By Advanced AI Editor | August 27, 2025

Salesforce is betting that rigorous testing in simulated business environments will solve one of enterprise artificial intelligence’s biggest problems: agents that work in demonstrations but fail in the messy reality of corporate operations.

The cloud software giant unveiled three major AI research initiatives this week, including CRMArena-Pro, which it calls a “digital twin” of business operations where AI agents can be stress-tested before deployment. The announcement comes as enterprises grapple with widespread AI pilot failures and fresh security concerns following recent breaches that compromised hundreds of Salesforce customer instances.

“Pilots don’t learn to fly in a storm; they train in flight simulators that push them to prepare in the most extreme challenges,” said Silvio Savarese, Salesforce’s chief scientist and head of AI research, during a press conference. “Similarly, AI agents benefit from simulation testing and training, preparing them to handle the unpredictability of daily business scenarios in advance of their deployment.”

The research push reflects growing enterprise frustration with AI implementations. A recent MIT report found that 95% of generative AI pilots at companies fail to reach production, while Salesforce’s own studies show that large language models on their own achieve success rates of only 35% in complex business scenarios.

Digital twins for enterprise AI: how Salesforce simulates real business chaos

CRMArena-Pro represents Salesforce’s attempt to bridge the gap between AI promise and performance. Unlike existing benchmarks that test generic capabilities, the platform evaluates agents on real enterprise tasks like customer service escalations, sales forecasting, and supply chain disruptions using synthetic but realistic business data.

“If synthetic data is not generated carefully, it can lead to misleading or over-optimistic results about how well your agent actually perform[s] in your real environment,” explained Jason Wu, a research manager at Salesforce who led the CRMArena-Pro development.

The platform operates within actual Salesforce production environments rather than toy setups, using data validated by domain experts with relevant business experience. It supports both business-to-business and business-to-consumer scenarios and can simulate multi-turn conversations that capture real conversational dynamics.
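To make the idea of stress-testing an agent on multi-turn scenarios concrete, here is a minimal sketch of an evaluation loop. It is not CRMArena-Pro and does not use any Salesforce API: the `Scenario` class, the `keyword_agent` function, and the four sample scenarios are all hypothetical, and a real harness would score far richer outcomes than a single string.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Scenario:
    """Hypothetical multi-turn test case: customer messages plus the expected final action."""
    name: str
    turns: List[str]
    expected_outcome: str


# An agent is modeled as a function from the conversation so far to a reply.
Agent = Callable[[List[str]], str]


def run_scenario(agent: Agent, scenario: Scenario) -> bool:
    """Feed each customer turn to the agent and check its final reply."""
    history: List[str] = []
    reply = ""
    for turn in scenario.turns:
        history.append(turn)
        reply = agent(history)
        history.append(reply)
    return reply == scenario.expected_outcome


def success_rate(agent: Agent, scenarios: List[Scenario]) -> float:
    """Fraction of scenarios the agent handles correctly (0.0 for an empty suite)."""
    if not scenarios:
        return 0.0
    passed = sum(run_scenario(agent, s) for s in scenarios)
    return passed / len(scenarios)


def keyword_agent(history: List[str]) -> str:
    """Toy agent: escalates only when the latest message mentions a refund."""
    last_message = history[-1].lower()
    return "escalate" if "refund" in last_message else "resolve"


# Hypothetical scenarios, invented for illustration only.
SCENARIOS = [
    Scenario("billing dispute", ["I was charged twice.", "I want a refund now."], "escalate"),
    Scenario("password reset", ["I forgot my password."], "resolve"),
    Scenario("withdrawn request", ["I'd like a refund.", "Actually, never mind, thanks."], "resolve"),
    Scenario("implicit complaint", ["My order arrived broken.", "This is unacceptable."], "escalate"),
]

if __name__ == "__main__":
    print(f"success rate: {success_rate(keyword_agent, SCENARIOS):.0%}")
```

The toy agent passes three of the four scenarios and misses the complaint that never uses the word "refund", which is the kind of gap a scripted demo tends to hide and a broader scenario suite is meant to expose.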

Salesforce has been using itself as “customer zero” to test these innovations internally. “Before we bring anything to the market, we will put innovation into the hands of our own team to test it out,” said Muralidhar Krishnaprasad, Salesforce’s president and CTO, during the press conference.

Five metrics that determine if your AI agent is enterprise-ready

Alongside the simulation environment, Salesforce introduced the Agentic Benchmark for CRM, designed to evaluate AI agents across five critical enterprise metrics: accuracy, cost, speed, trust and safety, and environmental sustainability.

The sustainability metric is particularly notable, helping companies align model size with task complexity to reduce environmental impact while maintaining performance. “By cutting through model overload noise, the benchmark gives businesses a clear, data-driven way to pair the right models with the right agents,” the company stated.

The benchmarking effort addresses a practical challenge facing IT leaders: with new AI models released almost daily, determining which ones are suitable for specific business applications has become increasingly difficult.
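One way to picture pairing "the right models with the right agents" is a weighted comparison across the five metrics. The sketch below is an assumption about how such a comparison could be structured, not Salesforce's benchmark methodology: the model names, the normalized scores, and the weights are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List

# The five metric names come from the article; the 0-to-1 scale is an assumption.
METRICS = ("accuracy", "cost", "speed", "trust_safety", "sustainability")


@dataclass
class ModelScore:
    """Hypothetical per-model scores, each normalized to 0..1 where higher is better."""
    name: str
    scores: Dict[str, float]


def weighted_score(model: ModelScore, weights: Dict[str, float]) -> float:
    """Weighted average of a model's scores across the five metrics."""
    total_weight = sum(weights[m] for m in METRICS)
    return sum(model.scores[m] * weights[m] for m in METRICS) / total_weight


def pick_model(models: List[ModelScore], weights: Dict[str, float]) -> ModelScore:
    """Return the model with the highest weighted score for this task profile."""
    return max(models, key=lambda m: weighted_score(m, weights))


# Invented numbers: a large model that is accurate but costly, and a small efficient one.
MODELS = [
    ModelScore("large-model", {"accuracy": 0.9, "cost": 0.3, "speed": 0.4,
                               "trust_safety": 0.8, "sustainability": 0.3}),
    ModelScore("small-model", {"accuracy": 0.75, "cost": 0.9, "speed": 0.9,
                               "trust_safety": 0.8, "sustainability": 0.9}),
]

# A routine task weights all metrics equally; a complex task weights accuracy heavily.
ROUTINE_TASK = {m: 1.0 for m in METRICS}
COMPLEX_TASK = {"accuracy": 8.0, "cost": 0.5, "speed": 0.5,
                "trust_safety": 0.5, "sustainability": 0.5}

if __name__ == "__main__":
    print("routine task ->", pick_model(MODELS, ROUTINE_TASK).name)
    print("complex task ->", pick_model(MODELS, COMPLEX_TASK).name)
```

With these made-up numbers the smaller model wins the routine task and the larger model wins only when accuracy dominates the weighting, which mirrors the article's point about aligning model size with task complexity.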

Why messy enterprise data could make or break your AI deployment

The third initiative focuses on a fundamental prerequisite for reliable AI: clean, unified data. Salesforce’s Account Matching capability uses fine-tuned language models to automatically identify and consolidate duplicate records across systems, recognizing that “The Example Company, Inc.” and “Example Co.” represent the same entity.

The data consolidation work emerged from a partnership between Salesforce’s research and product teams. “What identity resolution in Data Cloud implies is essentially, if you think about something as simple as even a user, they have many, many, many IDs across many systems within any company,” Krishnaprasad explained.

One major cloud provider customer achieved a 95% match rate using the technology, saving sellers 30 minutes per connection by eliminating the need to manually cross-reference multiple screens to identify accounts.
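For readers who want a feel for the duplicate-record problem, the sketch below matches account names with simple rule-based normalization. This is a deliberately simpler stand-in for the fine-tuned language models the article describes, not Salesforce's Account Matching: the suffix list and the function names are assumptions, and rules like these will miss many cases a learned model could catch.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Assumed, non-exhaustive lists of words to ignore when comparing names.
LEGAL_SUFFIXES = {"inc", "incorporated", "co", "company", "corp",
                  "corporation", "llc", "ltd", "limited"}
LEADING_ARTICLES = {"the"}


def normalize_account_name(name: str) -> str:
    """Lowercase, strip punctuation, and drop leading articles and trailing legal suffixes."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    while tokens and tokens[0] in LEADING_ARTICLES:
        tokens.pop(0)
    # Keep at least one token so a name like "Company" is not erased entirely.
    while len(tokens) > 1 and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def same_account(a: str, b: str) -> bool:
    """True when both names reduce to the same non-empty normalized key."""
    key_a = normalize_account_name(a)
    key_b = normalize_account_name(b)
    return bool(key_a) and key_a == key_b


def group_duplicates(names: List[str]) -> List[List[str]]:
    """Group names sharing a normalized key; return only groups with more than one entry."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        groups[normalize_account_name(name)].append(name)
    return [group for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    records = ["The Example Company, Inc.", "Example Co.", "Example Labs", "Sample LLC"]
    print(group_duplicates(records))
```

Both "The Example Company, Inc." and "Example Co." reduce to the key `example`, so they are grouped together, while "Example Labs" stays separate. Abbreviations, misspellings, and renamed subsidiaries are beyond rules like these, which is presumably part of the motivation for a model-based approach.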

The announcements come amid heightened security concerns following a data theft campaign that affected over 700 Salesforce customer organizations earlier this month. According to Google’s Threat Intelligence Group, hackers exploited OAuth tokens from Salesloft’s Drift chat agent to access Salesforce instances and steal credentials for Amazon Web Services, Snowflake, and other platforms.

The breach highlighted vulnerabilities in third-party integrations that enterprises rely on for AI-powered customer engagement. Salesforce has since removed Salesloft Drift from its AppExchange marketplace pending investigation.

The gap between AI demos and enterprise reality is bigger than you think

The simulation and benchmarking initiatives reflect a broader recognition that enterprise AI deployment requires more than impressive demonstration videos. Real business environments feature legacy software, inconsistent data formats, and complex workflows that can derail even sophisticated AI systems.

Savarese said during the press conference that the main theme of the announcements was consistency: how to move from the “unsatisfactory performance” that results from simply plugging a language model into enterprise use cases to something that achieves much higher performance.

Salesforce’s approach emphasizes the need for AI agents to work reliably across diverse scenarios rather than excelling at narrow tasks. The company’s concept of “Enterprise General Intelligence” (EGI) focuses on building agents that are both capable and consistent in performing complex business tasks.

As enterprises continue to invest in AI technologies, the success of platforms like CRMArena-Pro may determine whether the current wave of AI enthusiasm translates into sustainable business transformation or becomes another example of technology promise exceeding practical delivery.

The research initiatives will be showcased at Salesforce’s Dreamforce conference in October, where the company is expected to announce additional AI developments as it seeks to maintain its leadership position in the increasingly competitive enterprise AI market.
