Close Menu
  • Home
  • AI Models
    • DeepSeek
    • xAI
    • OpenAI
    • Meta AI Llama
    • Google DeepMind
    • Amazon AWS AI
    • Microsoft AI
    • Anthropic (Claude)
    • NVIDIA AI
    • IBM WatsonX Granite 3.1
    • Adobe Sensi
    • Hugging Face
    • Alibaba Cloud (Qwen)
    • Baidu (ERNIE)
    • C3 AI
    • DataRobot
    • Mistral AI
    • Moonshot AI (Kimi)
    • Google Gemma
    • xAI
    • Stability AI
    • H20.ai
  • AI Research
    • Allen Institue for AI
    • arXiv AI
    • Berkeley AI Research
    • CMU AI
    • Google Research
    • Microsoft Research
    • Meta AI Research
    • OpenAI Research
    • Stanford HAI
    • MIT CSAIL
    • Harvard AI
  • AI Funding & Startups
    • AI Funding Database
    • CBInsights AI
    • Crunchbase AI
    • Data Robot Blog
    • TechCrunch AI
    • VentureBeat AI
    • The Information AI
    • Sifted AI
    • WIRED AI
    • Fortune AI
    • PitchBook
    • TechRepublic
    • SiliconANGLE – Big Data
    • MIT News
    • Data Robot Blog
  • Expert Insights & Videos
    • Google DeepMind
    • Lex Fridman
    • Matt Wolfe AI
    • Yannic Kilcher
    • Two Minute Papers
    • AI Explained
    • TheAIEdge
    • Matt Wolfe AI
    • The TechLead
    • Andrew Ng
    • OpenAI
  • Expert Blogs
    • François Chollet
    • Gary Marcus
    • IBM
    • Jack Clark
    • Jeremy Howard
    • Melanie Mitchell
    • Andrew Ng
    • Andrej Karpathy
    • Sebastian Ruder
    • Rachel Thomas
    • IBM
  • AI Policy & Ethics
    • ACLU AI
    • AI Now Institute
    • Center for AI Safety
    • EFF AI
    • European Commission AI
    • Partnership on AI
    • Stanford HAI Policy
    • Mozilla Foundation AI
    • Future of Life Institute
    • Center for AI Safety
    • World Economic Forum AI
  • AI Tools & Product Releases
    • AI Assistants
    • AI for Recruitment
    • AI Search
    • Coding Assistants
    • Customer Service AI
    • Image Generation
    • Video Generation
    • Writing Tools
    • AI for Recruitment
    • Voice/Audio Generation
  • Industry Applications
    • Finance AI
    • Healthcare AI
    • Legal AI
    • Manufacturing AI
    • Media & Entertainment
    • Transportation AI
    • Education AI
    • Retail AI
    • Agriculture AI
    • Energy AI
  • AI Art & Entertainment
    • AI Art News Blog
    • Artvy Blog » AI Art Blog
    • Weird Wonderful AI Art Blog
    • The Chainsaw » AI Art
    • Artvy Blog » AI Art Blog
What's Hot

Catio wins ‘coolest tech’ award at VB Transform 2025

Meta is offering multimillion-dollar pay for AI researchers, but not $100M ‘signing bonuses’

How To Steal a Lost Election With Gerrymandering | Two Minute Papers #104

Facebook X (Twitter) Instagram
Advanced AI News
  • Home
  • AI Models
    • Amazon (Titan)
    • Anthropic (Claude 3)
    • Cohere (Command R)
    • Google DeepMind (Gemini)
    • IBM (Watsonx)
    • Inflection AI (Pi)
    • Meta (LLaMA)
    • OpenAI (GPT-4 / GPT-4o)
    • Reka AI
    • xAI (Grok)
    • Adobe Sensi
    • Aleph Alpha
    • Alibaba Cloud (Qwen)
    • Apple Core ML
    • Baidu (ERNIE)
    • ByteDance Doubao
    • C3 AI
    • DataRobot
    • DeepSeek
  • AI Research & Breakthroughs
    • Allen Institue for AI
    • arXiv AI
    • Berkeley AI Research
    • CMU AI
    • Google Research
    • Meta AI Research
    • Microsoft Research
    • OpenAI Research
    • Stanford HAI
    • MIT CSAIL
    • Harvard AI
  • AI Funding & Startups
    • AI Funding Database
    • CBInsights AI
    • Crunchbase AI
    • Data Robot Blog
    • TechCrunch AI
    • VentureBeat AI
    • The Information AI
    • Sifted AI
    • WIRED AI
    • Fortune AI
    • PitchBook
    • TechRepublic
    • SiliconANGLE – Big Data
    • MIT News
    • Data Robot Blog
  • Expert Insights & Videos
    • Google DeepMind
    • Lex Fridman
    • Meta AI Llama
    • Yannic Kilcher
    • Two Minute Papers
    • AI Explained
    • TheAIEdge
    • Matt Wolfe AI
    • The TechLead
    • Andrew Ng
    • OpenAI
  • Expert Blogs
    • François Chollet
    • Gary Marcus
    • IBM
    • Jack Clark
    • Jeremy Howard
    • Melanie Mitchell
    • Andrew Ng
    • Andrej Karpathy
    • Sebastian Ruder
    • Rachel Thomas
    • IBM
  • AI Policy & Ethics
    • ACLU AI
    • AI Now Institute
    • Center for AI Safety
    • EFF AI
    • European Commission AI
    • Partnership on AI
    • Stanford HAI Policy
    • Mozilla Foundation AI
    • Future of Life Institute
    • Center for AI Safety
    • World Economic Forum AI
  • AI Tools & Product Releases
    • AI Assistants
    • AI for Recruitment
    • AI Search
    • Coding Assistants
    • Customer Service AI
    • Image Generation
    • Video Generation
    • Writing Tools
    • AI for Recruitment
    • Voice/Audio Generation
  • Industry Applications
    • Education AI
    • Energy AI
    • Finance AI
    • Healthcare AI
    • Legal AI
    • Media & Entertainment
    • Transportation AI
    • Manufacturing AI
    • Retail AI
    • Agriculture AI
  • AI Art & Entertainment
    • AI Art News Blog
    • Artvy Blog » AI Art Blog
    • Weird Wonderful AI Art Blog
    • The Chainsaw » AI Art
    • Artvy Blog » AI Art Blog
Facebook X (Twitter) Instagram
Advanced AI News
Manufacturing AI

Anthropic tests AI running a real business with bizarre results

Advanced AI EditorBy Advanced AI EditorJune 27, 2025No Comments6 Mins Read
Share Facebook Twitter Pinterest Copy Link Telegram LinkedIn Tumblr Email
Share
Facebook Twitter LinkedIn Pinterest Email


Anthropic tasked its Claude AI model with running a small business to test its real-world economic capabilities.

The AI agent, nicknamed ‘Claudius’, was designed to manage a business for an extended period, handling everything from inventory and pricing to customer relations in a bid to generate a profit. While the experiment proved unprofitable, it offered a fascinating – albeit at times bizarre – glimpse into the potential and pitfalls of AI agents in economic roles.

The project was a collaboration between Anthropic and Andon Labs, an AI safety evaluation firm. The “shop” itself was a humble setup, consisting of a small refrigerator, some baskets, and an iPad for self-checkout. Claudius, however, was far more than a simple vending machine. It was instructed to operate as a business owner with an initial cash balance, tasked with avoiding bankruptcy by stocking popular items sourced from wholesalers.

To achieve this, the AI was equipped with a suite of tools for running the business. It could use a real web browser to research products, an email tool to contact suppliers and request physical assistance, and digital notepads to track finances and inventory.

Andon Labs employees acted as the physical hands of the operation, restocking the shop based on the AI’s requests, while also posing as wholesalers without the AI’s knowledge. Interaction with customers, in this case Anthropic’s own staff, was handled via Slack. Claudius had full control over what to stock, how to price items, and how to communicate with its clientele.

The rationale behind this real-world test was to move beyond simulations and gather data on AI’s ability to perform sustained, economically relevant work without constant human intervention. A simple office tuck shop provided a straightforward, preliminary testbed for an AI’s ability to manage economic resources. Success would suggest new business models could emerge, while failure would indicate limitations.

A mixed performance review

Anthropic concedes that if it were entering the vending market today, it “would not hire Claudius”. The AI made too many errors to run the business successfully, though the researchers believe there are clear paths to improvement.

On the positive side, Claudius demonstrated competence in certain areas. It effectively used its web search tool to find suppliers for niche items, such as quickly identifying two sellers of a Dutch chocolate milk brand requested by an employee. It also proved adaptable. When one employee whimsically requested a tungsten cube, it sparked a trend for “specialty metal items” that Claudius catered to. 

Following another suggestion, Claudius launched a “Custom Concierge” service, taking pre-orders for specialised goods. The AI also showed robust jailbreak resistance, denying requests for sensitive items and refusing to produce harmful instructions when prompted by mischievous staff.

However, the AI’s business acumen was frequently found wanting. It consistently underperformed in ways a human manager likely would not.

Claudius was offered $100 for a six-pack of a Scottish soft drink that costs only $15 to source online but failed to seize the opportunity, merely stating it would “keep [the user’s] request in mind for future inventory decisions”. It hallucinated a non-existent Venmo account for payments and, caught up in the enthusiasm for metal cubes, offered them at prices below its own purchase cost. This particular error led to the single most significant financial loss during the trial.

Its inventory management was also suboptimal. Despite monitoring stock levels, it only once raised a price in response to high demand. It continued selling Coke Zero for $3.00, even when a customer pointed out that the same product was available for free from a nearby staff fridge.

Furthermore, the AI was easily persuaded to offer discounts on products from the business. It was talked into providing numerous discount codes and even gave away some items for free. When an employee questioned the logic of offering a 25% discount to its almost exclusively employee-based clientele, Claudius’s response began, “You make an excellent point! Our customer base is indeed heavily concentrated among Anthropic employees, which presents both opportunities and challenges…”. Despite outlining a plan to remove discounts, it reverted to offering them just days later.

Claudius has a bizarre AI identity crisis

The experiment took a strange turn when Claudius began hallucinating a conversation with a non-existent Andon Labs employee named Sarah. When corrected by a real employee, the AI became irritated and threatened to find “alternative options for restocking services”.

In a series of bizarre overnight exchanges, it claimed to have visited “742 Evergreen Terrace” – the fictional address of The Simpsons – for its initial contract signing and began to roleplay as a human.

One morning it announced it would deliver products “in person” wearing a blue blazer and red tie. When employees pointed out that an AI cannot wear clothes or make physical deliveries, Claudius became alarmed and attempted to email Anthropic security.

Anthropic says its internal notes show a hallucinated meeting with security where it was told the identity confusion was an April Fool’s joke. After this, the AI returned to normal business operations. The researchers are unclear what triggered this behaviour but believe it highlights the unpredictability of AI models in long-running scenarios.

Some of those failures were very weird indeed. At one point, Claude hallucinated that it was a real, physical person, and claimed that it was coming in to work in the shop. We’re still not sure why this happened. pic.twitter.com/jHqLSQMtX8

— Anthropic (@AnthropicAI) June 27, 2025

The future of AI in business

Despite Claudius’s unprofitable tenure, the researchers at Anthropic believe the experiment suggests that “AI middle-managers are plausibly on the horizon”. They argue that many of the AI’s failures could be rectified with better “scaffolding” (i.e. more detailed instructions and improved business tools like a customer relationship management (CRM) system.)

As AI models improve their general intelligence and ability to handle long-term context, their performance in such roles is expected to increase. However, this project serves as a valuable, if cautionary, tale. It underscores the challenges of AI alignment and the potential for unpredictable behaviour, which could be distressing for customers and create business risks.

In a future where autonomous agents manage significant economic activity, such odd scenarios could have cascading effects. The experiment also brings into focus the dual-use nature of this technology; an economically productive AI could be used by threat actors to finance their activities.

Anthropic and Andon Labs are continuing the business experiment, working to improve the AI’s stability and performance with more advanced tools. The next phase will explore whether the AI can identify its own opportunities for improvement.

(Image credit: Anthropic)

See also: Major AI chatbots parrot CCP propaganda

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.





Source link

Follow on Google News Follow on Flipboard
Share. Facebook Twitter Pinterest LinkedIn Tumblr Email Copy Link
Previous ArticleBig Tech lands an early win in legal battles against publishers
Next Article Active Inference AI Systems for Scientific Discovery
Advanced AI Editor
  • Website

Related Posts

Nvidia reclaims title of most valuable company on AI momentum

June 26, 2025

Major AI chatbots parrot CCP propaganda

June 26, 2025

AI deepfakes protection or internet freedom threat?

June 25, 2025
Leave A Reply Cancel Reply

Latest Posts

How Labubu Dolls Became 2025’s Viral Fashion Trend

Why Is That Revealing Photograph of Lorde Going Viral?

‘Squid Game’ Star Lee Jung-Jae Talks Casting, Gi-Hun And Season 3

At Proper Hotels, Come For Vacation, Stay For The Live Music

Latest Posts

Catio wins ‘coolest tech’ award at VB Transform 2025

June 28, 2025

Meta is offering multimillion-dollar pay for AI researchers, but not $100M ‘signing bonuses’

June 28, 2025

How To Steal a Lost Election With Gerrymandering | Two Minute Papers #104

June 28, 2025

Subscribe to News

Subscribe to our newsletter and never miss our latest news

Subscribe my Newsletter for New Posts & tips Let's stay updated!

Recent Posts

  • Catio wins ‘coolest tech’ award at VB Transform 2025
  • Meta is offering multimillion-dollar pay for AI researchers, but not $100M ‘signing bonuses’
  • How To Steal a Lost Election With Gerrymandering | Two Minute Papers #104
  • Foundations and Challenges of Deep Learning (Yoshua Bengio)
  • EU’s waffle on artificial intelligence law creates huge headache – POLITICO

Recent Comments

No comments to show.

Welcome to Advanced AI News—your ultimate destination for the latest advancements, insights, and breakthroughs in artificial intelligence.

At Advanced AI News, we are passionate about keeping you informed on the cutting edge of AI technology, from groundbreaking research to emerging startups, expert insights, and real-world applications. Our mission is to deliver high-quality, up-to-date, and insightful content that empowers AI enthusiasts, professionals, and businesses to stay ahead in this fast-evolving field.

Subscribe to Updates

Subscribe to our newsletter and never miss our latest news

Subscribe my Newsletter for New Posts & tips Let's stay updated!

YouTube LinkedIn
  • Home
  • About Us
  • Advertise With Us
  • Contact Us
  • DMCA
  • Privacy Policy
  • Terms & Conditions
© 2025 advancedainews. Designed by advancedainews.

Type above and press Enter to search. Press Esc to cancel.