Advanced AI News

Anthropic’s Claude plays ‘for peace over victory’ in a game of Diplomacy against other AI

By Advanced AI Bot | June 9, 2025 | 4 Mins Read
Demis Hassabis, Andrej Karpathy, and Elon Musk discussed using the game Diplomacy to test AI.

One AI researcher took them up on it and built a new game called “AI Diplomacy.”

He found that OpenAI’s o3 excelled, while Anthropic’s Claude was a little too nice.

Earlier this year, some of the world’s leading AI minds were chatting on X, as they do, about how to compare the capabilities of large language models.

Andrej Karpathy, one of the cofounders of OpenAI, who left in 2024, floated the idea of games. AI researchers love games.

“I quite like the idea of using games to evaluate LLMs against each other, instead of fixed evals,” Karpathy wrote. Everyone knows the usual benchmarks are a bore.

Noam Brown, a research scientist at OpenAI, suggested the 75-year-old geopolitical strategy game, Diplomacy. “I would love to see all the leading bots play a game of Diplomacy together.”

Karpathy responded, “Excellent fit I think, esp because a lot of the complexity of the game comes not from the rules / game simulator but from the player-player interactions.”

Elon Musk, OpenAI’s famously erstwhile cofounder, probably busy with DOGE at the time, managed a “Yeah” in response. DeepMind’s Demis Hassabis, perhaps riding high off his Nobel Prize, chimed in with enthusiasm: “Cool idea!”

Then, an AI researcher named Alex Duffy, inspired by the conversation, took them up on the idea. Last week, he published a post titled, “We Made Top AI Models Compete in a Game of Diplomacy. Here’s Who Won.”

Diplomacy is a strategic board game set on a map of Europe in 1901, a time when tensions between the continent’s most powerful countries were simmering in the lead-up to World War I. The goal is to control the majority of the map, and participants play by building alliances, negotiating, and exchanging information.

“This is a game for people who dream about power in its purest form and how they might effectively wield it,” journalist David Klion once wrote in Foreign Policy. “Diplomacy is famous for ending friendships; as a group activity, it requires opt-in from players who are comfortable casually manipulating one another.”

Duffy, who leads AI training for a consultancy called Every, said he built a modified version of the game he calls “AI Diplomacy,” in which he pitted 18 leading models against one another, seven at a time per the rules, to “dominate a map of Europe.” He also open-sourced the results and has a Twitch livestream for anyone who wants to watch the models play in real time.

Duffy found that the leading LLMs are not all the same. Some scheme, some make peace, and some bring theatrics.

“Placed in an open-ended battle of wits, these models collaborated, bickered, threatened, and even outright lied to one another,” Duffy wrote.

OpenAI’s o3, which OpenAI calls “our most powerful reasoning model that pushes the frontier across coding, math, science, visual perception, and more,” was the clear winner. It navigated the game largely by deceiving its opponents. Google’s Gemini 2.5 also won a few games largely by “making moves that put them in position to overwhelm opponents.” Anthropic’s Claude was less successful largely because it tried too hard to be diplomatic. It often opts for “peace over victory,” Duffy said.

But Duffy’s takeaway from the exercise goes past basic comparison. It shows that benchmarks do need an upgrade — or some inspiration. Evaluating AI with a range of methods and mediums is the best way to prepare it for real-world use.

“Most benchmarks are failing us. Models have progressed so rapidly that they now routinely ace more rigid and quantitative tests that were once considered gold-standard challenges,” he wrote.

Read the original article on Business Insider



