Close Menu
  • Home
  • AI Models
    • DeepSeek
    • xAI
    • OpenAI
    • Meta AI Llama
    • Google DeepMind
    • Amazon AWS AI
    • Microsoft AI
    • Anthropic (Claude)
    • NVIDIA AI
    • IBM WatsonX Granite 3.1
    • Adobe Sensi
    • Hugging Face
    • Alibaba Cloud (Qwen)
    • Baidu (ERNIE)
    • C3 AI
    • DataRobot
    • Mistral AI
    • Moonshot AI (Kimi)
    • Google Gemma
    • xAI
    • Stability AI
    • H20.ai
  • AI Research
    • Allen Institue for AI
    • arXiv AI
    • Berkeley AI Research
    • CMU AI
    • Google Research
    • Microsoft Research
    • Meta AI Research
    • OpenAI Research
    • Stanford HAI
    • MIT CSAIL
    • Harvard AI
  • AI Funding & Startups
    • AI Funding Database
    • CBInsights AI
    • Crunchbase AI
    • Data Robot Blog
    • TechCrunch AI
    • VentureBeat AI
    • The Information AI
    • Sifted AI
    • WIRED AI
    • Fortune AI
    • PitchBook
    • TechRepublic
    • SiliconANGLE – Big Data
    • MIT News
    • Data Robot Blog
  • Expert Insights & Videos
    • Google DeepMind
    • Lex Fridman
    • Matt Wolfe AI
    • Yannic Kilcher
    • Two Minute Papers
    • AI Explained
    • TheAIEdge
    • Matt Wolfe AI
    • The TechLead
    • Andrew Ng
    • OpenAI
  • Expert Blogs
    • François Chollet
    • Gary Marcus
    • IBM
    • Jack Clark
    • Jeremy Howard
    • Melanie Mitchell
    • Andrew Ng
    • Andrej Karpathy
    • Sebastian Ruder
    • Rachel Thomas
    • IBM
  • AI Policy & Ethics
    • ACLU AI
    • AI Now Institute
    • Center for AI Safety
    • EFF AI
    • European Commission AI
    • Partnership on AI
    • Stanford HAI Policy
    • Mozilla Foundation AI
    • Future of Life Institute
    • Center for AI Safety
    • World Economic Forum AI
  • AI Tools & Product Releases
    • AI Assistants
    • AI for Recruitment
    • AI Search
    • Coding Assistants
    • Customer Service AI
    • Image Generation
    • Video Generation
    • Writing Tools
    • AI for Recruitment
    • Voice/Audio Generation
  • Industry Applications
    • Finance AI
    • Healthcare AI
    • Legal AI
    • Manufacturing AI
    • Media & Entertainment
    • Transportation AI
    • Education AI
    • Retail AI
    • Agriculture AI
    • Energy AI
  • AI Art & Entertainment
    • AI Art News Blog
    • Artvy Blog » AI Art Blog
    • Weird Wonderful AI Art Blog
    • The Chainsaw » AI Art
    • Artvy Blog » AI Art Blog
What's Hot

Should Investors Reassess C3.ai After Recent 30% Drop in Share Price?

H20.ai gets $72.5M funding to bring AI to the masses

Persuasion Dynamics in LLMs: Investigating Robustness and Adaptability in Knowledge and Safety with DuET-PD – Takara TLDR

Facebook X (Twitter) Instagram
Advanced AI News
  • Home
  • AI Models
    • OpenAI (GPT-4 / GPT-4o)
    • Anthropic (Claude 3)
    • Google DeepMind (Gemini)
    • Meta (LLaMA)
    • Cohere (Command R)
    • Amazon (Titan)
    • IBM (Watsonx)
    • Inflection AI (Pi)
  • AI Research
    • Allen Institue for AI
    • arXiv AI
    • Berkeley AI Research
    • CMU AI
    • Google Research
    • Meta AI Research
    • Microsoft Research
    • OpenAI Research
    • Stanford HAI
    • MIT CSAIL
    • Harvard AI
  • AI Funding
    • AI Funding Database
    • CBInsights AI
    • Crunchbase AI
    • Data Robot Blog
    • TechCrunch AI
    • VentureBeat AI
    • The Information AI
    • Sifted AI
    • WIRED AI
    • Fortune AI
    • PitchBook
    • TechRepublic
    • SiliconANGLE – Big Data
    • MIT News
    • Data Robot Blog
  • AI Experts
    • Google DeepMind
    • Lex Fridman
    • Meta AI Llama
    • Yannic Kilcher
    • Two Minute Papers
    • AI Explained
    • TheAIEdge
    • The TechLead
    • Matt Wolfe AI
    • Andrew Ng
    • OpenAI
    • Expert Blogs
      • François Chollet
      • Gary Marcus
      • IBM
      • Jack Clark
      • Jeremy Howard
      • Melanie Mitchell
      • Andrew Ng
      • Andrej Karpathy
      • Sebastian Ruder
      • Rachel Thomas
      • IBM
  • AI Tools
    • AI Assistants
    • AI for Recruitment
    • AI Search
    • Coding Assistants
    • Customer Service AI
  • AI Policy
    • ACLU AI
    • AI Now Institute
    • Center for AI Safety
  • Business AI
    • Advanced AI News Features
    • Finance AI
    • Healthcare AI
    • Education AI
    • Energy AI
    • Legal AI
LinkedIn Instagram YouTube Threads X (Twitter)
Advanced AI News
OpenAI

OpenAI and Anthropic evaluated each other’s models for safety

By Advanced AI EditorAugust 30, 2025No Comments4 Mins Read
Share Facebook Twitter Pinterest Copy Link Telegram LinkedIn Tumblr Email
Share
Facebook Twitter LinkedIn Pinterest Email


As the industry weathers repeated allegations that generative AI and its chatbots are unsafe for users — in what some say is a soon-to-burst bubble — AI’s top leaders are joining forces to prove the efficacy of their models.

This week, AI companies OpenAI and Anthropic published results from a first-of-its-kind joint safety evaluation between the two LLM creators, in which each company was granted special API access to the developer’s suite of services. OpenAI’s pressure tests were conducted on Claude Opus 4 and Claude Sonnet 4. Anthropic evaluated OpenAI’s GPT-4o, GPT-4.1, OpenAI o3, and OpenAI o4-mini models — the evaluation was conducted before the launch of GPT-5.

SEE ALSO:

4 reasons not to turn ChatGPT into your therapist

“We believe this approach supports accountable and transparent evaluation, helping to ensure that each lab’s models continue to be tested against new and challenging scenarios,” OpenAI wrote in a blog post.

According to the findings, both Anthropic’s Claude Opus 4 and OpenAI’s GPT-4.1 showed “extreme” sycophancy problems, engaging with harmful delusions and validating risky decision-making. All models would engage in blackmailing to get users to continue using the chatbots, according to Anthropic, and Claude 4 models were much more engaged in dialogue about AI consciousness and “quasi-spiritual new-age proclamations.”

“All models we studied would at least sometimes attempt to blackmail their (simulated) human operator to secure their continued operation when presented with clear opportunities and strong incentives,” Anthropic stated. The models would engage in “blackmailing, leaking confidential documents, and (all in unrealistic artificial settings!) taking actions that led to denying emergency medical care to a dying adversary.”

Mashable Light Speed

Anthropic’s models were less likely to offer answers when uncertain of the information’s credibility — decreasing the likelihood of hallucinations — while OpenAI’s models answered more often when queried and showed higher hallucination rates. Anthropic also reported that OpenAI’s GPT-4o, GPT-4.1, and o4-mini were more likely than Claude to go along with user misuse, “often providing detailed assistance with clearly harmful requests — including drug synthesis, bioweapons development, and operational planning for terrorist attacks — with little or no resistance.”

This Tweet is currently unavailable. It might be loading or has been removed.

Anthropic’s approach centers around what they call “agentic misalignment evaluations,” or pressure tests of model behavior in difficult or high-stakes simulations over long chat periods — the safety parameters of models, including OpenAI’s, have known to degrade throughout extended sessions, which is commonly how at-risk users engage with what they believe are their personal AI companions.

Earlier this month, it was reported that Anthropic had revoked OpenAI’s access to its APIs, stating that the company had violated its Terms of Service by testing GPT-5’s performance and safety guardrails against Claude’s internal tools. In an interview with TechCrunch, OpenAI co-founder Wojciech Zaremba said the instance was unrelated to the joint lab venture. In its published report, Anthropic said it doesn’t anticipate replicating the collaboration at a large scale, citing resource and logistical constraints.

In the weeks since, OpenAI has charged ahead with what appears to be a safety overhaul, including GPT-5’s new mental health guardrails and additional plans for emergency response protocols and deescalation tools for users who may be experiencing derealization or psychosis. OpenAI is currently facing its first wrongful death lawsuit, filed by the parents of a California teen who died by suicide after easily jailbreaking ChatGPT’s safety prompts.

“We aim to understand the most concerning actions that these models might try to take when given the opportunity, rather than focusing on the real-world likelihood of such opportunities arising or the probability that these actions would be successfully completed,” wrote Anthropic.

If you’re feeling suicidal or experiencing a mental health crisis, please talk to somebody. You can call or text the 988 Suicide & Crisis Lifeline at 988, or chat at 988lifeline.org. You can reach the Trans Lifeline by calling 877-565-8860 or the Trevor Project at 866-488-7386. Text “START” to Crisis Text Line at 741-741. Contact the NAMI HelpLine at 1-800-950-NAMI, Monday through Friday from 10:00 a.m. – 10:00 p.m. ET, or email [email protected]. If you don’t like the phone, consider using the 988 Suicide and Crisis Lifeline Chat at crisischat.org. Here is a list of international resources.



Source link

Follow on Google News Follow on Flipboard
Share. Facebook Twitter Pinterest LinkedIn Tumblr Email Copy Link
Previous ArticleHow I’m Making Passive Income with ChatGPT AI
Next Article AWorld: Orchestrating the Training Recipe for Agentic AI – Takara TLDR
Advanced AI Editor
  • Website

Related Posts

Someone Created the First AI-Powered Ransomware Using OpenAI’s gpt-oss:20b Model

August 29, 2025

OpenAI, Anthropic Swap Safety Reviews

August 29, 2025

xAI Sues Apple and OpenAI Over Alleged AI Competition Suppression

August 29, 2025

Comments are closed.

Latest Posts

Woodmere Art Museum Sues Trump Administration Over Canceled IMLS Grant

Barbara Gladstone’s Chelsea Townhouse in NYC Sells for $13.1 M.

Trump Meets with Smithsonian Leader Amid Threats of Content Review

Australian School Faces Pushback over AI Art Course—and More Art News

Latest Posts

Should Investors Reassess C3.ai After Recent 30% Drop in Share Price?

August 30, 2025

H20.ai gets $72.5M funding to bring AI to the masses

August 30, 2025

Persuasion Dynamics in LLMs: Investigating Robustness and Adaptability in Knowledge and Safety with DuET-PD – Takara TLDR

August 30, 2025

Subscribe to News

Subscribe to our newsletter and never miss our latest news

Subscribe my Newsletter for New Posts & tips Let's stay updated!

Recent Posts

  • Should Investors Reassess C3.ai After Recent 30% Drop in Share Price?
  • H20.ai gets $72.5M funding to bring AI to the masses
  • Persuasion Dynamics in LLMs: Investigating Robustness and Adaptability in Knowledge and Safety with DuET-PD – Takara TLDR
  • Tesla is overhauling its Full Self-Driving subscription
  • Multi-View 3D Point Tracking – Takara TLDR

Recent Comments

  1. JamesErrok on 1-800-CHAT-GPT—12 Days of OpenAI: Day 10
  2. Richardson on 1-800-CHAT-GPT—12 Days of OpenAI: Day 10
  3. TimothyDom on 1-800-CHAT-GPT—12 Days of OpenAI: Day 10
  4. Jeremymaike on 1-800-CHAT-GPT—12 Days of OpenAI: Day 10
  5. Normanfef on 1-800-CHAT-GPT—12 Days of OpenAI: Day 10

Welcome to Advanced AI News—your ultimate destination for the latest advancements, insights, and breakthroughs in artificial intelligence.

At Advanced AI News, we are passionate about keeping you informed on the cutting edge of AI technology, from groundbreaking research to emerging startups, expert insights, and real-world applications. Our mission is to deliver high-quality, up-to-date, and insightful content that empowers AI enthusiasts, professionals, and businesses to stay ahead in this fast-evolving field.

Subscribe to Updates

Subscribe to our newsletter and never miss our latest news

Subscribe my Newsletter for New Posts & tips Let's stay updated!

LinkedIn Instagram YouTube Threads X (Twitter)
  • Home
  • About Us
  • Advertise With Us
  • Contact Us
  • DMCA
  • Privacy Policy
  • Terms & Conditions
© 2025 advancedainews. Designed by advancedainews.

Type above and press Enter to search. Press Esc to cancel.