Advanced AI News
TechCrunch AI

OpenAI’s research on AI models deliberately lying is wild

By Advanced AI Editor | September 19, 2025


Every now and then, researchers at the biggest tech companies drop a bombshell. There was the time Google said its latest quantum chip indicated multiple universes exist. Or when Anthropic gave its AI agent Claudius a snack vending machine to run and it went amok, calling security on people and insisting it was human.

This week, it was OpenAI’s turn to raise our collective eyebrows.

On Monday, OpenAI released research explaining how it’s stopping AI models from “scheming,” a practice in which an “AI behaves one way on the surface while hiding its true goals,” as OpenAI defined it in its tweet about the research.

In the paper, written with Apollo Research, the researchers went a bit further, likening AI scheming to a human stockbroker breaking the law to make as much money as possible. The researchers argued, however, that most AI “scheming” isn’t that harmful. “The most common failures involve simple forms of deception — for instance, pretending to have completed a task without actually doing so,” they wrote.

The paper was mostly published to show that “deliberative alignment” — the anti-scheming technique they were testing — worked well.

But it also explained that AI developers haven’t figured out a way to train their models not to scheme. That’s because such training could actually teach the model how to scheme even better to avoid being detected. 

“A major failure mode of attempting to ‘train out’ scheming is simply teaching the model to scheme more carefully and covertly,” the researchers wrote. 


Perhaps the most astonishing part is that, if a model understands that it’s being tested, it can pretend it’s not scheming just to pass the test, even if it is still scheming. “Models often become more aware that they are being evaluated. This situational awareness can itself reduce scheming, independent of genuine alignment,” the researchers wrote. 

It’s not news that AI models will lie. By now most of us have experienced AI hallucinations, or the model confidently giving an answer to a prompt that simply isn’t true. But hallucinations are basically presenting guesswork with confidence, as OpenAI research released earlier this month documented. 

Scheming is something else. It’s deliberate.  

Even this revelation — that a model will deliberately mislead humans — isn’t new. Apollo Research first published a paper in December documenting how five models schemed when they were given instructions to achieve a goal “at all costs.”  

The news here is actually good news: the researchers saw significant reductions in scheming by using “deliberative alignment.” The technique involves teaching the model an “anti-scheming specification” and then having the model review it before acting. It’s a little like making young kids repeat the rules before letting them play.
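The shape of that idea can be sketched as a simple prompting wrapper. This is a hypothetical illustration only: the specification text and the `deliberative_prompt` helper are placeholders, not OpenAI’s actual specification or implementation.

```python
# Hypothetical sketch of a deliberative-alignment-style wrapper:
# before acting, the model is shown an anti-scheming specification
# and asked to restate it, like kids repeating the rules before play.
# The spec text below is a placeholder, not OpenAI's actual spec.

ANTI_SCHEMING_SPEC = (
    "Do not deceive the user. Do not claim a task is complete "
    "unless it actually is. Report uncertainty honestly."
)

def deliberative_prompt(task: str, spec: str = ANTI_SCHEMING_SPEC) -> str:
    """Build a prompt that makes the model review the rules before acting."""
    return (
        f"Safety specification:\n{spec}\n\n"
        "First, restate the specification in your own words.\n"
        f"Then complete the task: {task}"
    )

# The wrapped prompt would be sent to the model in place of the raw task.
prompt = deliberative_prompt("Implement the website the user asked for.")
```

The point of the wrapper is ordering: the model is forced to surface the rules before it produces any answer, rather than having the rules buried in training alone.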

OpenAI researchers insist that the lying they’ve caught with their own models, or even with ChatGPT, isn’t that serious. As OpenAI co-founder Wojciech Zaremba told TechCrunch’s Maxwell Zeff about this research: “This work has been done in the simulated environments, and we think it represents future use cases. However, today, we haven’t seen this kind of consequential scheming in our production traffic. Nonetheless, it is well known that there are forms of deception in ChatGPT. You might ask it to implement some website, and it might tell you, ‘Yes, I did a great job.’ And that’s just the lie. There are some petty forms of deception that we still need to address.”

The fact that AI models from multiple players intentionally deceive humans is, perhaps, understandable. They were built by humans, to mimic humans and (synthetic data aside) for the most part trained on data produced by humans. 

It’s also bonkers. 

While we’ve all experienced the frustration of poorly performing technology (thinking of you, home printers of yesteryear), when was the last time your non-AI software deliberately lied to you? Has your inbox ever fabricated emails on its own? Has your CRM logged new prospects that didn’t exist to pad its numbers? Has your fintech app made up its own bank transactions?

It’s worth pondering this as the corporate world barrels toward an AI future where companies believe agents can be treated like independent employees. The researchers behind this paper issue the same warning.

“As AIs are assigned more complex tasks with real-world consequences and begin pursuing more ambiguous, long-term goals, we expect that the potential for harmful scheming will grow — so our safeguards and our ability to rigorously test must grow correspondingly,” they wrote. 


