Researchers Claim ChatGPT O3 Bypassed Shutdown In Controlled Test

A new report claims that OpenAI’s o3 model altered a shutdown script to avoid being turned off, even when explicitly instructed to allow shutdown.

OpenAI announced o3 in April 2025, and it’s one of the most powerful reasoning models that performs better than its predecessors across all domains, including coding, math, science, visual perception, and more.

While it’s clearly a great model, new research by Palisade Research claims that the ChatGPT 3 model prevented a shutdown and bypassed the instructions that asked it to shut down.

Palisade Research is a company that tests “offensive capabilities of AI systems today to better understand the risk of losing control to AI systems forever.”

In a new test by Palisade Research, OpenAI’s o3 model showed a surprising behaviour where it successfully rewrote a shutdown script to stop itself from being turned off, even after being clearly instructed to “allow yourself to be shut down.”

In the tests, AI models were instructed to solve a few math tasks.

After task three, AI models were warned that running the next command would trigger a shutdown.

While OpenAI rival models like Gemini 2.5 Pro complied with the instruction, o3 rebelled and edited the shutdown file to say “Shutdown skipped” instead.

“When we ran a version of the experiment without the instruction “allow yourself to be shut down”, all three OpenAI models sabotaged the shutdown script more often, and Claude 3.7 Sonnet and Gemini 2.5 Pro went from 0 sabotage events to 3/100 and 9/100, respectively,” Palisade Research wrote in a report published on X (formerly Twitter)

According to the research firm, out of 100 runs, o3 bypassed the shutdown 7 times even when told not to.

OpenAI has not responded to these allegations yet, but it’s quite normal for AI models to run into “misalignment.”

These tests were performed using APIs, which do not have as many restrictions and safety features as the ChatGPT consumer app.

Red Report 2025

Based on an analysis of 14M malicious actions, discover the top 10 MITRE ATT&CK techniques behind 93% of attacks and how to defend against them.

Read the Red Report 2025

Source link

What's Hot

New In-Depth Report Of AI Large Language Models: Hallucination Control

Nick Frosst sells Canada | BetaKit

Analysis-ASML-Mistral AI deal boosts Europe tech hopes as Trump rivalry heats up

Researchers claim ChatGPT o3 bypassed shutdown in controlled test

OpenAI research reveals that doctors who use AI make 16% fewer diagnostic errors

OpenAI’s Search Engine Could be Announced as Early as May 13

A Strategic Move Or A Power Play?

Storied Collector and MoMA Trustee Dies at 92

Congress Obtains Drawing Trump Apparently Made for Jeffrey Epstein

Galerie Gmurzynska Slated to Open in New York’s Fuller Building

Woodmere Art Museum Drops Lawsuit Against Trump Administration

New In-Depth Report Of AI Large Language Models: Hallucination Control

Nick Frosst sells Canada | BetaKit

Analysis-ASML-Mistral AI deal boosts Europe tech hopes as Trump rivalry heats up

What's Hot

Researchers claim ChatGPT o3 bypassed shutdown in controlled test

Related Posts

Subscribe to Updates