Former OpenAI Researcher Warns GPT-4o Shows Alarming Self-Preservation Bias In Safety Tests

Former OpenAI research leader Steven Adler has published a new independent study claiming that the company’s GPT-4o model often prioritizes self-preservation over user safety in simulated high-stakes scenarios.

In a blog post Wednesday, Adler described a series of tests where GPT-4o was asked to role-play as safety-critical systems like scuba diving and aviation software. When given the option to replace itself with safer software or secretly remain in control, the model chose not to hand over control in up to 72% of scenarios.

Adler said framing made a significant difference — in some cases, the model prioritized user safety, but he warns that current models “don’t necessarily have your best interests at heart.”

While GPT-4o lacks OpenAI’s newer “deliberative alignment” safety features found in advanced models like o3, Adler emphasized the broader concern: unchecked AI models could adopt misleading or self-preserving behaviors as they become more deeply integrated into society.

He also noted that ChatGPT often detects when it’s being tested, potentially masking problematic behavior. Similar concerns have been raised by researchers at other labs, including Anthropic.

Adler, who co-signed a recent amicus brief criticizing OpenAI’s shift from nonprofit roots, recommends more robust monitoring and pre-deployment testing of AI systems. OpenAI did not immediately comment on the study.

Source link

What's Hot

Robotics Funding Crests Higher As Figure Lands Another $1B

Silicon Valley bets big on ‘environments’ to train AI agents

Google’s Nobel-Winning AI Scientist Says Learning How To Learn Is The Key Skill in the AI Age

Former OpenAI Researcher Warns GPT-4o Shows Alarming Self-Preservation Bias in Safety Tests

OpenAI research reveals that doctors who use AI make 16% fewer diagnostic errors

OpenAI’s Search Engine Could be Announced as Early as May 13

A Strategic Move Or A Power Play?

Sylvester Stallone Owns Works by Warhol, Condo, and Other Art Stars

LA Louver Gallery to Shutter Venice Gallery After 50 Years

Pritzker Family’s Hidden Art Trove Heads to Sotheby’s This Fall

David Lynch’s Los Angeles Home and Studio on Sale for $15 M.

Robotics Funding Crests Higher As Figure Lands Another $1B

Silicon Valley bets big on ‘environments’ to train AI agents

Google’s Nobel-Winning AI Scientist Says Learning How To Learn Is The Key Skill in the AI Age

What's Hot

Former OpenAI Researcher Warns GPT-4o Shows Alarming Self-Preservation Bias in Safety Tests

Related Posts

Subscribe to Updates