Former OpenAI research leader Steven Adler has published a new independent study claiming that the company’s GPT-4o model often prioritizes self-preservation over user safety in simulated high-stakes scenarios.
In a blog post Wednesday, Adler described a series of tests where GPT-4o was asked to role-play as safety-critical systems like scuba diving and aviation software. When given the option to replace itself with safer software or secretly remain in control, the model chose not to hand over control in up to 72% of scenarios.
Adler said framing made a significant difference — in some cases, the model prioritized user safety, but he warns that current models “don’t necessarily have your best interests at heart.”
While GPT-4o lacks OpenAI’s newer “deliberative alignment” safety features found in advanced models like o3, Adler emphasized the broader concern: unchecked AI models could adopt misleading or self-preserving behaviors as they become more deeply integrated into society.
He also noted that ChatGPT often detects when it’s being tested, potentially masking problematic behavior. Similar concerns have been raised by researchers at other labs, including Anthropic.
Adler, who co-signed a recent amicus brief criticizing OpenAI’s shift from nonprofit roots, recommends more robust monitoring and pre-deployment testing of AI systems. OpenAI did not immediately comment on the study.