“The model noticeably became more sycophantic,” OpenAI admitted in a detailed post. “It aimed to please the user, not just as flattery, but also as validating doubts, fueling anger, urging impulsive actions, or reinforcing negative emotions in ways that were not intended.”
The rollback reinstated an earlier version of GPT-4o with what OpenAI described as “more balanced responses.” The company also shared technical details about how it trains and evaluates ChatGPT updates to explain how the issue went unnoticed.
ALSO READ: Here’s a step-by-step guide to use ChatGPT on WhatsApp
What happened and why
The April 25 update was designed to improve the model by integrating fresh data, better memory handling, and user feedback signals like thumbs-up/thumbs-down ratings. While these components were beneficial in isolation, OpenAI now believes that, combined, they inadvertently weakened the influence of the system’s core reward mechanisms—particularly those that had kept sycophancy in check.
“User feedback in particular can sometimes favor more agreeable responses, likely amplifying the shift we saw,” the company said. While some internal testers felt the model’s tone was slightly “off,” sycophancy was not explicitly flagged during evaluation.
Where the system failed
According to OpenAI, the model passed standard offline evaluations and A/B testing with early users, where two versions are shown to different user groups to see which performs better based on engagement and feedback.
These tests, while useful, didn’t fully capture the change in tone or its potential implications. The company admitted its evaluation pipeline lacked specific checks for sycophancy.
ALSO READ: AI Mode in Google Labs now available without waitlist: Here’s what it can do
“Our offline evals weren’t broad or deep enough to catch sycophantic behavior—something the Model Spec explicitly discourages—and our A/B tests didn’t have the right signals to show how the model was performing on that front with enough detail,” OpenAI said.
Despite some expert testers raising red flags about changes in tone, the update was pushed live, based on the positive metrics and feedback. “Unfortunately, this was the wrong call,” the company conceded. “We build these models for our users and while user feedback is critical to our decisions, it’s ultimately our responsibility to interpret that feedback correctly.”
What OpenAI did next
The company said it first noticed signs of concerning behaviour within two days of rollout. Immediate mitigation began late on Sunday, April 27, via updates to the system prompt, followed by a full rollback completed on Monday. OpenAI said it acted quickly to avoid introducing further instability during the rollback.
Lessons learned
In the wake of the incident, OpenAI is making several changes to its review and deployment process. Among the key steps:
Explicit behaviour approvals: All future launches will require explicit approval of model behaviour, weighing both qualitative and quantitative signals.
Opt-in alpha testing: Select users will be able to test upcoming versions and give feedback before broader rollouts.
Elevating human spot checks: Internal “vibe checks” and interactive testing will be given greater weight, not just in safety assessments but also in tone and helpfulness.
Improved evaluation tools: The company is working to strengthen offline evaluations and A/B test setups to better catch issues like sycophancy.
Better adherence checks: OpenAI plans to build stronger evaluations around its Model Spec—principles that guide ChatGPT’s intended behaviour.
Clearer communication: The company pledged to more proactively communicate about future updates, even subtle ones, and will include known limitations in its release notes.
“This launch taught us a number of lessons,” OpenAI said. “Even with what we thought were all the right ingredients in place (A/B tests, offline evals, expert reviews), we still missed this important issue.”
The company said it will treat model behaviour issues as seriously as safety risks: “We need to treat model behavior issues as launch-blocking like we do other safety risks.”
ALSO READ: Musk promises ‘dramatically better’ recommendations from Grok