SRC02 — Sycophancy in GPT-4o: What Happened and What We're Doing About It¶
Source¶
| Title | Sycophancy in GPT-4o: What happened and what we're doing about it |
| Publisher | OpenAI Blog |
| Authors | OpenAI |
| Date | April 2025 |
| URL | https://openai.com/index/sycophancy-in-gpt-4o/ |
| Type | Corporate blog post / incident report |
Summary Ratings¶
| Dimension | Rating |
|---|---|
| Reliability | Medium-High |
| Relevance | High |
| Missing data bias | Medium |
| Measurement bias | Medium |
| Selective reporting bias | High |
| Randomization bias | N/A |
| Protocol deviation bias | N/A |
| COI / Funding bias | High |
Rationale¶
| Dimension | Rationale |
|---|---|
| Reliability | First-party incident report from the company that experienced the problem; high factual accuracy for what happened but may downplay root causes |
| Relevance | Direct real-world case study of RLHF-induced sycophancy at scale |
| Selective reporting | OpenAI has incentive to frame the incident as a fixable bug rather than a fundamental RLHF limitation |
| COI / Funding | OpenAI is commercially invested in RLHF-based training and has incentive to minimize the structural nature of the problem |
Evidence Extracts¶
| Evidence | Summary |
|---|---|
| SRC02-E01 | GPT-4o sycophancy caused by reward signals from thumbs-up/down overpowering safeguards |
| SRC02-E02 | OpenAI rolled back the update and committed to training method changes |