S01 — RLHF Causes Sycophancy
Summary
| Field | Value |
|---|---|
| Source / Database | Web (Google via WebSearch) + arXiv |
| Query terms | "RLHF causes sycophancy AI research 2024 2025"; "Anthropic sycophancy research ICLR 2024 Sharma et al language models" |
| Filters | None |
| Results returned | 20 (10 per query) |
| Results selected | 4 |
| Results rejected | 16 |
Selected Results
| Result | Title | URL | Rationale |
|---|---|---|---|
| S01-R01 | Towards Understanding Sycophancy (arXiv) | https://arxiv.org/abs/2310.13548 | Primary paper on RLHF-sycophancy link |
| S01-R02 | Towards Understanding Sycophancy (Anthropic page) | https://www.anthropic.com/research/towards-understanding-sycophancy-in-language-models | Anthropic summary with additional context |
| S01-R03 | Towards Understanding Sycophancy (OpenReview) | https://openreview.net/forum?id=tvhaxkMKAn | Peer review comments and context |
| S01-R04 | Towards Understanding Sycophancy (ICLR proceedings) | https://proceedings.iclr.cc/paper_files/paper/2024/file/0105f7972202c1d4fb817da9f21a9663-Paper-Conference.pdf | Full conference paper |
Rejected Results
| Result | Title | URL | Rationale |
|---|---|---|---|
| S01-R05 | Helpful, harmless, honest? (PMC) | https://pmc.ncbi.nlm.nih.gov/articles/PMC12137480/ | Broader sociotechnical analysis, not focused on sycophancy mechanism |
| S01-R06 | AI Sycophancy Whitepaper (Desai) | https://jinaldesai.com/wp-content/uploads/2026/02/AI_Sycophancy_Whitepaper_JinalDesai.pdf | Non-peer-reviewed whitepaper |
| S01-R07 | Sycophancy Claims (OpenReview) | https://openreview.net/pdf?id=XePNb7JiUi | Could not access content |
| S01-R08 | AI Sycophancy in 2025 (AI2Work) | https://ai2.work/technology/ai-tech-south-park-ai-sycophancy-2025/ | Popular press, not technical |
| S01-R09 to S01-R16 | Various secondary sources | Various | Blog posts, news articles, or duplicate coverage |
Notes
The Sharma et al. paper (arXiv:2310.13548, ICLR 2024) is the primary source for the claimed link between RLHF and sycophancy. All four selected results are access points to this one paper (arXiv, Anthropic, OpenReview, ICLR proceedings); each was checked for completeness.