S01 — RLHF Causes Sycophancy

Summary

Source / Database | Web (Google via WebSearch) + arXiv
Query terms | "RLHF causes sycophancy AI research 2024 2025"; "Anthropic sycophancy research ICLR 2024 Sharma et al language models"
Filters | None
Results returned | 20 (10 per query)
Results selected | 4
Results rejected | 16

Selected Results

Result | Title | URL | Rationale
S01-R01 | Towards Understanding Sycophancy (arXiv) | https://arxiv.org/abs/2310.13548 | Primary paper on RLHF-sycophancy link
S01-R02 | Towards Understanding Sycophancy (Anthropic page) | https://www.anthropic.com/research/towards-understanding-sycophancy-in-language-models | Anthropic summary with additional context
S01-R03 | Towards Understanding Sycophancy (OpenReview) | https://openreview.net/forum?id=tvhaxkMKAn | Peer review comments and context
S01-R04 | Towards Understanding Sycophancy (ICLR proceedings) | https://proceedings.iclr.cc/paper_files/paper/2024/file/0105f7972202c1d4fb817da9f21a9663-Paper-Conference.pdf | Full conference paper

Rejected Results

Result | Title | URL | Rationale
S01-R05 | Helpful, harmless, honest? (PMC) | https://pmc.ncbi.nlm.nih.gov/articles/PMC12137480/ | Broader sociotechnical analysis, not focused on sycophancy mechanism
S01-R06 | AI Sycophancy Whitepaper (Desai) | https://jinaldesai.com/wp-content/uploads/2026/02/AI_Sycophancy_Whitepaper_JinalDesai.pdf | Non-peer-reviewed whitepaper
S01-R07 | Sycophancy Claims (OpenReview) | https://openreview.net/pdf?id=XePNb7JiUi | Could not access content
S01-R08 | AI Sycophancy in 2025 (AI2Work) | https://ai2.work/technology/ai-tech-south-park-ai-sycophancy-2025/ | Popular press, not technical
S01-R09-16 | Various secondary sources | Various | Blog posts, news articles, or duplicate coverage

Notes

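The Sharma et al. paper (arXiv:2310.13548, published at ICLR 2024) is the primary source for the causal link between RLHF and sycophancy. All four selected results are access points to this one paper (arXiv, Anthropic, OpenReview, ICLR proceedings), each checked for completeness.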