Research	R0040 — RLHF Alternatives
Run	2026-03-29
Query	Q002 — RLHF and Sycophancy
Search	S01

S01 — RLHF Causes Sycophancy

Summary


Source / Database	Web (Google via WebSearch) + arXiv
Query terms	"RLHF causes sycophancy AI research 2024 2025"; "Anthropic sycophancy research ICLR 2024 Sharma et al language models"
Filters	None
Results returned	20 (10 per query)
Results selected	4
Results rejected	16

Selected Results

Result	Title	URL	Rationale
S01-R01	Towards Understanding Sycophancy (arXiv)	https://arxiv.org/abs/2310.13548	Primary paper on RLHF-sycophancy link
S01-R02	Towards Understanding Sycophancy (Anthropic page)	https://www.anthropic.com/research/towards-understanding-sycophancy-in-language-models	Anthropic summary with additional context
S01-R03	Towards Understanding Sycophancy (OpenReview)	https://openreview.net/forum?id=tvhaxkMKAn	Peer review comments and context
S01-R04	Towards Understanding Sycophancy (ICLR proceedings)	https://proceedings.iclr.cc/paper_files/paper/2024/file/0105f7972202c1d4fb817da9f21a9663-Paper-Conference.pdf	Full conference paper

Rejected Results

Result	Title	URL	Rationale
S01-R05	Helpful, harmless, honest? (PMC)	https://pmc.ncbi.nlm.nih.gov/articles/PMC12137480/	Broader sociotechnical analysis, not focused on sycophancy mechanism
S01-R06	AI Sycophancy Whitepaper (Desai)	https://jinaldesai.com/wp-content/uploads/2026/02/AI_Sycophancy_Whitepaper_JinalDesai.pdf	Non-peer-reviewed whitepaper
S01-R07	Sycophancy Claims (OpenReview)	https://openreview.net/pdf?id=XePNb7JiUi	Could not access content
S01-R08	AI Sycophancy in 2025 (AI2Work)	https://ai2.work/technology/ai-tech-south-park-ai-sycophancy-2025/	Popular press, not technical
S01-R09 to S01-R16	Various secondary sources	Various	Blog posts, news articles, or duplicate coverage

Notes

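The Sharma et al. paper, "Towards Understanding Sycophancy in Language Models" (ICLR 2024), is the primary source on the RLHF-sycophancy link. All four selected results are access points to this one paper (arXiv preprint, Anthropic summary page, OpenReview forum, ICLR proceedings); each was checked so that the review comments and the final conference version were both covered.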