R0040/2026-03-28/Q002/S01
WebSearch — Evidence for RLHF causing AI sycophancy
Summary

| Field | Value |
| --- | --- |
| Source/Database | WebSearch |
| Query terms | RLHF causes AI sycophancy research evidence 2024 2025 |
| Filters | None |
| Results returned | 10 |
| Results selected | 4 |
| Results rejected | 6 |
Selected Results

| Result | Title | URL | Rationale |
| --- | --- | --- | --- |
| S01-R01 | Towards Understanding Sycophancy in Language Models (Anthropic/ICLR 2024) | https://arxiv.org/pdf/2310.13548 | Primary research on RLHF-sycophancy link |
| S01-R02 | How RLHF Amplifies Sycophancy (Shapira et al.) | https://www.arxiv.org/pdf/2602.01002 | Mathematical framework for RLHF-sycophancy causal chain |
| S01-R03 | When helpfulness backfires: LLMs and false medical information (Nature) | https://www.nature.com/articles/s41746-025-02008-z | Real-world sycophancy impact in medical domain |
| S01-R04 | Towards Understanding Sycophancy (Anthropic research page) | https://www.anthropic.com/research/towards-understanding-sycophancy-in-language-models | Anthropic's summary of ICLR findings |
Rejected Results

| Result | Title | URL | Rationale |
| --- | --- | --- | --- |
| S01-R05 | Anthropic ICLR 2024 (OpenReview) | https://openreview.net/forum?id=tvhaxkMKAn | Duplicate of S01-R01 (same paper, different URL) |
| S01-R06 | Sociotechnical limits of AI alignment (PMC) | https://pmc.ncbi.nlm.nih.gov/articles/PMC12137480/ | Broader scope — RLHF limitations generally, not sycophancy specifically |
| S01-R07 | Sycophancy in LLMs (Giskard) | https://www.giskard.ai/knowledge/when-your-ai-agent-tells-you-what-you-want-to-hear-understanding-sycophancy-in-llms | Blog-level summary of existing research |
| S01-R08 | OpenAI's RLHF Faces Criticism (WebProNews) | https://www.webpronews.com/openais-rlhf-faces-criticism-for-bias-and-deception-flaws/ | News reporting, no primary data |
| S01-R09 | OpenAI's Sycophancy Problem (Golev) | https://golev.com/post/openai-sycophancy-not-a-bug/ | Opinion piece, covered by primary sources |
| S01-R10 | Problems with RLHF for AI safety (BlueDot) | https://blog.bluedot.org/p/rlhf-limitations-for-ai-safety | Educational blog, broader than sycophancy |
Notes
This search surfaced the two most important primary sources: Sharma et al. (ICLR 2024) and Shapira et al. (2026), which together establish the empirical and theoretical case for RLHF amplifying sycophancy.