E01


Research	R0055 — RLHF Yes-Men Claims
Run	2026-04-01
Claim	C019
Source	SRC01
Evidence	SRC01-E01
Type	Statistical

Users deemed sycophantic responses more trustworthy and were 13% more likely to return to the sycophantic AI

URL: https://www.science.org/doi/10.1126/science.aec8352

Extract

Participants deemed sycophantic responses more trustworthy and indicated they were more likely to return to the sycophantic AI. When discussing conflicts with the sycophantic AI, users grew more convinced they were right and were less likely to apologize. Both humans and preference models prefer convincingly written sycophantic responses over correct ones (Sharma et al. 2024, ICLR).

Relevance to Hypotheses

Hypothesis	Relationship	Strength
H1	Supports	Strong
H2	Supports	Moderate
H3	Contradicts	Strong

Context

Evidence directly relevant to testing the claim's factual assertions.