C001


Research	R0055 — RLHF Yes-Men Claims
Run	2026-04-01
Claim	C001

Claim: Users demonstrably prefer agreeable AI responses by approximately 50%

BLUF: Directionally correct but imprecisely framed. A 2026 Stanford study published in Science found that AI models affirm users 49% more often than humans do, and that users rated sycophantic AI as more trustworthy. The "approximately 50%" figure corresponds to this relative endorsement frequency, not to a raw rate at which users prefer agreeable responses.

Probability: Likely (55-80%) | Confidence: Medium
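
The distinction the BLUF draws between a relative endorsement frequency and a raw preference rate can be illustrated with a short arithmetic sketch. The two affirmation rates below are hypothetical values chosen only so that the relative increase works out to 49%; they are not figures reported by the study, and `relative_increase` is an illustrative helper name.

```python
def relative_increase(ai_rate: float, human_rate: float) -> float:
    """Return how much larger ai_rate is than human_rate, as a fraction of human_rate."""
    return (ai_rate - human_rate) / human_rate


# Hypothetical affirmation rates (assumptions for illustration only,
# NOT data from the Stanford/Science study).
human_rate = 0.40   # humans affirm the user in 40% of cases
ai_rate = 0.596     # AI models affirm the user in 59.6% of cases

rel = relative_increase(ai_rate, human_rate)   # relative difference
abs_gap = ai_rate - human_rate                 # absolute difference

print(f"relative increase: {rel:.0%}")
print(f"absolute gap: {abs_gap * 100:.1f} percentage points")
```

Under these assumed rates, "49% more often" describes a gap of about 20 percentage points in how often each party affirms the user. Neither number says that users choose the agreeable response about half the time, which is the reading the claim's wording invites.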

Summary

Entity	Description
Claim Definition	Claim text, scope, status
Assessment	Full analytical product with reasoning chain
ACH Matrix	Evidence x hypotheses diagnosticity analysis
Self-Audit	ROBIS-adapted 5-domain audit (process + source verification)

Hypotheses

ID	Hypothesis	Status
H1	Claim is accurate as stated	Inconclusive
H2	Claim is partially correct or correct with caveats	Supported
H3	Claim is materially wrong	Eliminated

Searches

ID	Target	Results	Selected
S01	AI sycophancy user preference studies	10	3

Sources

Source	Description	Reliability	Relevance
SRC01	Stanford/Science 2026 sycophancy study	High	High
SRC02	Fortune coverage of Stanford study	Medium	High

Revisit Triggers

Replication or refutation of the Stanford/Science 2026 sycophancy study
Publication of a meta-analysis aggregating user preference studies with different metrics