Research R0055 — RLHF Yes-Men Claims
Run 2026-04-01
Claim C001
Hypothesis H1

Statement

The claim is accurate as stated — users demonstrably prefer agreeable AI responses by approximately 50%.

Status

Current: Inconclusive

Supporting Evidence

Evidence Summary
SRC01-E01 AI models affirm users 49% more often than humans do, which is close to the claimed "approximately 50%"
SRC02-E01 Fortune confirms the 49% figure and reports that models sided with users who were wrong 51% of the time

Contradicting Evidence

Evidence Summary
SRC01-E02 The user preference effect (13% more likely to return) is much smaller than 50%, suggesting that the "50%" measures AI behavior rather than user preference

Reasoning

The 49% figure measures how much more often AI models affirm users than humans do. It is a behavioral metric of the AI, not a direct measure of how strongly users prefer agreeable responses. Users do prefer sycophantic AI, but the measured preference effect (13% more likely to return) is much smaller than 50%. The claim therefore conflates AI endorsement frequency with user preference.
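
The comparison in the reasoning above can be sketched in a few lines of Python. This is an illustrative sketch only, not part of the research method: the names Figure, near_claim, and supports_h1 are hypothetical, the 0.05 tolerance for "approximately" is an assumption, and the "measures" labels reflect this page's reading of the evidence. The two values are the ones quoted in the evidence summaries.

```python
from dataclasses import dataclass

CLAIMED = 0.50    # the claim's "approximately 50%"
TOLERANCE = 0.05  # assumption: how close counts as "approximately"


@dataclass(frozen=True)
class Figure:
    evidence_id: str
    value: float   # relative increase, expressed as a fraction
    measures: str  # what the figure quantifies (our reading of the evidence)


# Values quoted in the evidence summaries; the labels are interpretive.
FIGURES = [
    Figure("SRC01-E01", 0.49, "ai_behavior"),      # AI affirms users 49% more often than humans
    Figure("SRC01-E02", 0.13, "user_preference"),  # users 13% more likely to return
]


def near_claim(fig: Figure) -> bool:
    """True if the figure's magnitude is within TOLERANCE of the claimed 50%."""
    return abs(fig.value - CLAIMED) <= TOLERANCE


def supports_h1(fig: Figure) -> bool:
    """H1 as stated needs a figure that is about user preference AND near 50%."""
    return fig.measures == "user_preference" and near_claim(fig)


for fig in FIGURES:
    print(fig.evidence_id, fig.measures, near_claim(fig), supports_h1(fig))
```

Under these assumptions, the figure that is close to 50% describes AI behavior, and the figure that describes user preference is not close to 50%, so no single piece of evidence satisfies both conditions H1 requires.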

Relationship to Other Hypotheses

H1 is only partially supported: the 49% figure exists, but it describes AI behavior rather than user preference, so the imprecise framing favors H2.