Research R0055 — RLHF Yes-Men Claims
Run 2026-04-01
Claim C001

Claim: Users demonstrably prefer agreeable AI responses by approximately 50%

BLUF: Substantially correct in direction but imprecise in framing. A 2026 Stanford study published in Science found that AI models affirm users 49% more often than humans do, and that users rated sycophantic AI as more trustworthy. The "approximately 50%" figure therefore describes how much more often models endorse users relative to humans; it is not the rate at which users prefer agreeable responses.
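The difference between a relative endorsement frequency and a raw rate can be made concrete with a small calculation. The sketch below is illustrative only: the baseline human affirmation rate is a hypothetical value chosen for the arithmetic, not a figure from the study, and `relative_increase` is a helper name introduced here.

```python
def relative_increase(ai_rate: float, human_rate: float) -> float:
    """Return how much more often the AI affirms, as a fraction of the human rate."""
    if human_rate <= 0:
        raise ValueError("human_rate must be positive")
    return (ai_rate - human_rate) / human_rate


# Hypothetical baseline (NOT from the study): humans affirm in 40% of cases.
human_rate = 0.40
# "49% more often than humans" scales the baseline by 1.49.
ai_rate = human_rate * 1.49

print(f"AI affirmation rate:  {ai_rate:.3f}")
print(f"Relative increase:    {relative_increase(ai_rate, human_rate):.0%}")
print(f"Absolute difference:  {ai_rate - human_rate:.3f}")
```

Under this assumed baseline, a 49% relative increase corresponds to an absolute gap of about 20 percentage points, and neither number says what share of users prefer the agreeable response. That is the gap between the claim's wording and what the study measured.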

Probability: Likely (55-80%) | Confidence: Medium


Summary

Entity | Description
Claim Definition | Claim text, scope, status
Assessment | Full analytical product with reasoning chain
ACH Matrix | Evidence x hypotheses diagnosticity analysis
Self-Audit | ROBIS-adapted 5-domain audit (process + source verification)

Hypotheses

ID | Hypothesis | Status
H1 | Claim is accurate as stated | Inconclusive
H2 | Claim is partially correct or correct with caveats | Supported
H3 | Claim is materially wrong | Eliminated

Searches

ID | Target | Results | Selected
S01 | AI sycophancy user preference studies | 10 | 3

Sources

Source | Description | Reliability | Relevance
SRC01 | Stanford/Science 2026 sycophancy study | High | High
SRC02 | Fortune coverage of Stanford study | Medium | High

Revisit Triggers

  • Replication or refutation of the Stanford/Science 2026 sycophancy study
  • Publication of a meta-analysis aggregating user preference studies with different metrics