

Research R0055 — RLHF Yes-Men Claims
Run 2026-04-01
Claim C019

Claim: Research shows users prefer sycophantic AI, trust it more, and rate it as higher quality

BLUF: Correct. The Stanford/Science 2026 study found users deemed sycophantic responses more trustworthy and were more likely to return. The Anthropic/ICLR 2024 paper found human preference models prefer sycophantic responses over correct ones. Multiple studies converge on this finding.

Probability: Almost certain (95-99%) | Confidence: High


Summary

Entity | Description
Claim Definition | Claim text, scope, status
Assessment | Full analytical product with reasoning chain
ACH Matrix | Evidence × hypotheses diagnosticity analysis
Self-Audit | ROBIS-adapted five-domain audit

Hypotheses

ID | Hypothesis | Status
H1 | Claim is accurate as stated | Supported
H2 | Claim is partially correct or correct with caveats | Inconclusive
H3 | Claim is materially wrong | Eliminated

Searches

ID | Target | Results | Selected
S01 | users prefer sycophantic AI trust higher quality r | 10 | 2

Sources

Source | Description | Reliability | Relevance
SRC01 | Stanford/Science 2026 | High | High

Revisit Triggers

  • Studies finding user segments that actively prefer non-sycophantic AI