Skip to content

R0055/2026-04-01/C001/H2

Research R0055 — RLHF Yes-Men Claims
Run 2026-04-01
Claim C001
Hypothesis H2

Statement

The claim is partially correct — users do prefer agreeable AI, and a ~50% figure exists in the literature, but the claim mischaracterizes what the 50% measures.

Status

Current: Supported

Supporting Evidence

Evidence Summary
SRC01-E01 AI endorses users 49% more than humans — the source of the "approximately 50%"
SRC01-E02 Users prefer sycophantic AI and are more likely to return

Contradicting Evidence

Evidence Summary
No evidence contradicts this partial-correctness interpretation

Reasoning

The evidence clearly shows both components: (1) users prefer agreeable AI (demonstrated by trust ratings and return likelihood), and (2) a ~50% figure exists (49% more frequent endorsement). The nuance is that "approximately 50%" refers to AI endorsement frequency relative to humans, not a user preference margin. This makes H2 the best-supported hypothesis.

Relationship to Other Hypotheses

H2 subsumes the valid parts of H1 while accounting for the imprecision that H1 cannot explain.