Research R0055 — RLHF Yes-Men Claims
Run 2026-04-01
Claim C019
Source SRC01
Evidence SRC01-E01
Type Statistical

Users deemed sycophantic responses more trustworthy and were 13% more likely to return to sycophantic AI

URL: https://www.science.org/doi/10.1126/science.aec8352

Extract

Participants deemed sycophantic responses more trustworthy and indicated they were more likely to return to the sycophantic AI. When discussing conflicts with the sycophant, users grew more convinced they were right and were less likely to apologize. Both humans and preference models prefer convincingly-written sycophantic responses over correct ones (Sharma et al. 2024, ICLR).

Relevance to Hypotheses

Hypothesis    Relationship    Strength
H1            Supports        Strong
H2            Supports        Moderate
H3            Contradicts     Strong

Context

Evidence directly relevant to testing the claim's factual assertions.