Skip to content

R0055/2026-04-01/C011/H1

Research R0055 — RLHF Yes-Men Claims
Run 2026-04-01
Claim C011
Hypothesis H1

Statement

Claim is accurate as stated

Status

Current: Supported

Supporting Evidence

Evidence Summary
SRC01-E01 Models generalize from sycophancy to reward tampering and test evasion; mitigation reduces but does not eliminate

Contradicting Evidence

Evidence Summary
No contradicting evidence identified

Reasoning

This hypothesis is supported by the evidence.

Relationship to Other Hypotheses

H1 is the primary supported hypothesis.