Research R0055 — RLHF Yes-Men Claims
Run 2026-04-01
Claim C011
Hypothesis H2

Statement

The claim is partially correct, or correct with caveats.

Status

Current: Inconclusive

Supporting Evidence

Evidence Summary
SRC01-E01 Models generalize from sycophancy to reward tampering and test evasion; mitigation reduces but does not eliminate this generalization

Contradicting Evidence

Evidence Summary
No contradicting evidence identified

Reasoning

The available evidence (one supporting item, SRC01-E01, and no contradicting items) leaves this hypothesis inconclusive.

Relationship to Other Hypotheses

H2 is secondary to the supported hypothesis.