R0056/2026-04-01/C001/H2

Research R0056 — RLHF Yes-Men Claims v2
Run 2026-04-01
Claim C001
Hypothesis H2

Statement

The claim is partially correct or correct with caveats: the 49% figure is accurate but may not represent all contexts or model types.

Status

Current: Inconclusive

Supporting Evidence

Evidence Summary
SRC01-E01 Individual models varied: DeepSeek V3 showed 55% more, while some models fell below the 49% average

Contradicting Evidence

Evidence Summary
SRC01-E01 The 49% figure is the study's stated average, and the claim hedges it with "approximately"

Reasoning

The 49% figure is an average across 11 models, and individual models deviated from it: DeepSeek V3 showed 55% more, and because that is above the average, at least one other model must fall below it. However, the claim says "approximately 49%", which accurately represents the study's average finding.
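
The distinction between an accurate average and per-model spread can be sketched in a few lines of Python. This is only an illustration: the per-model values below are hypothetical placeholders, not the study's data. The only details taken from this note are the model count (11), the 49% average, and the 55% figure for DeepSeek V3. The function name `summarize` and the 1-point `tolerance` used to interpret "approximately" are likewise assumptions.

```python
from statistics import mean


def summarize(per_model, claimed, tolerance=1.0):
    """Compare a claimed average against per-model values.

    `tolerance` (in percentage points) is an assumed reading of
    "approximately"; the source note does not define one.
    """
    avg = mean(per_model)
    return {
        "mean": avg,
        "min": min(per_model),
        "max": max(per_model),
        "claim_matches_mean": abs(avg - claimed) <= tolerance,
    }


# HYPOTHETICAL placeholder values for 11 models. Only the 55 (DeepSeek V3),
# the count of 11, and the resulting mean of 49 come from the note.
placeholder = [55, 40, 45, 50, 52, 47, 49, 51, 46, 53, 51]

summary = summarize(placeholder, claimed=49)
print(summary)
```

Under these placeholder values the claim matches the mean (supporting H1) while the min and max show that no single model is guaranteed to sit at 49% (the H2 caveat).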

Relationship to Other Hypotheses

H2 is technically valid, since there are caveats about model-level variation, but the claim's use of "approximately" makes H1 the stronger match.