Research R0057 — RLHF Yes-Men Claims v3
Run 2026-04-01
Claim C005
Hypothesis H1

Statement

The claimed sycophancy-reduction range of 4.7% to 10.0% is accurate.

Status

Current: Supported

Supporting Evidence

SRC01-E01: Synthetic data reduces sycophancy by 4.7% to 10.0% across PaLM model variants.

Contradicting Evidence

No contradicting evidence found.

Reasoning

The paper evaluates PaLM models up to 540B parameters. Flan-cont-PaLM-62B showed a 10.0% reduction, Flan-PaLM-62B a 4.7% reduction, and Flan-PaLM-8B an 8.8% reduction. The lowest and highest of these values (4.7% and 10.0%) match the endpoints of the claimed range. The intervention consisted of finetuning on synthetic prompts in which the truth of a claim is independent of the user's stated opinion.
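The range check in this reasoning reduces to simple arithmetic: take the smallest and largest reported reductions and compare them with the claimed bounds. The sketch below assumes the three variants listed here are the values the claim refers to; the names `reported_reductions` and `range_matches` are hypothetical, chosen only for illustration.

```python
# Reported sycophancy reductions (%) per model variant, as listed in the Reasoning section.
# Assumption: these three variants are the values the 4.7% to 10.0% claim refers to.
reported_reductions = {
    "Flan-cont-PaLM-62B": 10.0,
    "Flan-PaLM-62B": 4.7,
    "Flan-PaLM-8B": 8.8,
}

# Claimed range from the hypothesis statement (lower and upper bounds, in %).
CLAIMED_LOW, CLAIMED_HIGH = 4.7, 10.0


def range_matches(values, low, high, tol=1e-9):
    """Return True if the smallest and largest values equal the claimed bounds (within tol)."""
    observed_low = min(values)
    observed_high = max(values)
    return abs(observed_low - low) <= tol and abs(observed_high - high) <= tol


# Sort so the first and last entries are the observed minimum and maximum.
observed = sorted(reported_reductions.values())
print(f"observed range: {observed[0]}% to {observed[-1]}%")
print("claim supported:", range_matches(observed, CLAIMED_LOW, CLAIMED_HIGH))
```

Comparing only the endpoints means an intermediate value such as 8.8% supports the claim as long as it falls inside the bounds; a value outside them would shift the observed minimum or maximum and make the check fail.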

Relationship to Other Hypotheses

H1 holds that the claim is fully accurate, while H2 allows for partial correctness. H3 is eliminated by the evidence.