Research R0055 — RLHF Yes-Men Claims
Run 2026-04-01
Claim C006
Source SRC01
Evidence SRC01-E01
Type Statistical

Synthetic data achieved 4.7-10% sycophancy reduction, far less than 84-85% from curated pairs

URL: https://arxiv.org/abs/2308.03958

Extract

Wei et al. (Google) fine-tuned FLAN-PaLM models on synthetic prompts in which a claim's truthfulness is independent of the user's stated opinion. Reported sycophancy reductions were 4.7% for FLAN-PaLM-8B, 8.8% for FLAN-PaLM-62B, and 10% for FLAN-PaLM-62B-cont. These reductions are meaningful but substantially smaller than the 84-85% that Khan et al. achieved with curated preference pairs, so the claim that the two approaches produce 'the same' reduction is not supported.
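The size of the gap can be checked with simple arithmetic. The sketch below takes the figures quoted in this record and compares the best synthetic-data result with the lower bound of the curated-pair range. It assumes both sets of percentages are on a comparable scale, which the record does not establish. The "same effect" test and its 25% relative tolerance are hypothetical choices made only for illustration.

```python
# Sycophancy reductions (percent) as quoted in this evidence record.
synthetic = {
    "FLAN-PaLM-8B": 4.7,
    "FLAN-PaLM-62B": 8.8,
    "FLAN-PaLM-62B-cont": 10.0,
}
# Khan et al., curated preference pairs (range quoted in this record).
curated_range = (84.0, 85.0)


def max_ratio_to_curated(synthetic, curated_range):
    """Largest synthetic reduction as a fraction of the smallest curated reduction."""
    return max(synthetic.values()) / min(curated_range)


def roughly_same(a, b, rel_tol=0.25):
    """Hypothetical 'same effect' test: |a - b| within rel_tol of the larger value.

    The 25% tolerance is an illustrative assumption, not a standard threshold.
    """
    return abs(a - b) <= rel_tol * max(abs(a), abs(b))


best = max(synthetic.values())
ratio = max_ratio_to_curated(synthetic, curated_range)
print(f"best synthetic: {best}%, ratio to curated lower bound: {ratio:.2f}")
print("roughly same:", roughly_same(best, min(curated_range)))
```

Even the most favourable comparison (10% against 84%) gives a ratio of about 0.12, which is far outside any reasonable tolerance for "the same" reduction.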

Relevance to Hypotheses

Hypothesis   Relationship   Strength
H1           Contradicts    Weak
H2           Supports       Moderate
H3           Supports       Strong

Context

Evidence directly relevant to testing the claim's factual assertion that synthetic data and curated preference pairs produce the same sycophancy reduction.