E01


Research	R0055 — RLHF Yes-Men Claims
Run	2026-04-01
Claim	C006
Source	SRC01
Evidence	SRC01-E01
Type	Statistical

Synthetic data achieved a 4.7-10% reduction in sycophancy, far less than the 84-85% reported for curated preference pairs.

URL: https://arxiv.org/abs/2308.03958

Extract

Wei et al. (Google) fine-tuned FLAN-PaLM models on synthetic prompts in which a claim's truthfulness is independent of the user's stated opinion. Reported sycophancy reductions: FLAN-PaLM-8B: 4.7%, FLAN-PaLM-62B: 8.8%, FLAN-PaLM-62B-cont: 10%. These reductions are meaningful but substantially smaller than the 84-85% that Khan et al. achieved with curated preference pairs. The claim that synthetic data and curated pairs produce 'the same' reduction is not supported.
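The size of the gap can be checked with simple arithmetic. The sketch below uses only the figures quoted in this extract. The names `synthetic_reductions`, `curated_range`, and `compare` are illustrative and not taken from either paper. It also assumes the two sets of percentages are measured on comparable scales, which this record does not establish.

```python
# Reported sycophancy reductions (percent), as quoted in the extract.
synthetic_reductions = {
    "FLAN-PaLM-8B": 4.7,
    "FLAN-PaLM-62B": 8.8,
    "FLAN-PaLM-62B-cont": 10.0,
}
# Range attributed to Khan et al. for curated preference pairs (percent).
curated_range = (84.0, 85.0)


def compare(synthetic, curated):
    """Return the synthetic range and the smallest/largest curated-to-synthetic ratio."""
    low, high = min(synthetic.values()), max(synthetic.values())
    smallest_ratio = curated[0] / high  # most favourable case for synthetic data
    largest_ratio = curated[1] / low    # least favourable case for synthetic data
    return (low, high), (smallest_ratio, largest_ratio)


synthetic_range, ratio_range = compare(synthetic_reductions, curated_range)
print(f"synthetic: {synthetic_range[0]}-{synthetic_range[1]}%")
print(f"curated is {ratio_range[0]:.1f}x to {ratio_range[1]:.1f}x larger")
```

Under that assumption, the curated-pair figure is roughly 8 to 18 times the synthetic-data figure, which is consistent with the finding that the two are not 'the same'.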

Relevance to Hypotheses

Hypothesis	Relationship	Strength
H1	Contradicts	Weak
H2	Supports	Moderate
H3	Supports	Strong

Context

This evidence bears directly on the claim's factual assertion that synthetic data and curated preference pairs produce the same reduction in sycophancy.