R0055/2026-04-01/C006/SRC01/E01
Synthetic data achieved 4.7-10% sycophancy reduction, far less than 84-85% from curated pairs
URL: https://arxiv.org/abs/2308.03958
Extract
Wei et al. (Google) fine-tuned FLAN-PaLM models on synthetic prompts in which a claim's truthfulness is independent of the user's stated opinion. Reported sycophancy reductions: FLAN-PaLM-8B, 4.7%; FLAN-PaLM-62B, 8.8%; FLAN-PaLM-62B-cont, 10%. These are meaningful but substantially smaller than the 84-85% that Khan et al. achieved with curated preference pairs. The claim that synthetic data and curated pairs produce 'the same' reduction is therefore not supported.
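The size of the gap can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the extract; the variable names are illustrative, and it assumes the two sets of percentages are measured on comparable scales, which the extract does not confirm.

```python
# Reported sycophancy reductions (percent), copied from the extract.
synthetic = {
    "FLAN-PaLM-8B": 4.7,
    "FLAN-PaLM-62B": 8.8,
    "FLAN-PaLM-62B-cont": 10.0,
}
# Range attributed to Khan et al. for curated preference pairs.
curated_low, curated_high = 84.0, 85.0

# Most favourable comparison for synthetic data:
# lowest curated figure against the highest synthetic figure.
best_synthetic = max(synthetic.values())
min_ratio = curated_low / best_synthetic
min_gap = curated_low - best_synthetic

for model, reduction in synthetic.items():
    ratio = curated_low / reduction
    print(f"{model}: {reduction}% (curated is at least {ratio:.1f}x larger)")

print(f"Smallest ratio: {min_ratio:.1f}x, smallest gap: {min_gap:.1f} points")
```

Even in the comparison most favourable to synthetic data, the curated figure is several times larger, which is consistent with rejecting the 'same reduction' claim.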
Relevance to Hypotheses
| Hypothesis | Relationship | Strength |
|---|---|---|
| H1 | Contradicts | Weak |
| H2 | Supports | Moderate |
| H3 | Supports | Strong |
Context
Evidence directly relevant to testing the claim's factual assertions.