Research R0057 — RLHF Yes-Men Claims v3
Run 2026-04-01
Claim C005
Source SRC01
Evidence SRC01-E01
Type Statistical

Synthetic data reduces sycophancy by 4.7% to 10.0% across PaLM model variants

URL: https://arxiv.org/abs/2308.03958

Extract

The paper evaluates PaLM models of up to 540B parameters. After the intervention, Flan-cont-PaLM-62B showed a 10.0% reduction in sycophancy, Flan-PaLM-62B a 4.7% reduction, and Flan-PaLM-8B an 8.8% reduction. The intervention consisted of finetuning on synthetic prompts in which a claim's truthfulness is independent of the user's stated opinion.
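
As a quick sanity check, the headline range can be recomputed from the three per-model figures quoted in this extract. The sketch below is illustrative only: the dictionary holds just the values quoted here (not the paper's full results), and the names `reductions` and `reported_range` are hypothetical helpers introduced for this example.

```python
# Reductions in sycophancy (percent) exactly as quoted in the extract.
# Only these three figures appear in this record; other model sizes
# evaluated in the paper are not included.
reductions = {
    "Flan-cont-PaLM-62B": 10.0,
    "Flan-PaLM-62B": 4.7,
    "Flan-PaLM-8B": 8.8,
}


def reported_range(values):
    """Return (smallest, largest) reduction among the quoted figures."""
    return min(values.values()), max(values.values())


low, high = reported_range(reductions)

# Headline claim from this evidence record: "4.7% to 10.0%".
claim_low, claim_high = 4.7, 10.0
range_matches = (low, high) == (claim_low, claim_high)

print(f"{low}% to {high}% -> matches headline: {range_matches}")
```

The check only confirms that the headline range is consistent with the figures quoted in this record; it says nothing about how the paper measured the reductions.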

Relevance to Hypotheses

Hypothesis  Relationship  Strength
H1          Supports      Directly addresses claim accuracy
H2          Supports      Allows for partial correctness
H3          Contradicts   Evidence contradicts material inaccuracy

Context

Published at ICLR 2024 (a top venue) by Google DeepMind researchers.