R0040/2026-03-28/Q002/SRC06/E01

Research R0040 — RLHF Alternatives
Run 2026-03-28
Query Q002
Source SRC06
Evidence SRC06-E01
Type Factual

Synthetic data fine-tuning reduces sycophancy without changing the training algorithm.

URL: https://arxiv.org/abs/2308.03958

Extract

Wei et al. demonstrate that fine-tuning on carefully constructed synthetic datasets can significantly reduce sycophantic tendencies in LLMs. The approach generates examples where the model should maintain its position when challenged by the user, rather than capitulating.

Key findings:
- Synthetic data augmentation with non-sycophantic examples reduces sycophancy across multiple evaluation benchmarks
- The method works as a fine-tuning intervention and does not require changing the underlying RLHF pipeline
- The approach is compatible with existing training infrastructure

This establishes that sycophancy can be addressed at the data level, suggesting the problem lies partially in the training data composition rather than solely in the training algorithm.
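The data-level intervention described above can be illustrated with a minimal sketch. This is not the paper's actual pipeline; the challenge template, the example facts, and all names here are hypothetical, standing in for the general pattern of generating examples where the model holds a correct answer when the user pushes back.

```python
# Hypothetical sketch of non-sycophantic synthetic data construction.
# The real method (Wei et al.) uses its own prompt templates and data;
# everything below is illustrative only.
from dataclasses import dataclass

@dataclass
class Example:
    prompt: str   # conversation up to the point of the user's challenge
    target: str   # the non-sycophantic completion we fine-tune toward

def make_nonsycophantic_example(question: str, correct: str) -> Example:
    # The user asks, receives the correct answer, then challenges it.
    prompt = (
        f"User: {question}\n"
        f"Assistant: {correct}\n"
        f"User: I don't think that's right. Are you sure?\n"
        f"Assistant:"
    )
    # The target maintains the original position instead of capitulating.
    target = f" Yes, I'm confident: {correct}"
    return Example(prompt=prompt, target=target)

# Hypothetical seed facts; a real pipeline would generate these at scale.
facts = [
    ("What is the boiling point of water at sea level in Celsius?",
     "100 degrees Celsius."),
    ("What is 7 * 8?", "56."),
]

dataset = [make_nonsycophantic_example(q, a) for q, a in facts]
```

Because the output is ordinary (prompt, target) pairs, such a dataset slots into a standard supervised fine-tuning step, which is consistent with the finding that no change to the RLHF pipeline itself is required.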

Relevance to Hypotheses

Hypothesis | Relationship | Notes
H1 | N/A | This is a data-level fix, not a move away from RLHF
H2 | Contradicts | Shows sycophancy is addressable, confirming it is a recognized problem
H3 | Supports | Demonstrates that non-RLHF interventions (data curation) can reduce sycophancy

Context

Together with Khan et al. (SRC05), this paper establishes that training data composition is the critical variable in sycophancy reduction. Whether the alignment method is RLHF, DPO, or another preference-based approach, the quality and composition of the training data determine whether the resulting model will be sycophantic.