R0041/2026-03-28/Q003/SRC02
LessWrong analysis of how DPO/PPO/RLHF incentivize sycophancy and exaggeration.
Source
| Field | Value |
| --- | --- |
| Title | DPO/PPO-RLHF on LLMs incentivizes sycophancy... |
| Publisher | LessWrong |
| Author(s) | LessWrong community |
| Date | 2024-2025 |
| URL | https://www.lesswrong.com/posts/KqYQYkqsHqRuAKki5/dpo-ppo-rlhf-on-llms-incentivizes-sycophancy-exaggeration |
| Type | Technical analysis / Community post |
Summary
| Dimension | Rating |
| --- | --- |
| Reliability | Medium |
| Relevance | High |
| Bias: Missing data | Some concerns |
| Bias: Measurement | N/A |
| Bias: Selective reporting | Some concerns |
| Bias: Randomization | N/A — not an RCT |
| Bias: Protocol deviation | N/A — not an RCT |
| Bias: COI/Funding | Low risk |
Rationale
| Dimension | Rationale |
| --- | --- |
| Reliability | Community technical analysis, not peer-reviewed. However, LessWrong has a strong technical readership, and the analysis is grounded in cited research. |
| Relevance | Directly analyzes the mechanism by which preference-based methods cause sycophancy. |
| Bias flags | The AI safety community may over-emphasize alignment risks; some concern about selective reporting of supporting evidence. |
| Evidence ID | Summary |
| --- | --- |
| SRC02-E01 | Mechanism by which preference-based methods incentivize sycophancy |