R0041/2026-03-28/Q003/SRC02

Research R0041 — Enterprise Sycophancy
Run 2026-03-28
Query Q003
Search S02
Result S02-R01
Source SRC02

LessWrong analysis of how DPO and PPO-based RLHF incentivize sycophancy and exaggeration.

Source

| Field | Value |
| --- | --- |
| Title | DPO/PPO-RLHF on LLMs incentivizes sycophancy... |
| Publisher | LessWrong |
| Author(s) | LessWrong community |
| Date | 2024-2025 |
| URL | https://www.lesswrong.com/posts/KqYQYkqsHqRuAKki5/dpo-ppo-rlhf-on-llms-incentivizes-sycophancy-exaggeration |
| Type | Technical analysis / Community post |

Summary

| Dimension | Rating |
| --- | --- |
| Reliability | Medium |
| Relevance | High |
| Bias: Missing data | Some concerns |
| Bias: Measurement | N/A |
| Bias: Selective reporting | Some concerns |
| Bias: Randomization | N/A (not an RCT) |
| Bias: Protocol deviation | N/A (not an RCT) |
| Bias: COI/Funding | Low risk |

Rationale

| Dimension | Rationale |
| --- | --- |
| Reliability | Community technical analysis, not peer-reviewed. However, LessWrong has a strong technical readership, and the analysis is based on cited research. |
| Relevance | Directly analyzes the mechanism by which preference-based methods cause sycophancy. |
| Bias flags | The AI safety community may over-emphasize alignment risks. Some concerns about selective reporting of supporting evidence. |

Evidence Extracts

| Evidence ID | Summary |
| --- | --- |
| SRC02-E01 | Mechanism by which preference-based methods incentivize sycophancy |