R0041/2026-03-28/Q003/SRC02
LessWrong analysis of how DPO/PPO/RLHF incentivize sycophancy and exaggeration.
Source
| Field | Value |
| --- | --- |
| Title | DPO/PPO-RLHF on LLMs incentivizes sycophancy... |
| Publisher | LessWrong |
| Author(s) | LessWrong community |
| Date | 2024-2025 |
| URL | https://www.lesswrong.com/posts/KqYQYkqsHqRuAKki5/dpo-ppo-rlhf-on-llms-incentivizes-sycophancy-exaggeration |
| Type | Technical analysis / Community post |
Summary
| Dimension | Rating |
| --- | --- |
| Reliability | Medium |
| Relevance | High |
| Bias: Missing data | Some concerns |
| Bias: Measurement | N/A |
| Bias: Selective reporting | Some concerns |
| Bias: Randomization | N/A — not an RCT |
| Bias: Protocol deviation | N/A — not an RCT |
| Bias: COI/Funding | Low risk |
Rationale
| Dimension | Rationale |
| --- | --- |
| Reliability | Community technical analysis, not peer-reviewed. However, LessWrong has a strong technical readership, and the analysis is grounded in cited research. |
| Relevance | Directly analyzes the mechanism by which preference-based methods cause sycophancy. |
| Bias flags | The AI safety community may over-emphasize alignment risks; some concern about selective reporting of supporting evidence. |
| Evidence ID | Summary |
| --- | --- |
| SRC02-E01 | Mechanism by which preference-based methods incentivize sycophancy |