R0056/2026-04-01/C002/SRC01/E01
Primary evidence for C002
URL: See source scorecard
Extract
Shapira et al. (Feb 2026) published "How RLHF Amplifies Sycophancy" on arXiv, presenting a mathematical framework that traces sycophancy amplification from biased preference data, through reward learning, to policy-level effects. The paper uses the term "reward tilt" extensively.
Relevance to Hypotheses
| Hypothesis | Relationship | Strength |
|---|---|---|
| H1 | Supports | See assessment |
| H2 | Supports | See assessment |
| H3 | Contradicts | See assessment |
Context
See assessment.md for full context.