SRC05 - Sycophancy Hides Linearly in the Attention Heads
Source
| Field | Value |
|---|---|
| Title | Sycophancy Hides Linearly in the Attention Heads |
| Publisher | arXiv |
| Authors | Rifo Genadi, Munachiso Nwadike, Nurdaulet Mukhituly, Hilal Alquabeh, Tatsuya Hiraoka, Kentaro Inui |
| Date | January 2026 |
| URL | https://arxiv.org/abs/2601.16644 |
| Type | Pre-print |
Summary Ratings
| Dimension | Rating |
|---|---|
| Reliability | Medium |
| Relevance | High |
| Missing data bias | Low |
| Measurement bias | Low |
| Selective reporting bias | Low |
| Randomization bias | N/A |
| Protocol deviation bias | Low |
| COI / Funding bias | Low |
Rationale
| Dimension | Rationale |
|---|---|
| Reliability | Recent pre-print, not yet peer-reviewed, but it builds on established interpretability methods |
| Relevance | Provides a mechanistic account of where sycophancy is represented in model internals |
Evidence Extracts
| Evidence | Summary |
|---|---|
| SRC05-E01 | Sycophancy is linearly separable in attention heads and distinct from truthfulness directions |
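
To make the SRC05-E01 claim concrete, the sketch below shows what "linearly separable" and "distinct from truthfulness directions" can mean in practice. It is a minimal illustration on synthetic data, not the paper's method, data, or results. The activations, the planted directions, the dimensions `n` and `d`, and the helpers `mean_difference_direction` and `probe_accuracy` are all hypothetical, and a difference-of-means probe stands in for whatever probe the authors actually used.

```python
import numpy as np


def mean_difference_direction(acts, labels):
    """Unit vector pointing from the class-0 mean to the class-1 mean."""
    diff = acts[labels == 1].mean(axis=0) - acts[labels == 0].mean(axis=0)
    return diff / np.linalg.norm(diff)


def probe_accuracy(acts, labels, direction):
    """Accuracy of thresholding the projection midway between the class means."""
    proj = acts @ direction
    threshold = 0.5 * (proj[labels == 1].mean() + proj[labels == 0].mean())
    predictions = (proj > threshold).astype(int)
    return float((predictions == labels).mean())


rng = np.random.default_rng(0)
n, d = 4000, 64  # hypothetical sample count and per-head activation size

# Assumption: two planted, orthogonal unit directions in one head's output.
planted_syc = np.zeros(d)
planted_syc[0] = 1.0
planted_truth = np.zeros(d)
planted_truth[1] = 1.0

# Independent binary labels for "sycophantic" and "truthful" examples.
syc_labels = rng.integers(0, 2, size=n)
truth_labels = rng.integers(0, 2, size=n)

# Synthetic activations: Gaussian noise plus a shift along each planted direction.
acts = (
    rng.normal(size=(n, d))
    + 3.0 * np.outer(syc_labels, planted_syc)
    + 3.0 * np.outer(truth_labels, planted_truth)
)

# Recover one linear direction per concept from the labelled activations.
syc_dir = mean_difference_direction(acts, syc_labels)
truth_dir = mean_difference_direction(acts, truth_labels)

# Linear separability: a single direction classifies sycophancy well.
syc_acc = probe_accuracy(acts, syc_labels, syc_dir)

# Distinctness: the two recovered directions are close to orthogonal.
cosine = float(syc_dir @ truth_dir)

print(f"sycophancy probe accuracy: {syc_acc:.3f}")
print(f"cosine(sycophancy, truthfulness): {cosine:.3f}")
```

On real models the activations would come from hooked attention-head outputs rather than a random generator, and accuracy would be measured on held-out examples. The reading of the two numbers stays the same: high probe accuracy indicates linear separability, and a cosine near zero indicates that the sycophancy direction is not simply the truthfulness direction.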