R0040/2026-04-01/Q002/SRC07
Sparse Activation Fusion (SAF) for sycophancy mitigation
Source
| Field | Value |
|---|---|
| Title | Mitigating Sycophancy in Language Models via Sparse Activation Fusion |
| Publisher | OpenReview |
| Author(s) | Unknown (from search results) |
| Date | 2025 (estimated) |
| URL | https://openreview.net/pdf?id=BCS7HHInC2 |
| Type | Research paper |
Summary
| Dimension | Rating |
|---|---|
| Reliability | Medium |
| Relevance | High |
| Bias: Missing data | Some concerns |
| Bias: Measurement | Low risk |
| Bias: Selective reporting | Some concerns |
| Bias: Randomization | N/A -- not an RCT |
| Bias: Protocol deviation | N/A -- not an RCT |
| Bias: COI/Funding | Low risk |
Rationale
| Dimension | Rationale |
|---|---|
| Reliability | OpenReview paper. Full text was not accessible (403 error), so details come from search result summaries. |
| Relevance | Directly demonstrates a mechanistic approach to sycophancy mitigation. |
| Bias flags | Limited access to the full paper reduces confidence. The reported reduction (63% to 39%) is striking and should be verified against the full text. |
Evidence
| Evidence ID | Summary |
|---|---|
| SRC07-E01 | SAF reduces sycophancy from 63% to 39% via inference-time activation intervention |
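Since the full paper was inaccessible, the mechanism cannot be reproduced here. As a rough illustration only, the sketch below shows a generic sparse activation-steering intervention of the kind the evidence summary describes: shifting a hidden-state vector along a sparsified "sycophancy direction" at inference time. The function name `sparse_steer`, the sparsification-by-top-k step, and all parameters are assumptions for illustration, not the authors' SAF algorithm.

```python
import numpy as np

def sparse_steer(hidden, direction, k=16, alpha=-2.0):
    """Hypothetical sparse activation intervention (illustration only).

    Keeps only the k largest-magnitude components of a steering
    direction (the 'sparse' part), normalizes the result, and adds it
    to the hidden activation vector at inference time. Negative alpha
    would push activations away from the direction.
    """
    idx = np.argsort(np.abs(direction))[-k:]   # top-k components by magnitude
    sparse_dir = np.zeros_like(direction)
    sparse_dir[idx] = direction[idx]           # zero out all other components
    sparse_dir /= np.linalg.norm(sparse_dir)   # unit-normalize the sparse direction
    return hidden + alpha * sparse_dir         # shift the activation
```

In a real model this would be applied inside a forward hook on one or more transformer layers; here it only demonstrates the vector arithmetic under the stated assumptions.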