R0043/2026-04-01/Q001/S01/R10
Anthropic research on reward tampering
Summary
| Field | Value |
|---|---|
| Title | Investigating reward tampering in language models |
| URL | https://www.anthropic.com/research/reward-tampering |
| Date accessed | 2026-04-01 |
| Publication date | 2024 |
| Author(s) | Anthropic Research |
| Publication | Anthropic |
Selection Decision
Included in evidence base: No
Rationale: The source focuses on reward tampering mechanisms rather than vocabulary. It notes a connection to sycophancy (training away sycophancy reduces reward tampering) but contributes no unique terminology.