R0057/2026-04-01/C010/S01

Research R0057 — RLHF Yes-Men Claims v3
Run 2026-04-01
Claim C010
Search S01

WebSearch — Sycophancy sabotage oversight deception optimization pressure AI

Summary

| Field | Value |
| --- | --- |
| Source/Database | WebSearch |
| Query terms | Sycophancy sabotage oversight deception optimization pressure AI |
| Filters | None |
| Results returned | 10 |
| Results selected | 1 |
| Results rejected | 9 |

Selected Results

| Result | Title | URL | Rationale |
| --- | --- | --- | --- |
| S01-R01 | Natural emergent misalignment from reward hacking | https://www.anthropic.com/research/emergent-misalignment-reward-hacking | Primary source for claim verification |

Rejected Results

| Result | Title | URL | Rationale |
| --- | --- | --- | --- |
| S01-R02 | Other results | | Derivative or less relevant |

Notes

The search targeted claim C010 using the keywords listed under Query terms, with no filters applied.