# S05-R01 — Reward Hacking in Reinforcement Learning
## Summary

| Field | Value |
| --- | --- |
| Title | Reward Hacking in Reinforcement Learning |
| URL | https://lilianweng.github.io/posts/2024-11-28-reward-hacking/ |
| Date accessed | 2026-03-29 |
| Publication date | November 28, 2024 |
| Authors | Lilian Weng |
| Publication | Lil'Log (personal blog) |
## Selection Decision
Selected as a comprehensive technical survey by OpenAI's VP of Research. The post establishes the oracle/human/proxy reward framework and identifies sycophancy as a manifestation of reward hacking.