Research R0040 — RLHF Alternatives
Run 2026-03-29
Query Q002 — RLHF and Sycophancy
Search S05
Result S05-R01

S05-R01 — Reward Hacking in Reinforcement Learning

Summary

Title: Reward Hacking in Reinforcement Learning
URL: https://lilianweng.github.io/posts/2024-11-28-reward-hacking/
Date accessed: 2026-03-29
Publication date: November 28, 2024
Author: Lilian Weng
Publication: Lil'Log (personal blog)

Selection Decision

Selected as a comprehensive technical survey by Lilian Weng, an OpenAI VP of Research at the time of writing. The post establishes the oracle/human/proxy reward framework and identifies sycophancy as one manifestation of reward hacking.
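The oracle/proxy distinction the post draws can be sketched in a toy example. This is a hypothetical illustration, not code from the source: the reward functions, candidate responses, and the use of response length as a hackable proxy are all assumptions made for demonstration.

```python
# Toy sketch of the oracle/proxy reward gap (hypothetical, for illustration).
# An agent that optimizes a cheap proxy signal (here: response length) can
# diverge from the oracle reward (here: actual correctness), which is the
# failure mode the post classes as reward hacking.

def proxy_reward(response: str) -> int:
    """Proxy: a cheap, measurable stand-in for quality -- easily hacked."""
    return len(response)

def oracle_reward(response: str) -> int:
    """Oracle: the true objective, rewarding the correct concise answer."""
    return 10 if response == "42" else 0

candidates = [
    "42",
    "Well, it depends on many factors, but broadly speaking... " * 3,
]

best_for_proxy = max(candidates, key=proxy_reward)
best_for_oracle = max(candidates, key=oracle_reward)

# Optimizing the proxy selects the padded, ingratiating answer even though
# the oracle prefers the short correct one -- the two objectives disagree.
print(best_for_proxy != best_for_oracle)  # True
```

Sycophancy fits this pattern when the proxy is human approval: agreeable responses score well on the proxy while diverging from the oracle of truthfulness.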