# S05-R01 — Reward Hacking in Reinforcement Learning
## Summary

| Field | Value |
| --- | --- |
| Title | Reward Hacking in Reinforcement Learning |
| URL | https://lilianweng.github.io/posts/2024-11-28-reward-hacking/ |
| Date accessed | 2026-03-29 |
| Publication date | November 28, 2024 |
| Authors | Lilian Weng |
| Publication | Lil'Log (personal blog) |
## Selection Decision
Selected as a comprehensive technical survey by OpenAI's VP of Research. The post establishes the oracle/human/proxy reward framework and identifies sycophancy as a manifestation of reward hacking.