S02-R03 — RLHF without RL
Summary
| Title | RLHF without RL - Direct Preference Optimization |
| URL | https://iclr-blogposts.github.io/2024/blog/rlhf-without-rl/ |
| Date accessed | 2026-03-29 |
| Publication date | 2024 |
| Authors | ICLR 2024 Blogposts |
| Publication | ICLR Blogposts 2024 |
Selection Decision
Selected for academic analysis of how Direct Preference Optimization (DPO) reformulates the RLHF problem without reinforcement learning.