R0040/2026-03-28/Q001/S03/R03
Original ORPO paper on monolithic preference optimization.
Summary
| Field | Value |
|---|---|
| Title | ORPO: Monolithic Preference Optimization without Reference Model |
| URL | https://arxiv.org/abs/2403.07691 |
| Date accessed | 2026-03-28 |
| Publication date | 2024-03-12 |
| Author(s) | Jiwoo Hong, Noah Lee, James Thorne |
| Publication | arXiv |
Selection Decision
Included in evidence base: Yes
Rationale: Primary source for ORPO. Represents the furthest simplification of the RLHF pipeline: it eliminates both the reference model and the separate preference-alignment phase by adding an odds-ratio penalty directly to the supervised fine-tuning loss.
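To make the "no reference model, no separate phase" point concrete, here is a minimal sketch of the ORPO objective as the paper describes it: L = L_SFT + λ·L_OR, with L_OR = -log σ(log(odds(y_w|x) / odds(y_l|x))) and odds(y|x) = P(y|x) / (1 - P(y|x)). It assumes scalar, length-normalized log-likelihoods for a single preference pair rather than model tensors. The helper names and the λ = 0.1 default are illustrative assumptions, not the authors' code.

```python
import math


def log_odds(avg_logp: float) -> float:
    """Log of odds(y|x) = P(y|x) / (1 - P(y|x)).

    avg_logp is assumed to be the length-normalized log-likelihood
    (1/m) * sum_t log P(y_t | x, y_<t), which must be strictly negative.
    """
    if avg_logp >= 0.0:
        raise ValueError("avg_logp must be < 0 so that P(y|x) < 1")
    # log(1 - exp(a)) computed stably as log(-expm1(a))
    return avg_logp - math.log(-math.expm1(avg_logp))


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def orpo_loss(chosen_avg_logp: float, rejected_avg_logp: float, lam: float = 0.1) -> float:
    """ORPO loss for one preference pair (hypothetical helper for illustration).

    Only the policy's own log-likelihoods are needed: no reference model.
    lam = 0.1 is an illustrative weight, not a value taken from the paper.
    """
    # SFT term: negative (length-normalized) log-likelihood of the chosen response
    l_sft = -chosen_avg_logp
    # Log odds ratio between the chosen (y_w) and rejected (y_l) responses
    log_or = log_odds(chosen_avg_logp) - log_odds(rejected_avg_logp)
    # Odds-ratio term: -log sigmoid(log_or) == softplus(-log_or)
    l_or = softplus(-log_or)
    return l_sft + lam * l_or


# Chosen response more likely than rejected: small penalty
print(round(orpo_loss(-0.5, -2.0), 4))
# Preference reversed: larger SFT loss and larger odds-ratio penalty
print(round(orpo_loss(-2.0, -0.5), 4))
```

Note that `orpo_loss` takes only the policy's own log-likelihoods. Unlike DPO, there are no frozen reference-model log-probabilities to pass in, and both terms are minimized together in a single training stage.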