Research R0040 — RLHF Alternatives
Run 2026-03-28
Query Q001
Search S03
Result S03-R03

Original ORPO paper on monolithic preference optimization.

Summary

Title: ORPO: Monolithic Preference Optimization without Reference Model
URL: https://arxiv.org/abs/2403.07691
Date accessed: 2026-03-28
Publication date: 2024-03-12
Author(s): Jiwoo Hong, Noah Lee, James Thorne
Publication: arXiv

Selection Decision

Included in evidence base: Yes

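Rationale: Primary source for ORPO. Represents the furthest simplification of the RLHF pipeline considered here: it eliminates both the reference model and the separate preference-alignment phase by folding an odds-ratio penalty directly into supervised fine-tuning, so alignment happens in a single training stage.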
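A minimal sketch of why ORPO needs no reference model, based on our reading of the paper: the objective is the usual supervised negative log-likelihood on the chosen response plus a weighted penalty, -log sigmoid of the log odds ratio between the chosen and rejected responses, where odds(y|x) = P(y|x) / (1 - P(y|x)). Both terms depend only on the policy's own log-probabilities. Assumptions to flag: the inputs are hypothetical plain lists of per-token log-probs (a real implementation would operate on batched tensors), P(y|x) is taken as the length-normalized sequence likelihood, the SFT term is a mean over tokens, and the default `lam=0.1` is illustrative rather than a recommended setting.

```python
import math
from typing import Sequence


def mean_logprob(token_logprobs: Sequence[float]) -> float:
    """Length-normalized log-likelihood: (1/m) * sum of per-token log-probs."""
    return sum(token_logprobs) / len(token_logprobs)


def log_odds(avg_logprob: float) -> float:
    """log(P / (1 - P)) given log P; assumes log P < 0 so that P < 1."""
    # log(1 - P) computed as log1p(-exp(log P)) for numerical stability.
    return avg_logprob - math.log1p(-math.exp(avg_logprob))


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def orpo_loss(
    chosen_logprobs: Sequence[float],
    rejected_logprobs: Sequence[float],
    lam: float = 0.1,  # illustrative weight (assumption), not a tuned value
) -> float:
    """Sketch of the ORPO objective for one preference pair.

    Only policy log-probs are used; no frozen reference model is involved.
    """
    # SFT term: mean negative log-likelihood of the chosen response.
    l_sft = -mean_logprob(chosen_logprobs)
    # Odds-ratio term: push the chosen response's odds above the rejected one's.
    log_ratio = log_odds(mean_logprob(chosen_logprobs)) - log_odds(
        mean_logprob(rejected_logprobs)
    )
    l_or = -log_sigmoid(log_ratio)
    return l_sft + lam * l_or


if __name__ == "__main__":
    chosen = [-0.2, -0.3, -0.1]
    rejected = [-1.5, -2.0, -1.0]
    print(orpo_loss(chosen, rejected))
```

Note that the function signature takes no second set of log-probs from a frozen model, which is the property the rationale above highlights; contrast this with DPO-style losses, which compare policy and reference log-probs for each response.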