SRC07

Original ORPO paper on monolithic preference optimization.

Source

Field	Value
Title	ORPO: Monolithic Preference Optimization without Reference Model
Publisher	arXiv
Author(s)	Jiwoo Hong, Noah Lee, James Thorne
Date	2024-03-12
URL	https://arxiv.org/abs/2403.07691
Type	Research paper

Dimension	Rationale
Reliability	Preprint from KAIST researchers. Method has received attention but less extensive external validation than DPO or GRPO.
Relevance	Represents the furthest simplification of the RLHF pipeline: it eliminates both the reference model and the separate alignment phase.
Bias flags	Missing data: limited evaluation on diverse tasks. Selective reporting: comparisons focus on benchmarks where ORPO performs well.

Evidence ID	Summary
SRC07-E01	ORPO eliminates both the reference model and the separate alignment phase by adding an odds-ratio preference penalty to the supervised fine-tuning loss.
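
To make SRC07-E01 concrete, the following is a minimal sketch of the per-example ORPO objective as described in the paper: the supervised fine-tuning loss on the chosen response plus a weighted odds-ratio term, with no reference-model input anywhere. The function names (`log_odds`, `orpo_loss`), the scalar inputs, and the default `lam=0.1` are illustrative assumptions, not taken from the authors' code; a real training loop would compute the length-normalized log-likelihoods from model logits over batches.

```python
import math


def log_odds(avg_logp: float) -> float:
    """Return log(p / (1 - p)) for p = exp(avg_logp).

    avg_logp is the length-normalized (per-token average) log-likelihood
    of a response under the policy being trained; it must be < 0.
    """
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(avg_logp_chosen: float, avg_logp_rejected: float, lam: float = 0.1) -> float:
    """Sketch of the ORPO loss for one (chosen, rejected) pair.

    Only policy log-likelihoods are needed: there is no reference model
    and no separate alignment stage. lam is an illustrative weight.
    """
    # Supervised fine-tuning term: negative log-likelihood of the chosen response.
    sft_term = -avg_logp_chosen
    # Log odds ratio between the chosen and rejected responses.
    log_odds_ratio = log_odds(avg_logp_chosen) - log_odds(avg_logp_rejected)
    # Odds-ratio term: -log(sigmoid(z)) written as log(1 + exp(-z)).
    odds_ratio_term = math.log1p(math.exp(-log_odds_ratio))
    return sft_term + lam * odds_ratio_term
```

The odds-ratio term shrinks as the policy assigns higher odds to the chosen response than to the rejected one, so a single loss both fits the chosen data and penalizes the rejected style.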