R0040/2026-03-28/Q001/SRC07
Original ORPO paper on monolithic preference optimization.
Source
| Field | Value |
| --- | --- |
| Title | ORPO: Monolithic Preference Optimization without Reference Model |
| Publisher | arXiv |
| Author(s) | Jiwoo Hong, Noah Lee, James Thorne |
| Date | 2024-03-12 |
| URL | https://arxiv.org/abs/2403.07691 |
| Type | Research paper |
Summary
| Dimension | Rating |
| --- | --- |
| Reliability | Medium-High |
| Relevance | Medium-High |
| Bias: Missing data | Some concerns |
| Bias: Measurement | Low risk |
| Bias: Selective reporting | Some concerns |
| Bias: Randomization | N/A |
| Bias: Protocol deviation | N/A |
| Bias: COI/Funding | Low risk |
Rationale
| Dimension | Rationale |
| --- | --- |
| Reliability | Preprint from KAIST researchers. The method has received attention but less extensive external validation than DPO or GRPO. |
| Relevance | Represents the furthest simplification of the RLHF pipeline: it eliminates both the reference model and the separate alignment phase. |
| Bias flags | Missing data: limited evaluation on diverse tasks. Selective reporting: comparisons focus on benchmarks where ORPO performs well. |
Evidence
| Evidence ID | Summary |
| --- | --- |
| SRC07-E01 | ORPO eliminates both the reference model and the separate alignment phase |
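The monolithic objective behind SRC07-E01 can be sketched as follows. Per the paper, ORPO adds an odds-ratio penalty directly to the supervised fine-tuning loss, so no frozen reference model and no separate alignment phase are needed. This is a minimal illustrative sketch, assuming `chosen_logp` and `rejected_logp` are average per-token log-probabilities of the preferred and dispreferred responses under the single policy being trained; the function name and scalar interface are hypothetical simplifications, not the paper's code.

```python
import math

def orpo_loss(chosen_logp, rejected_logp, lam=0.1):
    """Sketch of the ORPO objective: L_SFT + lambda * L_OR (no reference model).

    chosen_logp / rejected_logp: average per-token log-probabilities of the
    chosen and rejected responses under the policy being trained.
    lam: the weighting hyperparameter (lambda in the paper).
    """
    def log_odds(logp):
        # log(odds) = log p - log(1 - p), computed stably from a log-prob
        return logp - math.log1p(-math.exp(logp))

    # Supervised term: negative log-likelihood of the chosen response
    sft_loss = -chosen_logp
    # Odds-ratio term: -log sigmoid(log odds_chosen - log odds_rejected)
    ratio = log_odds(chosen_logp) - log_odds(rejected_logp)
    or_loss = math.log1p(math.exp(-ratio))  # equals -log sigmoid(ratio)
    return sft_loss + lam * or_loss
```

The loss decreases as the policy assigns higher relative odds to the chosen response, which is how a single training phase both fits the demonstrations and encodes the preference signal.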