R0040/2026-03-28/Q001 — ACH Matrix
Matrix
| Evidence | H1: Multiple viable alternatives | H2: No viable alternatives | H3: Modifications not replacements |
|---|---|---|---|
| SRC01-E01: Three alternatives in production use | ++ | -- | + |
| SRC02-E01: DPO matches/exceeds RLHF at NeurIPS | ++ | -- | + |
| SRC03-E01: CAI deployed for all Claude models since 2022 | ++ | -- | + |
| SRC04-E01: GRPO halves compute, deployed in DeepSeek-R1 | ++ | -- | N/A |
| SRC05-E01: KTO matches DPO with binary signals; HALO framework | + | -- | ++ |
| SRC06-E01: CAI as "enhancement" to RLHF; human feedback as "moat" | + | - | ++ |
| SRC07-E01: ORPO eliminates reference model and alignment phase | ++ | -- | - |
Legend:
- ++ Strongly supports
- + Supports
- -- Strongly contradicts
- - Contradicts
- N/A Not applicable to this hypothesis
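One way to sanity-check the matrix is to tally it mechanically. The sketch below is illustrative only: the numeric weights (`++` = 2, `+` = 1, `-` = -1, `--` = -2, `N/A` = 0) are an assumption, since the legend defines the symbols but assigns them no numbers, and the names `WEIGHTS`, `MATRIX`, and `tally` are hypothetical. It reports a net score per hypothesis and a count of contradicting evidence items, the latter being the figure ACH conventionally relies on when eliminating hypotheses.

```python
# Numeric weights for the legend symbols. These values are an assumption
# made for illustration; the source assigns no numbers to the symbols.
WEIGHTS = {"++": 2, "+": 1, "-": -1, "--": -2, "N/A": 0}

HYPOTHESES = ("H1", "H2", "H3")

# Ratings transcribed from the matrix above, one row per evidence item,
# in the column order H1, H2, H3.
MATRIX = {
    "SRC01-E01": ("++", "--", "+"),
    "SRC02-E01": ("++", "--", "+"),
    "SRC03-E01": ("++", "--", "+"),
    "SRC04-E01": ("++", "--", "N/A"),
    "SRC05-E01": ("+", "--", "++"),
    "SRC06-E01": ("+", "-", "++"),
    "SRC07-E01": ("++", "--", "-"),
}


def tally(matrix):
    """Return (net_score, contradiction_count) dicts keyed by hypothesis."""
    net = {h: 0 for h in HYPOTHESES}
    contradictions = {h: 0 for h in HYPOTHESES}
    for ratings in matrix.values():
        for hypothesis, symbol in zip(HYPOTHESES, ratings):
            weight = WEIGHTS[symbol]
            net[hypothesis] += weight
            if weight < 0:  # "-" or "--" counts as one contradicting item
                contradictions[hypothesis] += 1
    return net, contradictions


net_scores, contradictions = tally(MATRIX)
for h in HYPOTHESES:
    print(f"{h}: net={net_scores[h]:+d}, contradicting items={contradictions[h]}")
```

Under these assumed weights, H1 has no contradicting items, H2 is contradicted by all seven, and H3 by one (SRC07-E01). The net scores depend entirely on the chosen weights and should be read as a rough ordering, not a measurement.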
Diagnosticity Analysis
Most Diagnostic Evidence
| Evidence ID | Why Diagnostic |
|---|---|
| SRC05-E01 | The HALO framework supports both H1 (KTO works) and H3 (all methods belong to one unified family). It is the most diagnostic item because it discriminates between the "revolution" and "evolution" readings. |
| SRC06-E01 | The "enhancement not replacement" characterization and "competitive moat" observation directly inform the H1 vs H3 distinction. |
Least Diagnostic Evidence
| Evidence ID | Why Non-Diagnostic |
|---|---|
| SRC01-E01 | Overview source that supports H1 and contradicts H2 but does not discriminate between H1 and H3. |
| SRC04-E01 | GRPO is clearly an alternative, but its relationship to the H3 "modification" reading is ambiguous. |
Outcome
Hypothesis supported: H1 — Multiple viable alternatives exist and are in active use. The evidence unambiguously shows at least six alternatives deployed across multiple labs.
Hypotheses eliminated: H2 — No evidence supports H2. Every source documents at least one viable alternative.
Hypotheses inconclusive: H3 — Partially supported as a qualifier to H1. The HALO framework and DPO's mathematical derivation from RLHF support reading most alternatives as evolutionary. However, ORPO's structural simplification and GRPO+RLVR's elimination of subjective feedback demonstrate genuinely novel departures.