
Q001 — RLHF Alternatives — ACH Matrix

Matrix

Evidence                                     H1   H2   H3
SRC01-E01  RLHF drives sycophancy            +    -    +
SRC01-E02  Universal sycophancy              +    --   +
SRC02-E01  DPO eliminates reward model       ++   --   +
SRC02-E02  DPO OOD limitations               +    +    ++
SRC03-E01  CAI uses principles               ++   --   +
SRC04-E01  RLAIF matches RLHF                ++   --   +
SRC05-E01  RLHF fundamental limits           ++   --   ++
SRC06-E01  GRPO halves compute               ++   --   +
SRC07-E01  KTO binary signals                ++   --   +
SRC08-E01  Industry shift                    +    --   +

Legend

Symbol   Meaning
++       Strongly consistent
+        Consistent
-        Inconsistent
--       Strongly inconsistent
N/A      Not applicable

Diagnosticity Analysis

Most diagnostic evidence:

  • SRC02-E02 (DPO OOD limitations): the only item rated consistent with H2, and the only one rated higher for H3 (++) than for H1 (+). It therefore discriminates between "full replacement" (the strong form of H1) and "toolkit augmentation" (H3).
  • SRC05-E01 (RLHF fundamental limits): strongly consistent with both H1 and H3 and strongly inconsistent with H2. It separates H2 from the other two hypotheses (though not H1 from H3) and establishes that the search for alternatives is driven by real, documented problems.

Least diagnostic evidence:

  • SRC08-E01 (Industry shift): consistent with H1 and H3 and strongly inconsistent with H2, but as industry commentary rather than primary research it adds little independent discrimination.

Outcome

H1 is the best-supported hypothesis: it is consistent with all 10 evidence items and strongly consistent with 6 of them. H2 is effectively eliminated, being strongly inconsistent with 8 of the 10 items. H3 is also consistent with all 10 items (strongly so with 2) and is best read as a complement to H1, adding nuance about the coexistence of methods rather than a clean replacement.
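The tallies above can be checked mechanically. The Python sketch below transcribes the matrix and counts ratings per hypothesis. It assumes a symmetric numeric weighting (++ = 2, + = 1, - = -1, -- = -2) that the matrix itself does not define, and the names `WEIGHTS`, `MATRIX`, `tally` and `favours` are illustrative only. Following the usual ACH practice of ranking hypotheses by how much evidence is inconsistent with them, it also sums the negative weights per hypothesis.

```python
"""Tally the Q001 ACH matrix (a sketch; the numeric weights are an assumption)."""

# Hypothetical weights for the legend symbols; the matrix itself is ordinal.
# N/A is not handled because the matrix contains no N/A ratings.
WEIGHTS = {"++": 2, "+": 1, "-": -1, "--": -2}
HYPOTHESES = ("H1", "H2", "H3")

# Ratings transcribed from the matrix: (H1, H2, H3) for each evidence item.
MATRIX = {
    "SRC01-E01": ("+", "-", "+"),
    "SRC01-E02": ("+", "--", "+"),
    "SRC02-E01": ("++", "--", "+"),
    "SRC02-E02": ("+", "+", "++"),
    "SRC03-E01": ("++", "--", "+"),
    "SRC04-E01": ("++", "--", "+"),
    "SRC05-E01": ("++", "--", "++"),
    "SRC06-E01": ("++", "--", "+"),
    "SRC07-E01": ("++", "--", "+"),
    "SRC08-E01": ("+", "--", "+"),
}


def tally(matrix):
    """Count ratings per hypothesis and sum the negative weights."""
    stats = {}
    for i, hyp in enumerate(HYPOTHESES):
        ratings = [row[i] for row in matrix.values()]
        stats[hyp] = {
            "consistent": sum(r in ("+", "++") for r in ratings),
            "strongly_consistent": ratings.count("++"),
            "strongly_inconsistent": ratings.count("--"),
            # Only negative weights contribute to the inconsistency score.
            "inconsistency": sum(min(WEIGHTS[r], 0) for r in ratings),
        }
    return stats


def favours(matrix, a, b):
    """Evidence items rated more consistent with hypothesis a than with b."""
    ia, ib = HYPOTHESES.index(a), HYPOTHESES.index(b)
    return [e for e, row in matrix.items() if WEIGHTS[row[ia]] > WEIGHTS[row[ib]]]


if __name__ == "__main__":
    stats = tally(MATRIX)
    for hyp, s in stats.items():
        print(hyp, s)
    # Fewest inconsistencies first (score closest to zero); ties keep H-order.
    ranking = sorted(HYPOTHESES, key=lambda h: stats[h]["inconsistency"], reverse=True)
    print("least to most inconsistent:", ranking)
    print("rated higher for H3 than H1:", favours(MATRIX, "H3", "H1"))
```

Run as a script, it reports H1 consistent with 10 of 10 items, H2 strongly inconsistent with 8 of 10, and SRC02-E02 as the only item rated higher for H3 than for H1. Under this weighting H1 and H3 tie with no inconsistencies at all, so telling them apart depends on the strength of support (six strongly consistent ratings for H1 against two for H3) rather than on inconsistency counts.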