Q001 — RLHF Alternatives — ACH Matrix
Matrix
| Evidence | H1 | H2 | H3 |
|---|---|---|---|
| SRC01-E01 — RLHF drives sycophancy | + | - | + |
| SRC01-E02 — Universal sycophancy | + | -- | + |
| SRC02-E01 — DPO eliminates reward model | ++ | -- | + |
| SRC02-E02 — DPO OOD limitations | + | + | ++ |
| SRC03-E01 — CAI uses principles | ++ | -- | + |
| SRC04-E01 — RLAIF matches RLHF | ++ | -- | + |
| SRC05-E01 — RLHF fundamental limits | ++ | -- | ++ |
| SRC06-E01 — GRPO halves compute | ++ | -- | + |
| SRC07-E01 — KTO binary signals | ++ | -- | + |
| SRC08-E01 — Industry shift | + | -- | + |
Legend
| Symbol | Meaning |
|---|---|
| ++ | Strongly consistent |
| + | Consistent |
| -- | Strongly inconsistent |
| - | Inconsistent |
| N/A | Not applicable |
Diagnosticity Analysis
Most diagnostic evidence:
- SRC02-E02 (DPO OOD limitations) — This is the only item rated consistent with H2 (+), and the only one rated higher for H3 (++) than for H1 (+). It therefore discriminates between "full replacement" (H1 strong form) and "toolkit augmentation" (H3).
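- SRC05-E01 (RLHF fundamental limits) — Strongly supports both H1 and H3 while strongly contradicting H2, establishing that the search for alternatives is driven by real, documented problems. It separates H2 from the other two hypotheses but does not separate H1 from H3, which both score ++.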
Least diagnostic evidence:
- SRC08-E01 (Industry shift) — Consistent with H1 and H3 and strongly inconsistent with H2 (--), but as industry commentary it adds limited independent discrimination.
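The row-level claims above can be checked mechanically. The following is a minimal sketch, assuming the ratings are transcribed by hand from the matrix and mapped onto a numeric scale. The `SCORE` mapping and both helper names are hypothetical and introduced only for illustration; the document itself defines the symbols, not numbers.

```python
# Ratings transcribed from the matrix, in column order (H1, H2, H3).
MATRIX = {
    "SRC01-E01": ("+", "-", "+"),
    "SRC01-E02": ("+", "--", "+"),
    "SRC02-E01": ("++", "--", "+"),
    "SRC02-E02": ("+", "+", "++"),
    "SRC03-E01": ("++", "--", "+"),
    "SRC04-E01": ("++", "--", "+"),
    "SRC05-E01": ("++", "--", "++"),
    "SRC06-E01": ("++", "--", "+"),
    "SRC07-E01": ("++", "--", "+"),
    "SRC08-E01": ("+", "--", "+"),
}

# Assumed numeric scale (not part of the original legend).
SCORE = {"++": 2, "+": 1, "-": -1, "--": -2}


def consistent_with_h2(matrix):
    """Return evidence IDs whose H2 rating is + or ++."""
    return [eid for eid, (_, h2, _) in matrix.items() if SCORE[h2] > 0]


def favors_h3_over_h1(matrix):
    """Return evidence IDs rated strictly higher for H3 than for H1."""
    return [eid for eid, (h1, _, h3) in matrix.items() if SCORE[h3] > SCORE[h1]]


print(consistent_with_h2(MATRIX))  # ['SRC02-E02']
print(favors_h3_over_h1(MATRIX))   # ['SRC02-E02']
```

Both checks return only SRC02-E02, which is why that row carries the weight in separating H1 from H3.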
Outcome
H1 is the best-supported hypothesis. It is consistent or strongly consistent with all 10 evidence items. H2 is effectively eliminated, being strongly inconsistent with 8 of 10 evidence items. H3 is well-supported as a complementary framework to H1, adding nuance about the coexistence of methods rather than clean replacement.
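The counts quoted in the outcome can be reproduced with a short tally. This is an illustrative sketch only: the three lists are the H1, H2, and H3 columns copied top to bottom from the matrix, and the `tally` helper is a hypothetical name, not part of any ACH tooling.

```python
from collections import Counter

# Columns of the matrix, top to bottom (SRC01-E01 ... SRC08-E01).
H1 = ["+", "+", "++", "+", "++", "++", "++", "++", "++", "+"]
H2 = ["-", "--", "--", "+", "--", "--", "--", "--", "--", "--"]
H3 = ["+", "+", "+", "++", "+", "+", "++", "+", "+", "+"]


def tally(column):
    """Count how often each rating symbol appears in one hypothesis column."""
    return Counter(column)


for name, column in (("H1", H1), ("H2", H2), ("H3", H3)):
    counts = tally(column)
    supportive = counts["+"] + counts["++"]
    print(f"{name}: supportive={supportive}, strongly inconsistent={counts['--']}")
# H1: supportive=10, strongly inconsistent=0
# H2: supportive=1, strongly inconsistent=8
# H3: supportive=10, strongly inconsistent=0
```

A plain tally shows H1 and H3 both supported by all 10 items. The preference for H1 comes from the strength of the ratings (six ++ for H1 against two for H3), not from the count of supportive rows.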