R0024/2026-03-25/Q004 — ACH Matrix

Matrix

| Evidence | H1: Metrics and commitments exist | H2: No meaningful metrics | H3: Limited and inconsistent |
|---|---|---|---|
| SRC01-E01: Anthropic 70-85% reduction, Petri tool | ++ | -- | + |
| SRC02-E01: OpenAI post-mortem, opaque methodology | + | - | ++ |
| SRC03-E01: Newer models paradoxically worse; industry complacency | - | + | ++ |
| SRC04-E01: 42 AGs demanding commitments | - | N/A | ++ |

Legend:
- ++: Strongly supports
- +: Supports
- --: Strongly contradicts
- -: Contradicts
- N/A: Not applicable to this hypothesis
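The ratings in the matrix can be tallied mechanically as a cross-check on the outcome. The following is a minimal sketch, not the method used in this analysis: the numeric weights (++ = 2, + = 1, - = -1, -- = -2), the `tally` helper, and the choice to skip N/A cells are all illustrative assumptions. It reports a net score per hypothesis and an inconsistency score that sums only the contradicting ratings.

```python
# Ratings copied from the matrix above.
# ASSUMPTION: the numeric weights are illustrative, not part of the analysis.
WEIGHTS = {"++": 2, "+": 1, "-": -1, "--": -2, "N/A": None}

MATRIX = {
    "SRC01-E01": {"H1": "++", "H2": "--", "H3": "+"},
    "SRC02-E01": {"H1": "+", "H2": "-", "H3": "++"},
    "SRC03-E01": {"H1": "-", "H2": "+", "H3": "++"},
    "SRC04-E01": {"H1": "-", "H2": "N/A", "H3": "++"},
}


def tally(matrix):
    """Return {hypothesis: (net_score, inconsistency_score)}.

    net_score sums every applicable rating; inconsistency_score sums
    only the negative (contradicting) ratings. N/A cells are skipped.
    """
    totals = {}
    for ratings in matrix.values():
        for hyp, symbol in ratings.items():
            weight = WEIGHTS[symbol]
            if weight is None:  # N/A: not applicable to this hypothesis
                continue
            net, inconsistency = totals.get(hyp, (0, 0))
            totals[hyp] = (net + weight, inconsistency + min(weight, 0))
    return totals


scores = tally(MATRIX)
for hyp in sorted(scores):
    net, inconsistency = scores[hyp]
    print(f"{hyp}: net={net:+d}, inconsistency={inconsistency}")
```

Under these assumed weights, H3 has the highest net score and no contradicting evidence, H2 has the lowest net score, and H1 sits in between.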

Diagnosticity Analysis

Most Diagnostic Evidence

| Evidence ID | Why Diagnostic |
|---|---|
| SRC04-E01 | The 42-state AG letter is most diagnostic because it discriminates sharply between H1 (strong commitments exist) and H3 (commitments are insufficient). If satisfactory commitments existed, 42 AGs would not have demanded them. |
| SRC03-E01 | The finding that newer models are more sycophantic contradicts H1 (commitments are working) and supports H3 (efforts are inconsistent). |

Least Diagnostic Evidence

| Evidence ID | Why Non-Diagnostic |
|---|---|
| SRC01-E01 | Anthropic's metrics support both H1 (metrics exist) and H3 (limited to one company), so this evidence does not discriminate clearly between those two hypotheses. |
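One way to make the diagnosticity judgment above concrete is to measure how far apart each piece of evidence rates H1 and H3, the two hypotheses the analysis contrasts. This is a sketch under stated assumptions: the numeric weights and the `pairwise_gap` helper are hypothetical, and the absolute rating gap is only one possible proxy for diagnosticity, not the procedure used here.

```python
# Ratings copied from the ACH matrix.
# ASSUMPTION: numeric weights are illustrative; "N/A" is deliberately absent
# so that lookups for it return None.
WEIGHTS = {"++": 2, "+": 1, "-": -1, "--": -2}

MATRIX = {
    "SRC01-E01": {"H1": "++", "H2": "--", "H3": "+"},
    "SRC02-E01": {"H1": "+", "H2": "-", "H3": "++"},
    "SRC03-E01": {"H1": "-", "H2": "+", "H3": "++"},
    "SRC04-E01": {"H1": "-", "H2": "N/A", "H3": "++"},
}


def pairwise_gap(ratings, hyp_a, hyp_b):
    """Absolute rating gap between two hypotheses, or None if either is N/A."""
    a = WEIGHTS.get(ratings[hyp_a])
    b = WEIGHTS.get(ratings[hyp_b])
    if a is None or b is None:
        return None
    return abs(a - b)


# A larger gap means the evidence separates H1 from H3 more sharply.
gaps = {eid: pairwise_gap(r, "H1", "H3") for eid, r in MATRIX.items()}
for eid, gap in sorted(gaps.items(), key=lambda kv: -(kv[1] or 0)):
    print(eid, gap)
```

With these weights, SRC03-E01 and SRC04-E01 show the widest H1/H3 gap and SRC01-E01 the narrowest, which matches the ranking above. SRC02-E01 ties with SRC01-E01 on this measure, so the proxy is coarser than the written judgment.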

Outcome

Hypothesis supported: H3 — Some metrics exist but commitments are limited, inconsistent across companies, and lack standardization, binding mechanisms, and independent verification.

Hypotheses eliminated: H2 — Anthropic's published metrics clearly demonstrate that some data exists.

Hypotheses inconclusive: H1 — Partially supported (Anthropic's metrics are real) but overstates the breadth and binding nature of industry commitments.