R0021/2026-03-25/Q004 — Assessment¶
BLUF¶
All three regulated industries (aviation, healthcare, financial services) have established validation frameworks for AI systems, but each acknowledges that those frameworks were designed for traditional systems and require adaptation. The FAA released its AI Safety Assurance Roadmap v1 in 2024, acknowledging that "rigorous safety assurance methods must be developed." The FDA requires adherence to the 10 Good Machine Learning Practice (GMLP) principles and predetermined change control plans. Banking regulators apply SR 11-7 model risk management but recognize its limitations for self-adapting AI models. NIST provides a voluntary risk management framework built on four core functions.
Probability¶
Rating: Almost certain (95-99%) that frameworks exist; Very likely (80-95%) that they have significant gaps for modern AI.
Confidence in assessment: High
Confidence rationale: Evidence comes from primary sources (FAA, FDA, Federal Reserve, NIST), all published government documents.
Reasoning Chain¶
- FAA released AI Safety Assurance Roadmap v1 (July 2024) acknowledging need for new methods specific to AI [SRC01-E01, High reliability, High relevance]
- FDA requires the 10 GMLP principles plus predetermined change control plans for AI medical devices; 97% of AI-enabled devices were cleared via the 510(k) pathway [SRC02-E01, High reliability, High relevance]
- SR 11-7 (2011) requires model validation from four perspectives but was designed for stable, static models; it "may lose effectiveness" for self-adapting AI [SRC03-E01, High reliability, High relevance]
- NIST AI RMF provides voluntary 4-function framework (GOVERN, MAP, MEASURE, MANAGE) with TEVV practices [SRC04-E01, High reliability, High relevance]
- JUDGMENT: The pattern across all three industries is the same: existing frameworks are being stretched to cover AI, with acknowledged gaps. This is the opposite of the prompt engineering situation, where no formal framework exists at all.
Evidence Base Summary¶
| Source | Description | Reliability | Relevance | Key Finding |
|---|---|---|---|---|
| SRC01 | FAA AI Roadmap | High | High | "Rigorous methods must be developed" |
| SRC02 | FDA SaMD Guidance | High | High | GMLP + change control plans |
| SRC03 | SR 11-7 | High | High | Traditional validation struggling with adaptive AI |
| SRC04 | NIST AI RMF | High | High | Voluntary framework with TEVV |
Collection Synthesis¶
| Dimension | Assessment |
|---|---|
| Evidence quality | Robust — all primary regulatory sources |
| Source agreement | High — all agree frameworks exist but need adaptation |
| Source independence | Independent — four separate federal agencies (three regulators plus NIST) |
| Outliers | None |
Detail¶
The contrast with prompt engineering is stark. These industries have validation frameworks, testing requirements, regulatory oversight, and enforcement mechanisms for AI, even though they acknowledge the frameworks are imperfect. Prompt engineering has none of these.
Gaps¶
| Missing Evidence | Impact on Assessment |
|---|---|
| EU AI Act implementation details | Minor — would add European perspective |
| Specific test case requirements | Moderate — would strengthen the "how" of validation |
| Industry compliance rates | Moderate — would show enforcement effectiveness |
Researcher Bias Check¶
Declared biases: The researcher may emphasize the existence of these frameworks to contrast with prompt engineering's lack of frameworks.
Influence assessment: The frameworks genuinely exist and are documented. The contrast is factual.
Cross-References¶
| Entity | ID | File |
|---|---|---|
| Hypotheses | H1, H2, H3 | hypotheses/ |
| Sources | SRC01-SRC04 | sources/ |
| ACH Matrix | — | ach-matrix.md |
| Self-Audit | — | self-audit.md |