# R0021/2026-03-25/Q003 — Assessment
## BLUF
Across all four major AI vendors, the overwhelming majority of prompt engineering recommendations are subjective and qualitative. Of approximately 25 distinct recommendations analyzed, only 3-4 include quantifiable criteria: Google's temperature=1.0 setting, its ~21-word average prompt length, and its accuracy claims, plus Anthropic's semi-quantifiable 20k-token threshold. Microsoft explicitly describes prompting as "more of an art than a science." No vendor provides measurable success criteria, testable thresholds, or reproducible specifications for its recommendations.
## Probability
Rating: Almost certain (95-99%) that vendor guidance is predominantly subjective.
Confidence in assessment: High
Confidence rationale: Based on direct analysis of current official vendor documentation. The findings are verifiable by reading the source documents.
## Reasoning Chain
- OpenAI provides 6 strategies, all qualitative: "write clear instructions," "provide reference text," etc. [SRC01-E01, High reliability, High relevance]
- Anthropic provides 7 recommendations, 1 semi-quantifiable (20k token threshold); remainder structural/qualitative [SRC02-E01, High reliability, High relevance]
- Google provides the most quantifiable guidance: temperature=1.0, ~21-word average, accuracy claims (3 quantifiable out of ~7) [SRC03-E01, High reliability, High relevance]
- Microsoft provides 5 recommendations with 0 quantifiable elements and explicitly states prompting is "more art than science" [SRC04-E01, High reliability, High relevance]
- JUDGMENT: Across ~25 recommendations, approximately 85-90% are subjective. The few quantifiable elements (temperature setting, token threshold) are operational parameters, not engineering specifications.
## Evidence Base Summary
| Source | Description | Reliability | Relevance | Key Finding |
|---|---|---|---|---|
| SRC01 | OpenAI Guide | High | High | 6 strategies, 0 quantifiable |
| SRC02 | Anthropic Guide | High | High | 7 recommendations, 1 semi-quantifiable |
| SRC03 | Google Guide | High | High | 7 recommendations, 3 quantifiable |
| SRC04 | Microsoft Guide | High | High | 5 recommendations, 0 quantifiable; "art not science" |
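The headline ratio can be recomputed directly from the table above. A minimal sketch, using only the counts stated in this report (Anthropic's 20k-token threshold is treated as semi-quantifiable and excluded from the fully quantifiable count):

```python
# Recommendation counts per vendor, taken from the Evidence Base Summary.
vendors = {
    "OpenAI":    {"total": 6, "quantifiable": 0},
    "Anthropic": {"total": 7, "quantifiable": 0},  # 20k-token threshold counted as semi-quantifiable only
    "Google":    {"total": 7, "quantifiable": 3},  # temperature, word count, accuracy claims
    "Microsoft": {"total": 5, "quantifiable": 0},
}

total = sum(v["total"] for v in vendors.values())
quantifiable = sum(v["quantifiable"] for v in vendors.values())
subjective_pct = 100 * (total - quantifiable) / total

print(f"{total} recommendations, {quantifiable} fully quantifiable")
print(f"{subjective_pct:.0f}% subjective")  # → 88% subjective
```

Counting the semi-quantifiable Anthropic item as quantifiable instead yields 21/25 ≈ 84% subjective; this classification sensitivity is why the judgment below is stated as a range (~85-90%) rather than a point estimate.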
## Collection Synthesis
| Dimension | Assessment |
|---|---|
| Evidence quality | Robust — all four major vendors' official documentation analyzed |
| Source agreement | High — all vendors provide predominantly qualitative guidance |
| Source independence | Independent — each vendor publishes its own documentation |
| Outliers | Google is a mild outlier with more quantifiable recommendations than others |
## Detail
The consistency across all four vendors is itself significant. These are the organizations that coined and popularized "prompt engineering," yet none of them provide engineering-grade specifications. The guidance reads more like cooking tips ("add more detail," "be specific," "use examples") than engineering requirements ("the signal-to-noise ratio must exceed 20 dB," "the load must not exceed 500 kg").
## Gaps
| Missing Evidence | Impact on Assessment |
|---|---|
| Internal vendor testing data | Moderate — vendors likely have quantifiable internal benchmarks they do not publish |
| Academic prompt engineering research | Minor — academic work may provide more measurable criteria |
| Vendor cookbooks and tutorials | Minor — supplementary materials may contain more specific guidance |
## Researcher Bias Check
Declared biases: The researcher argues that "prompt engineering" is not engineering. Finding predominantly subjective vendor guidance supports this argument.
Influence assessment: The finding is based on direct analysis of source documents. However, the classification of recommendations as "quantifiable" vs. "subjective" involves judgment. Another researcher might classify structural recommendations (like "use XML tags") as measurable, though they lack numerical thresholds.
## Cross-References
| Entity | ID | File |
|---|---|---|
| Hypotheses | H1, H2, H3 | hypotheses/ |
| Sources | SRC01, SRC02, SRC03, SRC04 | sources/ |
| ACH Matrix | — | ach-matrix.md |
| Self-Audit | — | self-audit.md |