
R0021/2026-03-25/Q003 — Assessment

BLUF

Across all four major AI vendors, the overwhelming majority of prompt engineering recommendations are subjective and qualitative. Of roughly 25 distinct recommendations analyzed, only 3-4 carry quantifiable criteria: Google's temperature=1.0 default, its ~21-word average prompt length, its accuracy claims, and Anthropic's semi-quantifiable 20k-token threshold. Microsoft explicitly describes prompting as "more of an art than a science." No vendor provides measurable success criteria, testable thresholds, or reproducible specifications for its recommendations.

Probability

Rating: Almost certain (95-99%) that vendor guidance is predominantly subjective.

Confidence in assessment: High

Confidence rationale: Based on direct analysis of current official vendor documentation. The findings are verifiable by reading the source documents.

Reasoning Chain

  1. OpenAI provides 6 strategies, all qualitative: "write clear instructions," "provide reference text," etc. [SRC01-E01, High reliability, High relevance]
  2. Anthropic provides 7 recommendations, 1 semi-quantifiable (20k token threshold); remainder structural/qualitative [SRC02-E01, High reliability, High relevance]
  3. Google provides the most quantifiable guidance: temperature=1.0, ~21-word average, accuracy claims (3 quantifiable out of ~7) [SRC03-E01, High reliability, High relevance]
  4. Microsoft provides 5 recommendations with 0 quantifiable elements and explicitly states prompting is "more art than science" [SRC04-E01, High reliability, High relevance]
  5. JUDGMENT: Across ~25 recommendations, approximately 85-90% are subjective. The few quantifiable elements (temperature setting, token threshold) are operational parameters, not engineering specifications.
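The arithmetic behind the judgment can be sketched as a quick tally. The per-vendor counts are taken from the reasoning chain above; treating Anthropic's semi-quantifiable 20k-token threshold as quantifiable is an assumption (counting it as subjective instead yields 3/25 quantifiable, or 88% subjective, which is why the headline figure is hedged as 85-90%):

```python
# Tally of per-vendor recommendation counts cited in the reasoning chain.
# The counts come from SRC01-SRC04; this script is illustrative only.
recommendations = {
    "OpenAI":    {"total": 6, "quantifiable": 0},
    "Anthropic": {"total": 7, "quantifiable": 1},  # 20k-token threshold (semi-quantifiable)
    "Google":    {"total": 7, "quantifiable": 3},  # temperature, word count, accuracy claims
    "Microsoft": {"total": 5, "quantifiable": 0},
}

total = sum(v["total"] for v in recommendations.values())
quantifiable = sum(v["quantifiable"] for v in recommendations.values())
subjective_pct = 100 * (total - quantifiable) / total

print(f"{total} recommendations, {quantifiable} quantifiable, "
      f"{subjective_pct:.0f}% subjective")
# 25 recommendations, 4 quantifiable, 84% subjective
```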

Evidence Base Summary

| Source | Description | Reliability | Relevance | Key Finding |
| --- | --- | --- | --- | --- |
| SRC01 | OpenAI Guide | High | High | 6 strategies, 0 quantifiable |
| SRC02 | Anthropic Guide | High | High | 7 recommendations, 1 semi-quantifiable |
| SRC03 | Google Guide | High | High | 7 recommendations, 3 quantifiable |
| SRC04 | Microsoft Guide | High | High | 5 recommendations, 0 quantifiable; "art not science" |

Collection Synthesis

| Dimension | Assessment |
| --- | --- |
| Evidence quality | Robust: all four major vendors' official documentation analyzed |
| Source agreement | High: all vendors provide predominantly qualitative guidance |
| Source independence | Independent: each vendor publishes its own documentation |
| Outliers | Google is a mild outlier, with more quantifiable recommendations than the others |

Detail

The consistency across all four vendors is itself significant. These are the organizations that coined and popularized "prompt engineering," yet none of them provide engineering-grade specifications. The guidance reads more like cooking tips ("add more detail," "be specific," "use examples") than engineering requirements ("the signal-to-noise ratio must exceed 20dB," "the load must not exceed 500kg").

Gaps

| Missing Evidence | Impact on Assessment |
| --- | --- |
| Internal vendor testing data | Moderate: vendors likely have quantifiable internal benchmarks they do not publish |
| Academic prompt engineering research | Minor: academic work may provide more measurable criteria |
| Vendor cookbooks and tutorials | Minor: supplementary materials may contain more specific guidance |

Researcher Bias Check

Declared biases: The researcher argues that "prompt engineering" is not engineering. Finding predominantly subjective vendor guidance supports this argument.

Influence assessment: The finding is based on direct analysis of source documents. However, the classification of recommendations as "quantifiable" vs. "subjective" involves judgment. Another researcher might classify structural recommendations (like "use XML tags") as measurable, though they lack numerical thresholds.

Cross-References

| Entity | ID | File |
| --- | --- | --- |
| Hypotheses | H1, H2, H3 | hypotheses/ |
| Sources | SRC01, SRC02, SRC03, SRC04 | sources/ |
| ACH Matrix | | ach-matrix.md |
| Self-Audit | | self-audit.md |