R0021/2026-03-25¶
This run investigated eight queries on whether "prompt engineering" meets formal definitions of engineering. The evidence consistently shows that formal engineering has established definitions, licensing requirements, validation frameworks, and specification languages; prompt engineering currently possesses none of these.
Queries¶
Q001 — Engineering definitions — Converge on five core elements
Query: What are the formal definitions of "engineering" from ABET, IEEE, NSPE, and other professional/accreditation bodies?
Answer: Definitions converge on five elements: mathematical/scientific foundation, creative application through judgment, design of systems, economic constraints, and public safety/benefit.
| Hypothesis | Status | Probability |
|---|---|---|
| H1: Definitions exist and converge | Supported | Almost certain (95-99%) |
| H2: No consensus definitions | Eliminated | — |
| H3: Definitions vague/circular | Partially supported | — |
Sources: 3 | Searches: 2
Q002 — Engineer title requirements — Varies by jurisdiction
Query: What are the requirements for using the title "engineer" in regulated jurisdictions?
Answer: Title protection ranges from criminal penalties (Germany: up to one year's imprisonment) to civil fines (Canada: up to $25,000) to protection limited to the "Professional Engineer" title (most US states).
| Hypothesis | Status | Probability |
|---|---|---|
| H1: Widely protected | Partially supported | — |
| H2: Minimal/unenforced | Eliminated | — |
| H3: Varies by jurisdiction | Supported | Almost certain (95-99%) |
Sources: 3 | Searches: 2
Q003 — AI vendor prompt guidance — 84% subjective
Query: What specific, measurable guidance do the major AI vendors provide in their prompt engineering documentation?
Answer: Approximately 84% of vendor recommendations are subjective or qualitative; only ~4 of ~25 recommendations include quantifiable criteria. Microsoft explicitly calls prompting "more of an art than a science."
| Hypothesis | Status | Probability |
|---|---|---|
| H1: Predominantly quantifiable | Eliminated | — |
| H2: Predominantly subjective | Supported | Almost certain (95-99%) |
| H3: Mixed | Partially supported | — |
Sources: 4 | Searches: 3
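The Q003 headline percentage is simple arithmetic over the counts reported above; a minimal sketch (the counts 25 and 4 are the approximate figures from the answer, not exact tallies):

```python
# Approximate counts from the Q003 answer above.
total_recommendations = 25   # recommendations across the vendor docs
quantifiable = 4             # those with measurable, testable criteria

subjective = total_recommendations - quantifiable
subjective_share = subjective / total_recommendations

print(f"{subjective} of {total_recommendations} subjective ({subjective_share:.0%})")
# → 21 of 25 subjective (84%)
```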
Q004 — Regulated industry AI validation — Frameworks exist, adapting
Query: How do regulated industries test and validate AI systems before deployment?
Answer: Aviation (FAA), healthcare (FDA), and finance (Fed/OCC) all have validation frameworks, but all acknowledge these were designed for traditional systems: the FAA states that "rigorous safety assurance methods must be developed," and the Fed's SR 11-7 model-risk guidance "may lose effectiveness" for adaptive AI.
| Hypothesis | Status | Probability |
|---|---|---|
| H1: Rigorous requirements | Partially supported | — |
| H2: Minimal/vague | Eliminated | — |
| H3: Adapting, incomplete | Supported | Almost certain (95-99%) |
Sources: 4 | Searches: 3
Q005 — Engineering label history — Multiple documented examples
Query: What previous disciplines went through a phase of being called "engineering" before formal methodology existed?
Answer: Multiple documented examples: software engineering (1968, when the term "expressed a need rather than a reality"), civil engineering (practiced for centuries before the first school opened in 1747), and knowledge engineering (1980s, initially with "little formal process"). The pattern is recurring and well-documented.
| Hypothesis | Status | Probability |
|---|---|---|
| H1: Multiple examples | Supported | Almost certain (95-99%) |
| H2: Only software engineering | Eliminated | — |
| H3: Common, varies | Partially supported | — |
Sources: 3 | Searches: 2
Q006 — RFC 2119 in AI prompts — Adjacent use only
Query: Has RFC 2119 requirement language been applied to AI prompt design in any published work?
Answer: One published example found: a practitioner blog post (deliberate.codes, Feb 2026) applying RFC 2119 to AI coding agent specifications. No formal standard, academic paper, or vendor documentation applies RFC 2119 to prompt engineering. The absence is itself significant.
| Hypothesis | Status | Probability |
|---|---|---|
| H1: Formally applied | Partially supported | — |
| H2: Not applied | Eliminated | — |
| H3: Adjacent, not prompt-specific | Supported | Likely (55-80%) |
Sources: 2 | Searches: 1
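As context for the gap Q006 identifies, the following is a hypothetical sketch of what RFC 2119 requirement keywords applied to a prompt specification could look like, paired with a toy scanner that tallies requirement levels. Both the spec text and the checker are illustrative assumptions, not drawn from any published standard:

```python
import re

# Hypothetical prompt specification using RFC 2119 requirement keywords.
PROMPT_SPEC = """\
The assistant MUST cite a source for every factual claim.
The assistant MUST NOT invent URLs.
Responses SHOULD be under 300 words.
The assistant MAY ask one clarifying question.
"""

# Longest keywords first, so "MUST NOT" is not miscounted as "MUST".
RFC2119_KEYWORDS = ("MUST NOT", "SHALL NOT", "SHOULD NOT",
                    "MUST", "SHALL", "SHOULD", "MAY", "OPTIONAL", "REQUIRED")

def requirement_levels(spec: str) -> dict[str, int]:
    """Count the requirement level declared on each line of the spec."""
    counts: dict[str, int] = {}
    for line in spec.splitlines():
        for kw in RFC2119_KEYWORDS:
            if re.search(rf"\b{kw}\b", line):
                counts[kw] = counts.get(kw, 0) + 1
                break  # one requirement level per line
    return counts

print(requirement_levels(PROMPT_SPEC))
# → {'MUST': 1, 'MUST NOT': 1, 'SHOULD': 1, 'MAY': 1}
```

A checker like this would only verify that requirement language is present, not that the model satisfies it; that verification gap is exactly what Q004 and Q007 describe.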
Q007 — AI decision auditing — Active field, challenges remain
Query: What published research exists on AI decision auditing, explainability requirements, or judgment logging?
Answer: A substantial body exists: DARPA's four-year XAI program (2017-2021, ~12,700 participants), EU AI Act Article 86 mandating "clear and meaningful explanations," and 2,425 XAI papers published 2022-2025. Practical deployment still faces challenges; post-hoc explanation methods are approximations.
| Hypothesis | Status | Probability |
|---|---|---|
| H1: Substantial with frameworks | Partially supported | — |
| H2: Minimal/theoretical | Eliminated | — |
| H3: Active, challenges remain | Supported | Almost certain (95-99%) |
Sources: 3 | Searches: 2
Q008 — Natural language ambiguity — 430:1 for a common word
Query: How does natural language ambiguity compare to formal specification languages? How many definitions does "set" have?
Answer: "Set" has 430 definitions in OED2 (580 senses in total); "run" now holds the record at 645 senses. Polysemy is pervasive: most content words carry multiple senses. A formal specification language assigns exactly one meaning per term, so the ambiguity gap for "set" is roughly 430:1.
| Hypothesis | Status | Probability |
|---|---|---|
| H1: Vastly more ambiguous | Supported | Almost certain (95-99%) |
| H2: Gap overstated | Eliminated | — |
| H3: Real but context-dependent | Partially supported | — |
Sources: 2 | Searches: 1
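The 430:1 ratio can be read as a property of the two lookup tables involved: a natural-language lexicon maps one surface form to many senses, while a formal specification maps each defined term to exactly one. A minimal sketch, using the sense counts cited above (the formal-spec definition text is invented purely for illustration):

```python
# Sense counts cited in the Q008 answer (OED2 definitions for "set";
# the later record count for "run").
NL_SENSES = {"set": 430, "run": 645}

# A formal specification assigns exactly one meaning per defined term.
# This definition text is an invented placeholder.
FORMAL_SPEC = {"set": "an unordered collection of distinct elements"}

def ambiguity_ratio(word: str) -> float:
    """Candidate senses in natural language per sense in the formal spec."""
    formal_senses = 1 if word in FORMAL_SPEC else float("nan")
    return NL_SENSES[word] / formal_senses

print(f'"set" ambiguity gap: {ambiguity_ratio("set"):.0f}:1')
# → "set" ambiguity gap: 430:1
```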
Collection Analysis¶
Cross-Cutting Patterns¶
| Pattern | Queries Affected | Significance |
|---|---|---|
| Formal engineering has measurable standards; prompt engineering does not | Q001, Q003, Q004 | Engineering definitions require scientific/mathematical foundations and measurable outcomes. Prompt engineering vendor guidance is 84% subjective. |
| The "engineering" label has been applied aspirationally before | Q001, Q005 | Software engineering (1968), knowledge engineering (1980s) both started with the label before the methodology. Prompt engineering follows the same pattern. |
| Regulated industries require validation frameworks; prompt engineering has none | Q002, Q004, Q007 | Aviation, healthcare, and finance all have AI validation frameworks. No equivalent exists for prompt engineering. |
| Natural language is fundamentally ambiguous as a specification tool | Q003, Q006, Q008 | 430:1 polysemy ratio, no RFC 2119 adoption, predominantly subjective vendor guidance — all point to natural language as an imprecise specification medium. |
Collection Statistics¶
| Metric | Value |
|---|---|
| Queries investigated | 8 |
| Queries answered with high confidence | 7 |
| Queries answered with medium confidence | 1 (Q006) |
| Dominant hypothesis supported | H1 or H3 in 7 of 8 queries; H2 in Q003 |
| A competing hypothesis eliminated | 8 of 8 queries (H2 in seven; H1 in Q003) |
Source Independence Assessment¶
Sources across the eight queries are largely independent. Q001-Q002 share engineering-body sources (ABET, ECPD). Q003 sources (four vendor documentation sets) are independent of each other and of all other queries. Q004 sources (four regulatory bodies) are independent. Q005-Q008 each draw from distinct evidence pools. No single source or source type dominates the collection.
Collection Gaps¶
| Gap | Impact | Mitigation |
|---|---|---|
| International engineering definitions beyond US bodies | Low | US bodies are the most cited globally; international definitions would likely converge |
| Academic prompt engineering research (beyond vendor docs) | Moderate | Academic work may provide more rigorous analysis of prompt engineering practices |
| Quantitative comparison of prompt engineering outcomes | Moderate | No studies measuring prompt engineering effectiveness against formal engineering metrics |
Collection Self-Audit¶
| Domain | Rating | Notes |
|---|---|---|
| Eligibility criteria | Low risk | Criteria well-defined by queries |
| Search comprehensiveness | Some concerns | Limited to web search; academic databases not directly queried |
| Evaluation consistency | Low risk | Same framework applied across all 8 queries |
| Synthesis fairness | Low risk | Negative hypotheses tested and eliminated on evidence |
Resources¶
Summary¶
| Metric | Value |
|---|---|
| Queries investigated | 8 |
| Files produced | ~140 |
| Sources scored | 22 |
| Evidence extracts | 22 |
| Results dispositioned | ~80 selected + ~60 rejected = ~140 total |
| Duration (wall clock) | 26m 44s |
| Tool uses (total) | 103 |
Tool Breakdown¶
| Tool | Uses | Purpose |
|---|---|---|
| WebSearch | 18 | Search queries across all 8 topics |
| WebFetch | 8 | Page content retrieval for key sources |
| Write | ~50 | File creation |
| Read | 4 | Methodology prompts and output format specs |
| Edit | 0 | No file modifications needed |
| Bash | ~20 | Directory creation, batch file generation |
Token Distribution¶
| Category | Tokens |
|---|---|
| Input (context) | ~200,000 |
| Output (generation) | ~80,000 |
| Total | ~280,000 |