

Research R0021 — Prompt engineering definitions
Mode: Query
Run date: 2026-03-25
Queries: 8
Prompt: unified-research-standard-query v1.0-draft
Model: claude-opus-4-6 (1M context)

This run investigated eight queries related to whether "prompt engineering" meets formal definitions of engineering. The evidence consistently shows that formal engineering has established definitions, licensing requirements, validation frameworks, and specification languages — none of which prompt engineering currently possesses.

Queries

Q001 — Engineering definitions — Converge on five core elements

Query: What are the formal definitions of "engineering" from ABET, IEEE, NSPE, and other professional/accreditation bodies?

Answer: Definitions converge on five elements: mathematical/scientific foundation, creative application through judgment, design of systems, economic constraints, and public safety/benefit.

Hypothesis | Status | Probability
H1: Definitions exist and converge | Supported | Almost certain (95-99%)
H2: No consensus definitions | Eliminated
H3: Definitions vague/circular | Partially supported

Sources: 3 | Searches: 2

Full analysis

Q002 — Engineer title requirements — Varies by jurisdiction

Query: What are the requirements for using the title "engineer" in regulated jurisdictions?

Answer: Title protection varies widely: criminal penalties in Germany (up to one year's imprisonment), civil fines in Canada (up to $25,000), and, in most US states, protection of only the specific title "Professional Engineer."

Hypothesis | Status | Probability
H1: Widely protected | Partially supported
H2: Minimal/unenforced | Eliminated
H3: Varies by jurisdiction | Supported | Almost certain (95-99%)

Sources: 3 | Searches: 2

Full analysis

Q003 — AI vendor prompt guidance — 84% subjective

Query: What specific, measurable guidance do the major AI vendors provide in their prompt engineering documentation?

Answer: Approximately 84% of vendor recommendations are subjective/qualitative. Only ~4 of ~25 recommendations include quantifiable criteria. Microsoft explicitly calls prompting "more of an art than a science."
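The 84% figure follows directly from the approximate counts above; a minimal arithmetic check (the per-recommendation tallies carry the source's "~" qualifiers, so the inputs are approximate):

```python
# Sanity-check the subjectivity percentage from the counts reported above:
# ~25 vendor recommendations, of which ~4 include quantifiable criteria.
total_recommendations = 25
quantifiable = 4

subjective = total_recommendations - quantifiable
subjective_pct = 100 * subjective / total_recommendations

print(f"{subjective} of {total_recommendations} subjective -> {subjective_pct:.0f}%")
# -> 21 of 25 subjective -> 84%
```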

Hypothesis | Status | Probability
H1: Predominantly quantifiable | Eliminated
H2: Predominantly subjective | Supported | Almost certain (95-99%)
H3: Mixed | Partially supported

Sources: 4 | Searches: 3

Full analysis

Q004 — Regulated industry AI validation — Frameworks exist, adapting

Query: How do regulated industries test and validate AI systems before deployment?

Answer: Aviation (FAA), healthcare (FDA), and finance (Fed/OCC) all have validation frameworks, but all acknowledge these were designed for traditional systems. The FAA states that "rigorous safety assurance methods must be developed," and the Federal Reserve's SR 11-7 model risk management guidance "may lose effectiveness" for adaptive AI.

Hypothesis | Status | Probability
H1: Rigorous requirements | Partially supported
H2: Minimal/vague | Eliminated
H3: Adapting, incomplete | Supported | Almost certain (95-99%)

Sources: 4 | Searches: 3

Full analysis

Q005 — Engineering label history — Multiple documented examples

Query: What previous disciplines went through a phase of being called "engineering" before formal methodology existed?

Answer: Multiple examples exist: software engineering (coined in 1968, when it "expressed a need rather than a reality"), civil engineering (practiced for centuries before the first dedicated school opened in 1747), and knowledge engineering (1980s, with "little formal process" initially). The pattern is recurring and well-documented.

Hypothesis | Status | Probability
H1: Multiple examples | Supported | Almost certain (95-99%)
H2: Only software engineering | Eliminated
H3: Common, varies | Partially supported

Sources: 3 | Searches: 2

Full analysis

Q006 — RFC 2119 in AI prompts — Adjacent use only

Query: Has RFC 2119 requirement language been applied to AI prompt design in any published work?

Answer: Only one published example was found: a practitioner blog post (deliberate.codes, Feb 2026) applying RFC 2119 to AI coding agent specifications. No formal standard, academic paper, or vendor documentation applies RFC 2119 to prompt engineering. The absence is itself significant.
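As an illustration of what RFC 2119-style requirement language in a prompt could look like, here is a hedged sketch: the requirement keywords (MUST, SHOULD, MAY, and their variants) are the real RFC 2119 terms, but the sample prompt and the checker are hypothetical, not drawn from the cited blog post or any standard.

```python
import re

# RFC 2119 requirement keywords; multi-word forms listed first so
# "MUST NOT" matches before the bare "MUST" in the alternation.
RFC2119_KEYWORDS = [
    "MUST NOT", "SHALL NOT", "SHOULD NOT",
    "MUST", "REQUIRED", "SHALL", "SHOULD",
    "RECOMMENDED", "MAY", "OPTIONAL",
]

def extract_requirements(prompt: str) -> list[tuple[str, str]]:
    """Return (keyword, sentence) pairs for each sentence containing
    an RFC 2119 term, making the prompt's requirements auditable."""
    pattern = "|".join(re.escape(k) for k in RFC2119_KEYWORDS)
    results = []
    for sentence in re.split(r"(?<=[.!?])\s+", prompt):
        match = re.search(pattern, sentence)
        if match:
            results.append((match.group(), sentence.strip()))
    return results

# Hypothetical prompt specification written in RFC 2119 style.
prompt = (
    "The agent MUST cite a source for every factual claim. "
    "Responses SHOULD stay under 200 words. "
    "The agent MAY ask clarifying questions."
)

for keyword, sentence in extract_requirements(prompt):
    print(f"{keyword}: {sentence}")
```

A checker like this would give prompt requirements the same mechanical auditability RFC 2119 gives protocol specifications: every normative sentence is flagged by its keyword, so compliance can be reviewed clause by clause.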

Hypothesis | Status | Probability
H1: Formally applied | Partially supported
H2: Not applied | Eliminated
H3: Adjacent, not prompt-specific | Supported | Likely (55-80%)

Sources: 2 | Searches: 1

Full analysis

Q007 — AI decision auditing — Active field, challenges remain

Query: What published research exists on AI decision auditing, explainability requirements, or judgment logging?

Answer: Substantial: DARPA's four-year XAI program (2017-2021, ~12,700 participants), the EU AI Act's Article 86 mandate for "clear and meaningful explanations," and 2,425 XAI papers published 2022-2025. Practical deployment still faces challenges; post-hoc explanation methods are approximations.

Hypothesis | Status | Probability
H1: Substantial with frameworks | Partially supported
H2: Minimal/theoretical | Eliminated
H3: Active, challenges remain | Supported | Almost certain (95-99%)

Sources: 3 | Searches: 2

Full analysis

Q008 — Natural language ambiguity — 430:1 for a common word

Query: How does natural language ambiguity compare to formal specification languages? How many definitions does "set" have?

Answer: "Set" has 430 definitions in OED2 (580 senses total); "run" now holds the record with 645 senses. Polysemy is pervasive — most content words are polysemous. Formal specification languages assign exactly one meaning per term. The ambiguity gap is approximately 430:1.

Hypothesis | Status | Probability
H1: Vastly more ambiguous | Supported | Almost certain (95-99%)
H2: Gap overstated | Eliminated
H3: Real but context-dependent | Partially supported

Sources: 2 | Searches: 1

Full analysis


Collection Analysis

Cross-Cutting Patterns

Pattern | Queries Affected | Significance
Formal engineering has measurable standards; prompt engineering does not | Q001, Q003, Q004 | Engineering definitions require scientific/mathematical foundations and measurable outcomes. Prompt engineering vendor guidance is 84% subjective.
The "engineering" label has been applied aspirationally before | Q001, Q005 | Software engineering (1968) and knowledge engineering (1980s) both acquired the label before the methodology. Prompt engineering follows the same pattern.
Regulated industries require validation frameworks; prompt engineering has none | Q002, Q004, Q007 | Aviation, healthcare, and finance all have AI validation frameworks. No equivalent exists for prompt engineering.
Natural language is fundamentally ambiguous as a specification tool | Q003, Q006, Q008 | The 430:1 polysemy ratio, the lack of RFC 2119 adoption, and the predominantly subjective vendor guidance all point to natural language as an imprecise specification medium.

Collection Statistics

Metric | Value
Queries investigated | 8
Queries answered with high confidence | 7
Queries answered with medium confidence | 1 (Q006)
Dominant hypothesis supported | H1 or H3 in all cases
H2 (negative) eliminated | 8 of 8 queries

Source Independence Assessment

Sources across the eight queries are largely independent. Q001-Q002 share engineering-body sources (ABET, ECPD). Q003 sources (four vendor documentation sets) are independent of each other and of all other queries. Q004 sources (four regulatory bodies) are independent. Q005-Q008 each draw from distinct evidence pools. No single source or source type dominates the collection.

Collection Gaps

Gap | Impact | Mitigation
International engineering definitions beyond US bodies | Low | US bodies are the most cited globally; international definitions would likely converge
Academic prompt engineering research (beyond vendor docs) | Moderate | Academic work may provide more rigorous analysis of prompt engineering practices
Quantitative comparison of prompt engineering outcomes | Moderate | No studies measuring prompt engineering effectiveness against formal engineering metrics

Collection Self-Audit

Domain | Rating | Notes
Eligibility criteria | Low risk | Criteria well-defined by queries
Search comprehensiveness | Some concerns | Limited to web search; academic databases not directly queried
Evaluation consistency | Low risk | Same framework applied across all 8 queries
Synthesis fairness | Low risk | Negative hypotheses tested and eliminated on evidence

Resources

Summary

Metric | Value
Queries investigated | 8
Files produced | ~140
Sources scored | 22
Evidence extracts | 22
Results dispositioned | ~80 selected + ~60 rejected = ~140 total
Duration (wall clock) | 26m 44s
Tool uses (total) | 103

Tool Breakdown

Tool | Uses | Purpose
WebSearch | 18 | Search queries across all 8 topics
WebFetch | 8 | Page content retrieval for key sources
Write | ~50 | File creation
Read | 4 | Methodology prompts and output format specs
Edit | 0 | No file modifications needed
Bash | ~20 | Directory creation, batch file generation

Token Distribution

Category | Tokens
Input (context) | ~200,000
Output (generation) | ~80,000
Total | ~280,000