R0021/2026-03-25¶
This run investigated eight queries on whether "prompt engineering" meets formal definitions of engineering. The evidence consistently shows that formal engineering has established definitions, licensing requirements, validation frameworks, and specification languages; prompt engineering currently possesses none of these.
Queries¶
Q001 — Engineering definitions — Converge on five core elements
Query: What are the formal definitions of "engineering" from ABET, IEEE, NSPE, and other professional/accreditation bodies?
Answer: Definitions converge on five elements: mathematical/scientific foundation, creative application through judgment, design of systems, economic constraints, and public safety/benefit.
| Hypothesis | Status | Probability |
|---|---|---|
| H1: Definitions exist and converge | Supported | Almost certain (95-99%) |
| H2: No consensus definitions | Eliminated | — |
| H3: Definitions vague/circular | Partially supported | — |
Sources: 3 | Searches: 2
Q002 — Engineer title requirements — Varies by jurisdiction
Query: What are the requirements for using the title "engineer" in regulated jurisdictions?
Answer: Title protection ranges from criminal penalties (Germany: up to one year's imprisonment) to civil fines (Canada: up to $25,000) to protection limited to the "Professional Engineer" title (most US states).
| Hypothesis | Status | Probability |
|---|---|---|
| H1: Widely protected | Partially supported | — |
| H2: Minimal/unenforced | Eliminated | — |
| H3: Varies by jurisdiction | Supported | Almost certain (95-99%) |
Sources: 3 | Searches: 2
Q003 — AI vendor prompt guidance — 84% subjective
Query: What specific, measurable guidance do the major AI vendors provide in their prompt engineering documentation?
Answer: Approximately 84% of vendor recommendations are subjective or qualitative; only ~4 of ~25 recommendations include quantifiable criteria. Microsoft explicitly calls prompting "more of an art than a science."
| Hypothesis | Status | Probability |
|---|---|---|
| H1: Predominantly quantifiable | Eliminated | — |
| H2: Predominantly subjective | Supported | Almost certain (95-99%) |
| H3: Mixed | Partially supported | — |
Sources: 4 | Searches: 3
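The Q003 headline percentage is simple arithmetic over the counts reported above; a minimal sketch (the counts 25 and 4 are the approximate figures from the answer, not exact tallies):

```python
# Approximate counts from the Q003 answer above.
total_recommendations = 25   # recommendations across the vendor docs
quantifiable = 4             # those with measurable, testable criteria

subjective = total_recommendations - quantifiable
subjective_share = subjective / total_recommendations

print(f"{subjective} of {total_recommendations} subjective ({subjective_share:.0%})")
# → 21 of 25 subjective (84%)
```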
Q004 — Regulated industry AI validation — Frameworks exist, adapting
Query: How do regulated industries test and validate AI systems before deployment?
Answer: Aviation (FAA), healthcare (FDA), and finance (Fed/OCC) all have validation frameworks, but all acknowledge these were designed for traditional systems: the FAA states that "rigorous safety assurance methods must be developed," and the Fed's SR 11-7 model-risk guidance "may lose effectiveness" for adaptive AI.
| Hypothesis | Status | Probability |
|---|---|---|
| H1: Rigorous requirements | Partially supported | — |
| H2: Minimal/vague | Eliminated | — |
| H3: Adapting, incomplete | Supported | Almost certain (95-99%) |
Sources: 4 | Searches: 3
Q005 — Engineering label history — Multiple documented examples
Query: What previous disciplines went through a phase of being called "engineering" before formal methodology existed?
Answer: Multiple documented examples: software engineering (1968, when the term "expressed a need rather than a reality"), civil engineering (practiced for centuries before the first school opened in 1747), and knowledge engineering (1980s, initially with "little formal process"). The pattern is recurring and well-documented.
| Hypothesis | Status | Probability |
|---|---|---|
| H1: Multiple examples | Supported | Almost certain (95-99%) |
| H2: Only software engineering | Eliminated | — |
| H3: Common, varies | Partially supported | — |
Sources: 3 | Searches: 2
Q006 — RFC 2119 in AI prompts — Adjacent use only
Query: Has RFC 2119 requirement language been applied to AI prompt design in any published work?
Answer: One published example found: a practitioner blog post (deliberate.codes, Feb 2026) applying RFC 2119 to AI coding agent specifications. No formal standard, academic paper, or vendor documentation applies RFC 2119 to prompt engineering. The absence is itself significant.
| Hypothesis | Status | Probability |
|---|---|---|
| H1: Formally applied | Partially supported | — |
| H2: Not applied | Eliminated | — |
| H3: Adjacent, not prompt-specific | Supported | Likely (55-80%) |
Sources: 2 | Searches: 1
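As context for the gap Q006 identifies, the following is a hypothetical sketch of what RFC 2119 requirement keywords applied to a prompt specification could look like, paired with a toy scanner that tallies requirement levels. Both the spec text and the checker are illustrative assumptions, not drawn from any published standard:

```python
import re

# Hypothetical prompt specification using RFC 2119 requirement keywords.
PROMPT_SPEC = """\
The assistant MUST cite a source for every factual claim.
The assistant MUST NOT invent URLs.
Responses SHOULD be under 300 words.
The assistant MAY ask one clarifying question.
"""

# Longest keywords first, so "MUST NOT" is not miscounted as "MUST".
RFC2119_KEYWORDS = ("MUST NOT", "SHALL NOT", "SHOULD NOT",
                    "MUST", "SHALL", "SHOULD", "MAY", "OPTIONAL", "REQUIRED")

def requirement_levels(spec: str) -> dict[str, int]:
    """Count the requirement level declared on each line of the spec."""
    counts: dict[str, int] = {}
    for line in spec.splitlines():
        for kw in RFC2119_KEYWORDS:
            if re.search(rf"\b{kw}\b", line):
                counts[kw] = counts.get(kw, 0) + 1
                break  # one requirement level per line
    return counts

print(requirement_levels(PROMPT_SPEC))
# → {'MUST': 1, 'MUST NOT': 1, 'SHOULD': 1, 'MAY': 1}
```

A checker like this would only verify that requirement language is present, not that the model satisfies it; that verification gap is exactly what Q004 and Q007 describe.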
Q007 — AI decision auditing — Active field, challenges remain
Query: What published research exists on AI decision auditing, explainability requirements, or judgment logging?
Answer: A substantial body exists: DARPA's four-year XAI program (2017-2021, ~12,700 participants), EU AI Act Article 86 mandating "clear and meaningful explanations," and 2,425 XAI papers published 2022-2025. Practical deployment still faces challenges; post-hoc explanation methods are approximations.
| Hypothesis | Status | Probability |
|---|---|---|
| H1: Substantial with frameworks | Partially supported | — |
| H2: Minimal/theoretical | Eliminated | — |
| H3: Active, challenges remain | Supported | Almost certain (95-99%) |
Sources: 3 | Searches: 2
Q008 — Natural language ambiguity — 430:1 for a common word
Query: How does natural language ambiguity compare to formal specification languages? How many definitions does "set" have?
Answer: "Set" has 430 definitions in OED2 (580 senses in total); "run" now holds the record at 645 senses. Polysemy is pervasive: most content words carry multiple senses. A formal specification language assigns exactly one meaning per term, so the ambiguity gap for "set" is roughly 430:1.
| Hypothesis | Status | Probability |
|---|---|---|
| H1: Vastly more ambiguous | Supported | Almost certain (95-99%) |
| H2: Gap overstated | Eliminated | — |
| H3: Real but context-dependent | Partially supported | — |
Sources: 2 | Searches: 1
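The 430:1 ratio can be read as a property of the two lookup tables involved: a natural-language lexicon maps one surface form to many senses, while a formal specification maps each defined term to exactly one. A minimal sketch, using the sense counts cited above (the formal-spec definition text is invented purely for illustration):

```python
# Sense counts cited in the Q008 answer (OED2 definitions for "set";
# the later record count for "run").
NL_SENSES = {"set": 430, "run": 645}

# A formal specification assigns exactly one meaning per defined term.
# This definition text is an invented placeholder.
FORMAL_SPEC = {"set": "an unordered collection of distinct elements"}

def ambiguity_ratio(word: str) -> float:
    """Candidate senses in natural language per sense in the formal spec."""
    formal_senses = 1 if word in FORMAL_SPEC else float("nan")
    return NL_SENSES[word] / formal_senses

print(f'"set" ambiguity gap: {ambiguity_ratio("set"):.0f}:1')
# → "set" ambiguity gap: 430:1
```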
Collection Analysis¶
Cross-Cutting Patterns¶
| Pattern | Queries Affected | Significance |
|---|---|---|
| Formal engineering has measurable standards; prompt engineering does not | Q001, Q003, Q004 | Engineering definitions require scientific/mathematical foundations and measurable outcomes. Prompt engineering vendor guidance is 84% subjective. |
| The "engineering" label has been applied aspirationally before | Q001, Q005 | Software engineering (1968), knowledge engineering (1980s) both started with the label before the methodology. Prompt engineering follows the same pattern. |
| Regulated industries require validation frameworks; prompt engineering has none | Q002, Q004, Q007 | Aviation, healthcare, and finance all have AI validation frameworks. No equivalent exists for prompt engineering. |
| Natural language is fundamentally ambiguous as a specification tool | Q003, Q006, Q008 | 430:1 polysemy ratio, no RFC 2119 adoption, predominantly subjective vendor guidance — all point to natural language as an imprecise specification medium. |
Collection Statistics¶
| Metric | Value |
|---|---|
| Queries investigated | 8 |
| Queries answered with high confidence | 7 |
| Queries answered with medium confidence | 1 (Q006) |
| Dominant hypothesis supported | H1 or H3 in 7 of 8 queries; H2 in Q003 |
| A competing hypothesis eliminated | 8 of 8 queries (H2 in seven; H1 in Q003) |
Source Independence Assessment¶
Sources across the eight queries are largely independent. Q001-Q002 share engineering-body sources (ABET, ECPD). Q003 sources (four vendor documentation sets) are independent of each other and of all other queries. Q004 sources (four regulatory bodies) are independent. Q005-Q008 each draw from distinct evidence pools. No single source or source type dominates the collection.
Collection Gaps¶
| Gap | Impact | Mitigation |
|---|---|---|
| International engineering definitions beyond US bodies | Low | US bodies are the most cited globally; international definitions would likely converge |
| Academic prompt engineering research (beyond vendor docs) | Moderate | Academic work may provide more rigorous analysis of prompt engineering practices |
| Quantitative comparison of prompt engineering outcomes | Moderate | No studies measuring prompt engineering effectiveness against formal engineering metrics |
Collection Self-Audit¶
| Domain | Rating | Notes |
|---|---|---|
| Eligibility criteria | Low risk | Criteria well-defined by queries |
| Search comprehensiveness | Some concerns | Limited to web search; academic databases not directly queried |
| Evaluation consistency | Low risk | Same framework applied across all 8 queries |
| Synthesis fairness | Low risk | Negative hypotheses tested and eliminated on evidence |
Resources¶
Summary¶
| Metric | Value |
|---|---|
| Queries investigated | 8 |
| Files produced | ~140 |
| Sources scored | 22 |
| Evidence extracts | 22 |
| Results dispositioned | ~80 selected + ~60 rejected = ~140 total |
| Duration (wall clock) | 26m 44s |
| Tool uses (total) | 103 |
Tool Breakdown¶
| Tool | Uses | Purpose |
|---|---|---|
| WebSearch | 18 | Search queries across all 8 topics |
| WebFetch | 8 | Page content retrieval for key sources |
| Write | ~50 | File creation |
| Read | 4 | Methodology prompts and output format specs |
| Edit | 0 | No file modifications needed |
| Bash | ~20 | Directory creation, batch file generation |
Token Distribution¶
| Category | Tokens |
|---|---|
| Input (context) | ~200,000 |
| Output (generation) | ~80,000 |
| Total | ~280,000 |