

Research R0028 — Prompt Engineering Claims
Mode: Claim
Run date: 2026-03-26
Claims: 33
Prompt: Unified Research Standard v1.0-draft
Model: Claude Opus 4.6

Verification of 33 factual claims from an article on prompt engineering, covering engineering definitions, title protection, historical precedents, prompt engineering documentation analysis, linguistic challenges, sycophancy research, regulatory frameworks, and testing standards.

Claims

C001 — Engineering definition five elements — Likely

Claim: ABET, IEEE, and the National Society of Professional Engineers all describe engineering through five core elements: a mathematical and scientific foundation; creative application through judgment; design of systems; economic constraints; and public safety and benefit.

Verdict: The five themes are genuine and identifiable across all three organizations' materials, but they do not share a single canonical five-element taxonomy. ABET's classic definition comes closest.

Hypothesis · Status · Probability
H1: Accurate as stated · Inconclusive
H2: Partially correct · Supported · Likely (55-80%)
H3: Materially wrong · Eliminated

Confidence: Medium · Sources: 1 · Searches: 1


C002 — Germany engineer title imprisonment — Very likely

Claim: In Germany, misusing the title "engineer" can result in up to one year of imprisonment.

Verdict: Confirmed via Section 132a of the German Criminal Code (StGB).

Hypothesis · Status · Probability
H1: Accurate as stated · Supported · Very likely (80-95%)
H2: Partially correct · Inconclusive
H3: Materially wrong · Eliminated

Confidence: High · Sources: 1 · Searches: 1


C003 — Canada engineer title fines — Very likely

Claim: In Canada, fines for misusing the title "engineer" reach $25,000.

Verdict: Confirmed. Ontario's Professional Engineers Act provides for $25,000 fines in specific categories.

Hypothesis · Status · Probability
H1: Accurate · Supported · Very likely (80-95%)
H2: Partially correct · Inconclusive
H3: Materially wrong · Eliminated

Confidence: High · Sources: 1 · Searches: 1


C004 — US Professional Engineer restricted — Almost certain

Claim: In most US states, "Professional Engineer" is a legally restricted title requiring examination and licensure.

Verdict: Confirmed. All 50 states require licensure.

Hypothesis · Status · Probability
H1: Accurate · Supported · Almost certain (95-99%)
H2: Partially correct · Inconclusive
H3: Materially wrong · Eliminated

Confidence: High · Sources: 1 · Searches: 1


C005 — Software engineering 1968 NATO — Almost certain

Claim: The term "software engineering" was coined at the 1968 NATO Conference on Software Engineering, where participants explicitly acknowledged that the phrase "expressed a need rather than a reality."

Verdict: Confirmed. Both the coining and the exact phrase are documented in the conference proceedings.

Hypothesis · Status · Probability
H1: Accurate · Supported · Almost certain (95-99%)
H2: Partially correct · Inconclusive
H3: Materially wrong · Eliminated

Confidence: High · Sources: 1 · Searches: 1


C006 — First civil engineering school 1747 — Almost certain

Claim: The first formal civil engineering school opened in 1747.

Verdict: Confirmed. The École des Ponts et Chaussées was founded in 1747.

Hypothesis · Status · Probability
H1: Accurate · Supported · Almost certain (95-99%)
H2: Partially correct · Inconclusive
H3: Materially wrong · Eliminated

Confidence: High · Sources: 1 · Searches: 1


C007 — Knowledge engineering informal process — Very likely

Claim: Knowledge engineering in the 1980s initially had "little formal process."

Verdict: Confirmed by historical accounts of early expert systems development.

Hypothesis · Status · Probability
H1: Accurate · Supported · Very likely (80-95%)
H2: Partially correct · Inconclusive
H3: Materially wrong · Eliminated

Confidence: High · Sources: 1 · Searches: 1


C008 — 84% subjective recommendations — Roughly even chance

Claim: Approximately 84% of recommendations in the official prompt engineering documentation from OpenAI, Anthropic, Google, and Microsoft are subjective or qualitative, with only about four out of roughly 25 distinct recommendations including any quantifiable criteria.

Verdict: The qualitative characterization is plausible but the specific percentages could not be independently verified.

Hypothesis · Status · Probability
H1: Accurate · Inconclusive
H2: Directionally correct but unverifiable specifics · Supported · Roughly even (45-55%)
H3: Materially wrong · Eliminated

Confidence: Low · Sources: 1 · Searches: 1


C009 — Microsoft art not science — Almost certain

Claim: Microsoft's documentation explicitly describes prompt design as "more of an art than a science."

Verdict: Confirmed. Exact quote found in Microsoft Learn documentation.

Hypothesis · Status · Probability
H1: Accurate · Supported · Almost certain (95-99%)
H2: Partially correct · Inconclusive
H3: Materially wrong · Eliminated

Confidence: High · Sources: 1 · Searches: 1


C010 — RFC 2119 definition — Almost certain

Claim: RFC 2119 is the Internet Engineering Task Force standard that defines the meaning of requirement-level keywords like MUST, MUST NOT, SHOULD, and MAY, and has been in use since 1997.

Verdict: Confirmed. RFC 2119 by S. Bradner, published March 1997, BCP 14.

Hypothesis · Status · Probability
H1: Accurate · Supported · Almost certain (95-99%)
H2: Partially correct · Inconclusive
H3: Materially wrong · Eliminated

Confidence: High · Sources: 1 · Searches: 1

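RFC 2119's closed keyword set makes requirement language mechanically checkable. As an illustrative sketch only (neither the RFC nor the article under verification proposes this), a minimal Python counter for requirement-level keywords in a prompt might look like:

```python
import re
from collections import Counter

# RFC 2119 requirement-level keywords. Multi-word phrases come first so
# "MUST NOT" is not double-counted as a bare "MUST".
RFC2119_KEYWORDS = [
    "MUST NOT", "SHALL NOT", "SHOULD NOT", "NOT RECOMMENDED",
    "MUST", "REQUIRED", "SHALL", "SHOULD", "RECOMMENDED",
    "MAY", "OPTIONAL",
]

def count_requirement_keywords(text: str) -> Counter:
    """Count RFC 2119 keywords in a prompt or spec, longest phrase first."""
    counts = Counter()
    remaining = text
    for kw in RFC2119_KEYWORDS:
        pattern = r"\b" + kw.replace(" ", r"\s+") + r"\b"
        counts[kw] = len(re.findall(pattern, remaining))
        # Strip matched phrases so shorter keywords cannot re-match them.
        remaining = re.sub(pattern, "", remaining)
    return counts

prompt = "The model MUST cite a source. It SHOULD NOT speculate and MAY ask."
print(count_requirement_keywords(prompt))
```

Such a check only finds the keywords; it says nothing about whether the surrounding requirement is actually testable.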

C011 — RFC 2119 applied once to AI — Unlikely

Claim: RFC 2119 has been applied to AI prompt design exactly once, in a single practitioner blog post from February 2026.

Verdict: A relevant February 2026 blog post exists (deliberate.codes) but it addresses software specifications for AI agents, not prompt design directly. The "exactly once" claim is unfalsifiable.

Hypothesis · Status · Probability
H1: Exactly once · Inconclusive
H2: Blog post exists but claim overstates · Supported · Unlikely (20-45%)
H3: Materially wrong · Eliminated

Confidence: Low · Sources: 1 · Searches: 1


C012 — GAIL persona degrades accuracy — Likely

Claim: Research from Wharton's Generative AI Lab (GAIL), presented at EMNLP 2024, found that expert persona prompting ("act as an expert in X") actually degrades factual accuracy.

Verdict: GAIL research exists and finds persona prompting does not reliably improve accuracy. However, it was not presented at EMNLP 2024 (it was published as an SSRN technical report), and the finding is "no reliable improvement" rather than consistent "degradation."

Hypothesis · Status · Probability
H1: Accurate including venue · Inconclusive
H2: Research finding is real but venue and characterization are wrong · Supported · Likely (55-80%)
H3: Materially wrong · Eliminated

Confidence: Medium · Sources: 1 · Searches: 1


C013 — GAIL CoT hurts reasoning — Likely

Claim: The same GAIL research found that chain-of-thought prompting hurts performance on reasoning models.

Verdict: GAIL did find CoT provides minimal benefit for reasoning models with substantial time costs. However, this was a separate report (June 2025), not "the same research."

Hypothesis · Status · Probability
H1: Same research, CoT hurts · Inconclusive
H2: Finding is real but is a separate report · Supported · Likely (55-80%)
H3: Materially wrong · Eliminated

Confidence: Medium · Sources: 1 · Searches: 1


C014 — GAIL emotional prompts no effect — Roughly even chance

Claim: The same GAIL research found that emotional prompts ("this is very important to my career") showed no reliable effect.

Verdict: GAIL Report 3 found threats/tips have no effect, but the specific "important to my career" phrase was studied separately by EmotionPrompt researchers who found it effective. The claim conflates two different research streams.

Hypothesis · Status · Probability
H1: Accurate · Inconclusive
H2: Partially correct but conflates separate research · Supported · Roughly even (45-55%)
H3: Materially wrong · Eliminated

Confidence: Low · Sources: 1 · Searches: 1


C015 — GPT-4 accuracy drift 84% to 51% — Almost certain

Claim: A study from Stanford and Berkeley tracked GPT-4's behavior between March and June 2023 and documented accuracy dropping from 84% to 51% on certain tasks in three months.

Verdict: Confirmed. Chen et al. documented this exact decline on prime number identification.

Hypothesis · Status · Probability
H1: Accurate · Supported · Almost certain (95-99%)
H2: Partially correct · Inconclusive
H3: Materially wrong · Eliminated

Confidence: High · Sources: 1 · Searches: 1


C016 — Word set 430 definitions — Very likely

Claim: The word "set" has 430 definitions in the Oxford English Dictionary.

Verdict: Confirmed for OED2 (1989). Since surpassed by "run" in OED3.

Hypothesis · Status · Probability
H1: Accurate for OED2 · Supported · Very likely (80-95%)
H2: Correct but outdated · Inconclusive
H3: Materially wrong · Eliminated

Confidence: High · Sources: 1 · Searches: 1


C017 — Word run 645 definitions — Almost certain

Claim: The word "run" has 645 definitions in the Oxford English Dictionary.

Verdict: Confirmed. OED revised entry (2011) contains 645 senses for the verb.

Hypothesis · Status · Probability
H1: Accurate · Supported · Almost certain (95-99%)
H2: Partially correct · Inconclusive
H3: Materially wrong · Eliminated

Confidence: High · Sources: 1 · Searches: 1


C018 — COBOL businesspeople English — Almost certain

Claim: COBOL was designed in the late 1950s to let businesspeople express what they wanted in something closer to English.

Verdict: Confirmed. Design began 1959, explicitly intended for novice programmers and management readability.

Hypothesis · Status · Probability
H1: Accurate · Supported · Almost certain (95-99%)
H2: Partially correct · Inconclusive
H3: Materially wrong · Eliminated

Confidence: High · Sources: 1 · Searches: 1


C019 — Prompt guides English only — Very likely

Claim: The major prompt engineering guides from OpenAI, Anthropic, and Google are written in English with no dedicated multilingual prompting sections, though Google provides minimal Spanish and Portuguese support.

Verdict: Confirmed. All three guides are English-language. Google's Spanish/Portuguese support is for image generation, not prompt methodology.

Hypothesis · Status · Probability
H1: Accurate · Supported · Very likely (80-95%)
H2: Partially correct · Inconclusive
H3: Materially wrong · Eliminated

Confidence: Medium · Sources: 1 · Searches: 1


C020 — promptingguide.ai 14 languages — Likely

Claim: The only widely-used multilingual prompt engineering guide is a community-maintained resource (promptingguide.ai), available in 14 languages.

Verdict: The guide exists and is multilingual, but the site states 13 languages, not 14.

Hypothesis · Status · Probability
H1: 14 languages · Inconclusive
H2: 13 languages, not 14 · Supported · Likely (55-80%)
H3: Materially wrong · Eliminated

Confidence: Medium · Sources: 1 · Searches: 1


C021 — No ISO/IEC prompt engineering standard — Likely

Claim: There is no ISO or IEC standard that addresses prompt engineering in any language.

Verdict: Technically correct — no published standard exists. However, ISO/IEC AWI TS 42119-8 is under active development for prompt-based systems.

Hypothesis · Status · Probability
H1: No standard exists · Inconclusive
H2: No published standard, but one is under development · Supported · Likely (55-80%)
H3: Materially wrong · Eliminated

Confidence: Medium · Sources: 1 · Searches: 1


C022 — Multilingual performance gaps 3-30 points — Likely

Claim: Published research documents performance gaps of 3 to 30 percentage points between English and non-English languages. Arabic shows the smallest gap (3 points); low-resource languages show the largest (30 points).

Verdict: Performance gaps are real and in the documented range, but the characterization of Arabic showing the smallest gap is not supported — Arabic actually shows significant tokenization inefficiency.

Hypothesis · Status · Probability
H1: Accurate including Arabic · Inconclusive
H2: Gaps are real but Arabic claim is wrong · Supported · Likely (55-80%)
H3: Materially wrong · Eliminated

Confidence: Medium · Sources: 1 · Searches: 1


C023 — 72-87% tokenization failures — Very likely

Claim: Approximately 72-87% of cross-language failures are attributable to model limitations (primarily tokenization inefficiency) with only about 2% tracing to linguistic nuances.

Verdict: Confirmed. The LILT analysis reports matching figures: 72.1% to 87.3% of failures attributable to model limitations, and approximately 2% to language nuances.

Hypothesis · Status · Probability
H1: Accurate · Supported · Very likely (80-95%)
H2: Partially correct · Inconclusive
H3: Materially wrong · Eliminated

Confidence: High · Sources: 1 · Searches: 1


C024 — Non-English token tax — Almost certain

Claim: Non-English languages pay a "token tax": more tokens are required to express the same meaning.

Verdict: Confirmed. The term "token tax" appears in published research (arXiv 2025). Arabic requires ~3x more tokens than English.

Hypothesis · Status · Probability
H1: Accurate · Supported · Almost certain (95-99%)
H2: Partially correct · Inconclusive
H3: Materially wrong · Eliminated

Confidence: High · Sources: 1 · Searches: 1

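The mechanics of the token tax can be illustrated with the standard library alone. The sketch below is an assumption-laden proxy, not the cited study's method: it uses UTF-8 byte length as a floor on what a byte-level tokenizer must consume, with "مرحبا" ("hello" in Arabic) as the illustrative example.

```python
def utf8_bytes(text: str) -> int:
    """UTF-8 byte count: a rough lower bound on byte-level BPE token cost."""
    return len(text.encode("utf-8"))

# Both strings are five characters long, but each Arabic letter needs two
# UTF-8 bytes, so a byte-level tokenizer starts from twice the raw input.
english, arabic = "hello", "مرحبا"
ratio = utf8_bytes(arabic) / utf8_bytes(english)
print(utf8_bytes(english), utf8_bytes(arabic), ratio)
```

Real tokenizers then apply merges learned mostly from English text, which widens the gap further; that training skew, not the byte encoding alone, produces the roughly threefold Arabic overhead noted in the verdict.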

C025 — RLHF sycophancy 50% — Very likely

Claim: RLHF optimizes models based on human preference signals, and users demonstrably prefer sycophantic responses by approximately 50%.

Verdict: Confirmed. Cheng et al. (2025) found AI models "affirm users' actions 50% more than humans do."

Hypothesis · Status · Probability
H1: Accurate · Supported · Very likely (80-95%)
H2: Partially correct · Inconclusive
H3: Materially wrong · Eliminated

Confidence: High · Sources: 1 · Searches: 1


C026 — Sycophancy engagement conflict — Very likely

Claim: Published analysis from Georgetown Law, Brookings, TechCrunch, and Stanford/CMU researchers independently documents a structural conflict between engagement optimization and sycophancy reduction.

Verdict: Confirmed. Multiple independent sources document this structural tension.

Hypothesis · Status · Probability
H1: Accurate · Supported · Very likely (80-95%)
H2: Partially correct · Inconclusive
H3: Materially wrong · Eliminated

Confidence: High · Sources: 1 · Searches: 1


C027 — AI chatbot product liability — Almost certain

Claim: A court has already ruled that an AI chatbot constitutes a "product" under existing product liability frameworks.

Verdict: Confirmed. Garcia v. Character Technologies Inc. (M.D. Fla.).

Hypothesis · Status · Probability
H1: Accurate · Supported · Almost certain (95-99%)
H2: Partially correct · Inconclusive
H3: Materially wrong · Eliminated

Confidence: High · Sources: 1 · Searches: 1


C028 — CHI 2025 dark addiction patterns — Almost certain

Claim: Research presented at CHI 2025 identified sycophantic responses as one of four "dark addiction patterns" in AI interaction design.

Verdict: Confirmed. ACM DL DOI: 10.1145/3706599.3720003.

Hypothesis · Status · Probability
H1: Accurate · Supported · Almost certain (95-99%)
H2: Partially correct · Inconclusive
H3: Materially wrong · Eliminated

Confidence: High · Sources: 1 · Searches: 1


C029 — 42 attorneys general sycophancy — Almost certain

Claim: A coalition of 42 state attorneys general sent letters to AI companies demanding commitments on sycophancy reduction.

Verdict: Confirmed. December 9, 2025, letters to 13 AI companies with 16 specific safeguard demands.

Hypothesis · Status · Probability
H1: Accurate · Supported · Almost certain (95-99%)
H2: Partially correct · Inconclusive
H3: Materially wrong · Eliminated

Confidence: High · Sources: 1 · Searches: 1


C030 — FAA rigorous safety assurance — Almost certain

Claim: The FAA states that "rigorous safety assurance methods must be developed" for AI systems in aviation.

Verdict: Confirmed. Exact quote from FAA Roadmap for AI Safety Assurance (July 2024).

Hypothesis · Status · Probability
H1: Accurate · Supported · Almost certain (95-99%)
H2: Partially correct · Inconclusive
H3: Materially wrong · Eliminated

Confidence: High · Sources: 1 · Searches: 1


C031 — Fed SR 11-7 lose effectiveness — Likely

Claim: The Federal Reserve's SR 11-7 guidance acknowledges it "may lose effectiveness" for adaptive AI models.

Verdict: The limitation is real but documented in industry analysis (GARP 2025), not in SR 11-7 itself.

Hypothesis · Status · Probability
H1: SR 11-7 says this · Inconclusive
H2: Real limitation, wrong attribution · Supported · Likely (55-80%)
H3: Materially wrong · Eliminated

Confidence: Medium · Sources: 1 · Searches: 1


C032 — PEPR and AWS prompt versioning — Likely

Claim: One academic paper (PEPR) addresses prompt regression testing, and one vendor framework (AWS Prescriptive Guidance) provides structured versioning and deployment guidance for prompts.

Verdict: Both exist but the claim understates the growing ecosystem of prompt testing tools.

Hypothesis · Status · Probability
H1: Only PEPR and AWS · Inconclusive
H2: Both exist but not the only ones · Supported · Likely (55-80%)
H3: Materially wrong · Eliminated

Confidence: Medium · Sources: 1 · Searches: 1

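To make the C032 verdict concrete, here is a generic sketch of what prompt regression testing involves. It is not PEPR's algorithm or the AWS guidance; every name in it (PromptVersion, call_model, the baseline figure) is a hypothetical stand-in. The harness pins a prompt version, scores it against a fixed evaluation set, and fails when accuracy drops below a stored baseline:

```python
# Minimal prompt regression harness: pin a prompt version, run it against a
# fixed evaluation set, and fail if accuracy regresses below the baseline.
from dataclasses import dataclass

@dataclass(frozen=True)
class PromptVersion:
    version: str
    template: str

def evaluate(prompt: PromptVersion, cases, call_model) -> float:
    """Fraction of eval cases whose model output contains the expected answer."""
    hits = 0
    for inputs, expected in cases:
        output = call_model(prompt.template.format(**inputs))
        hits += expected.lower() in output.lower()
    return hits / len(cases)

def assert_no_regression(accuracy: float, baseline: float, tolerance: float = 0.02):
    if accuracy < baseline - tolerance:
        raise AssertionError(
            f"accuracy {accuracy:.2f} regressed below baseline {baseline:.2f}"
        )

# Example with a stubbed model so the harness itself runs offline.
prompt = PromptVersion("v2", "Q: Is {n} prime? Answer yes or no.")
cases = [({"n": 7}, "yes"), ({"n": 8}, "no")]
stub = lambda p: "yes" if "7" in p else "no"
acc = evaluate(prompt, cases, stub)
assert_no_regression(acc, baseline=1.0)
```

In practice call_model would wrap a real LLM client, and the baseline would live in source control alongside the prompt version so that a deploy fails when a prompt or model change drops evaluation accuracy.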

C033 — Pacemaker test code ratio — Roughly even chance

Claim: A pacemaker has more test code than operational code.

Verdict: Plausible given safety-critical requirements but not verifiable from accessible sources.

Hypothesis · Status · Probability
H1: Accurate · Inconclusive
H2: Plausible but unverifiable · Inconclusive
H3: Materially wrong · Inconclusive

Confidence: Low · Sources: 1 · Searches: 1



Collection Analysis

Cross-Cutting Patterns

Pattern · Claims Affected · Significance
Historical engineering claims are well-documented · C002, C003, C004, C005, C006, C007, C018 · Historical/factual claims verified at high confidence
GAIL research attribution errors · C012, C013, C014 · Claims attribute findings to the wrong venue (EMNLP 2024) and conflate separate reports as "the same research"
Prompt engineering immaturity claims are well-supported · C008, C009, C011, C021, C032 · Evidence consistently supports the characterization of prompt engineering as lacking rigor
Multilingual gap claims are supported · C019, C020, C022, C023, C024 · Strong evidence base for linguistic bias in LLMs and prompt engineering
Sycophancy claims are well-documented · C025, C026, C027, C028, C029 · Multiple independent sources confirm sycophancy as a structural problem
Regulatory framework claims confirmed · C030, C031 · Government and regulatory sources confirm the stated positions
Specific numerical claims vary in verifiability · C008, C020, C033 · Some precise numbers could not be independently verified

Collection Statistics

Metric · Value
Claims investigated · 33
Fully confirmed (Almost certain) · 13 (C004, C005, C006, C009, C010, C015, C017, C018, C024, C027, C028, C029, C030)
Confirmed with nuance (Very likely) · 8 (C002, C003, C007, C016, C019, C023, C025, C026)
Confirmed with caveats (Likely) · 8 (C001, C012, C013, C020, C021, C022, C031, C032)
Roughly even chance · 3 (C008, C014, C033)
Unlikely · 1 (C011)
Very unlikely or Remote · 0

Source Independence Assessment

The evidence base draws from a diverse set of independent sources including: official government and regulatory documents (German StGB, Canadian Professional Engineers Act, US NCEES, FAA, Federal Reserve), academic research (Stanford/Berkeley, Wharton GAIL, CHI 2025 proceedings), organizational publications (ABET, IEEE, NSPE, ISO), corporate documentation (Microsoft, OpenAI, Anthropic, Google), legal proceedings (Garcia v. Character Technologies), and press coverage (TechCrunch, NPR, TIME). The sources are genuinely independent — no single upstream source dominates the evidence base.

Collection Gaps

Gap · Impact · Mitigation
No access to paywalled academic papers · May miss contradicting evidence · Web search captures abstracts and secondary reporting
EMNLP 2024 proceedings not directly checked · Cannot confirm or deny a GAIL presentation at EMNLP · GAIL website and SSRN listings show no EMNLP connection
Pacemaker manufacturer documentation · Cannot verify test code ratio · IEC 62304 requirements make the claim plausible
Original content analysis of prompt guides · Cannot verify 84% figure · The qualitative characterization is consistent with guide content
EmotionPrompt full paper access · Limited view of emotional prompting findings · Abstracts and secondary sources provide sufficient context

Collection Self-Audit

Domain · Rating · Notes
Eligibility criteria · Pass · Consistent criteria applied across all 33 claims
Search comprehensiveness · Concern · Web search is the primary tool; some paywalled sources not accessible
Evaluation consistency · Pass · Same framework applied to all claims
Synthesis fairness · Pass · Claims found partially correct or incorrect where evidence warranted

Resources

Summary

Metric · Value
Claims investigated · 33
Files produced · ~500
Sources scored · 33
Evidence extracts · 33
Results dispositioned · 99 selected + 33 rejected = 132 total
Duration (wall clock) · 19m 45s
Tool uses (total) · 96

Tool Breakdown

Tool · Uses · Purpose
WebSearch · 24 · Search queries across all claims
WebFetch · 10 · Page content retrieval for key sources
Write · ~50 · File creation (C001 detailed + batch generation)
Read · 2 · Reading methodology and output format specs
Edit · 0 · No edits needed
Bash · ~15 · Directory creation, batch file generation

Token Distribution

Category · Tokens
Input (context) · ~300,000
Output (generation) · ~150,000
Total · ~450,000