Q001 — Self-Audit


Research	R0020 — Prompt Engineering Gaps
Run	2026-03-25
Query	Q001

ROBIS 4-Domain Audit

Domain 1: Eligibility Criteria

Rating: Low risk

Criterion	Assessment
Evidence types defined before searching	Yes — industry publications, framework documentation, and methodology guides targeted
Criteria consistent throughout	Yes — same relevance and reliability standards applied to all sources
Scope maintained	Yes — focused on prompt testing frameworks and methodologies throughout

Notes: Eligibility criteria were stable. The only deviation was rejecting results about using prompts for software testing (the inverse of the query), which was an appropriate scope refinement.

Domain 2: Search Comprehensiveness

Rating: Some concerns

Criterion	Assessment
Multiple search strategies used	Yes — three distinct searches with different query terms
Searches designed to test each hypothesis	Partial — searches were designed to find frameworks (H1/H3) but no specific search targeted evidence against framework existence (H2)
All results dispositioned	Yes — 30 results returned, all dispositioned
Source diversity achieved	Partial — all sources are industry/vendor publications; no academic or peer-reviewed sources found

Notes: The absence of academic sources is a genuine gap. No search specifically targeted academic databases or peer-reviewed research on prompt testing methodology. This limits the evidence base to vendor and industry perspectives.
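The disposition and source-diversity checks in this domain can be verified mechanically. The following is a minimal sketch, assuming each search result is logged as a record with a `disposition` field and a `source_type` label. The record layout, field names, and sample data are hypothetical and are not taken from this run's search log.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class SearchResult:
    """One returned search result (hypothetical record layout)."""
    url: str
    source_type: str                   # e.g. "vendor", "industry", "academic"
    disposition: Optional[str] = None  # e.g. "included", "rejected"; None = not yet dispositioned


def audit_search(results):
    """Return (results without a disposition, count of results per source type)."""
    pending = [r for r in results if r.disposition is None]
    by_type = Counter(r.source_type for r in results)
    return pending, by_type


# Illustrative data only; these are not the results from this run.
sample = [
    SearchResult("https://example.com/a", "vendor", "included"),
    SearchResult("https://example.com/b", "industry", "rejected"),
    SearchResult("https://example.com/c", "vendor", None),
]

pending, by_type = audit_search(sample)
print(len(pending), "result(s) not dispositioned")
print("academic sources:", by_type["academic"])  # Counter yields 0 for absent keys
```

A zero count for a source type, as in the academic case noted above, shows only that the searches returned none. It does not show that none exist.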

Domain 3: Evaluation Consistency

Rating: Low risk

Criterion	Assessment
All sources scored using same framework	Yes — same reliability/relevance/bias dimensions applied
Evidence typed consistently	Yes — Reported, Analytical, and Factual types applied consistently
ACH matrix applied	Yes — all evidence mapped to all hypotheses
Diagnosticity analysis performed	Yes — most and least diagnostic evidence identified

Notes: The evaluation framework was applied consistently across all sources.
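As an illustration of the ACH matrix and diagnosticity criteria above, the sketch below ranks evidence by how well it discriminates between hypotheses. It assumes a simple consistent/inconsistent/neutral coding and uses the number of distinct ratings per evidence item as a crude diagnosticity proxy. Both the coding and the proxy are assumptions made for illustration, not the scoring used in this run, and the evidence IDs and ratings are placeholders.

```python
# Consistency ratings in the style of an ACH matrix:
# "C" = consistent, "I" = inconsistent, "N" = neutral / not applicable.
# Evidence IDs and ratings are hypothetical placeholders.
matrix = {
    "E1": {"H1": "C", "H2": "I", "H3": "C"},
    "E2": {"H1": "C", "H2": "C", "H3": "C"},
    "E3": {"H1": "N", "H2": "I", "H3": "C"},
}


def diagnosticity(ratings):
    """Number of distinct ratings one evidence item receives across hypotheses.

    A value of 1 means the evidence rates every hypothesis the same way,
    so it cannot help discriminate between them.
    """
    return len(set(ratings.values()))


def rank_by_diagnosticity(matrix):
    """Evidence IDs sorted from most to least diagnostic (ties keep input order)."""
    return sorted(matrix, key=lambda e: diagnosticity(matrix[e]), reverse=True)


ranked = rank_by_diagnosticity(matrix)
print("most diagnostic:", ranked[0])
print("least diagnostic:", ranked[-1])
```

In this placeholder data, E2 is consistent with every hypothesis and so carries no weight in choosing between them, which is the sense in which evidence is called non-diagnostic.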

Domain 4: Synthesis Fairness

Rating: Low risk

Criterion	Assessment
All hypotheses given fair hearing	Yes — H1 received partial support, H2 was tested and eliminated, H3 received strongest support
Contradictory evidence surfaced	Yes — limitations and challenges were prominently documented alongside tool existence
Confidence calibrated to evidence	Yes — Medium confidence reflects the vendor-only evidence base
Gaps acknowledged	Yes — absence of academic sources and longitudinal data explicitly noted

Notes: The synthesis appropriately balances the existence of tools against their acknowledged limitations.

Overall Assessment

Overall risk of bias: Low risk

The primary limitation is the vendor-dominated evidence base (Domain 2 concern), which is inherent to the subject matter — prompt testing frameworks are industry products documented by industry sources. The absence of academic evaluation is itself a finding that supports H3.
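The four domain ratings recorded above can be kept in a small structure so that any domain rated other than "Low risk" is surfaced for explicit justification in the overall assessment. This is a minimal sketch: the function name is hypothetical, and it deliberately does not compute the overall rating, which remains a reviewer judgment.

```python
# Domain ratings as recorded in this audit.
domain_ratings = {
    "Eligibility Criteria": "Low risk",
    "Search Comprehensiveness": "Some concerns",
    "Evaluation Consistency": "Low risk",
    "Synthesis Fairness": "Low risk",
}


def domains_needing_justification(ratings, baseline="Low risk"):
    """List the domains whose rating differs from the baseline rating."""
    return [name for name, rating in ratings.items() if rating != baseline]


flagged = domains_needing_justification(domain_ratings)
print("domains to address in the overall assessment:", flagged)
```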

Researcher Bias Check

Confirmation bias risk: Low. The query framing ("how is this tested?") could lead to overemphasis on testing solutions, but the research surfaced limitations and challenges prominently.
Availability bias risk: Some concerns. All sources are web-accessible vendor publications, which may overrepresent marketed tools and underrepresent internal or proprietary testing approaches used by major AI labs.