R0020/2026-03-25/Q001 — Self-Audit

ROBIS 4-Domain Audit

Domain 1: Eligibility Criteria

Rating: Low risk

Criterion | Assessment
Evidence types defined before searching | Yes: industry publications, framework documentation, and methodology guides targeted
Criteria consistent throughout | Yes: same relevance and reliability standards applied to all sources
Scope maintained | Yes: focused on prompt testing frameworks and methodologies throughout

Notes: Eligibility criteria were stable. The only deviation was the rejection of results about using prompts for software testing (the inverse of the query), which was an appropriate scope refinement.

Domain 2: Search Comprehensiveness

Rating: Some concerns

Criterion | Assessment
Multiple search strategies used | Yes: three distinct searches with different query terms
Searches designed to test each hypothesis | Partial: searches were designed to find frameworks (H1/H3), but no specific search targeted evidence against framework existence (H2)
All results dispositioned | Yes: 30 results returned, all dispositioned
Source diversity achieved | Partial: all sources are industry/vendor publications; no academic or peer-reviewed sources found

Notes: The absence of academic sources is a genuine gap. No search specifically targeted academic databases or peer-reviewed research on prompt testing methodology. This limits the evidence base to vendor and industry perspectives.

Domain 3: Evaluation Consistency

Rating: Low risk

Criterion | Assessment
All sources scored using same framework | Yes: same reliability/relevance/bias dimensions applied
Evidence typed consistently | Yes: Reported, Analytical, and Factual types applied consistently
ACH matrix applied | Yes: all evidence mapped to all hypotheses
Diagnosticity analysis performed | Yes: most and least diagnostic evidence identified

Notes: The evaluation framework was applied consistently across all sources.
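
The two ACH checks in this domain (every evidence item mapped to every hypothesis, and the most and least diagnostic items identified) can be sketched in a few lines. This is a minimal illustration, not the procedure actually used in this audit: the +1/0/-1 consistency scale, the `Evidence` class, the `diagnosticity` and `unmapped` helpers, and the rows E1 to E3 are all hypothetical placeholders. Diagnosticity is taken here to be the spread of an item's scores across hypotheses, which is one common reading of the term.

```python
from dataclasses import dataclass

# Assumed scoring scale for illustration:
# +1 = consistent, 0 = neutral, -1 = inconsistent with the hypothesis.
CONSISTENT, NEUTRAL, INCONSISTENT = 1, 0, -1


@dataclass(frozen=True)
class Evidence:
    label: str
    scores: dict[str, int]  # hypothesis id -> consistency score


def diagnosticity(item: Evidence) -> int:
    """Spread of scores across hypotheses; 0 means the item cannot discriminate."""
    values = item.scores.values()
    return max(values) - min(values)


def unmapped(items, hypotheses):
    """Labels of evidence items that are not scored against exactly the given hypotheses."""
    return [e.label for e in items if set(e.scores) != set(hypotheses)]


hypotheses = ["H1", "H2", "H3"]

# Placeholder rows only; these are NOT the audit's actual evidence items.
matrix = [
    Evidence("E1", {"H1": CONSISTENT, "H2": INCONSISTENT, "H3": CONSISTENT}),
    Evidence("E2", {"H1": NEUTRAL, "H2": NEUTRAL, "H3": NEUTRAL}),
    Evidence("E3", {"H1": NEUTRAL, "H2": NEUTRAL, "H3": CONSISTENT}),
]

# Check 1: all evidence mapped to all hypotheses.
assert unmapped(matrix, hypotheses) == []

# Check 2: rank items from most to least diagnostic.
ranked = sorted(matrix, key=diagnosticity, reverse=True)
most, least = ranked[0], ranked[-1]
print("most diagnostic:", most.label, diagnosticity(most))
print("least diagnostic:", least.label, diagnosticity(least))
```

An item that scores the same against every hypothesis (E2 in the placeholder data) has zero spread and so contributes nothing to choosing between H1, H2, and H3, however reliable the source is.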

Domain 4: Synthesis Fairness

Rating: Low risk

Criterion | Assessment
All hypotheses given fair hearing | Yes: H1 received partial support, H2 was tested and eliminated, H3 received strongest support
Contradictory evidence surfaced | Yes: limitations and challenges were prominently documented alongside tool existence
Confidence calibrated to evidence | Yes: Medium confidence reflects the vendor-only evidence base
Gaps acknowledged | Yes: absence of academic sources and longitudinal data explicitly noted

Notes: The synthesis appropriately balances the existence of tools against their acknowledged limitations.

Overall Assessment

Overall risk of bias: Low risk

The primary limitation is the vendor-dominated evidence base (Domain 2 concern), which is inherent to the subject matter — prompt testing frameworks are industry products documented by industry sources. The absence of academic evaluation is itself a finding that supports H3.

Researcher Bias Check

  • Confirmation bias risk: Low. The query framing ("how is this tested?") could lead to overemphasis on testing solutions, but the research surfaced limitations and challenges prominently.
  • Availability bias risk: Some concerns. All sources are web-accessible vendor publications, which may overrepresent marketed tools and underrepresent internal or proprietary testing approaches used by major AI labs.