
R0023/2026-03-25/Q001 — Self-Audit

ROBIS 4-Domain Audit

Domain 1: Eligibility Criteria

Rating: Pass

Criterion | Assessment
Evidence criteria defined before searching | Yes: searched for empirical studies with controlled methodology and measurable outcomes
Criteria applied consistently | Yes: same standard applied to supporting and contradicting evidence
Criteria did not shift after seeing results | Yes: criteria were neither expanded nor narrowed based on initial findings

Notes: Eligibility was appropriately scoped to empirical studies. Blog posts and tutorials were consistently excluded unless they reported specific data.

Domain 2: Search Comprehensiveness

Rating: Pass

Criterion | Assessment
Multiple search strategies used | Yes: 2 search rounds with distinct query strategies
Searches designed to test each hypothesis | Yes: searched for both evidence of counterproductive effects AND evidence that techniques work
All results dispositioned | Yes: 40 results returned, all dispositioned as selected or rejected with rationale
Source diversity achieved | Yes: academic papers, technical reports, peer-reviewed venues, journalism

Notes: 40 total results across 2 search rounds. 12 selected, 28 rejected. All dispositioned with rationale.
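The disposition claim above can be checked mechanically. The following is a minimal sketch, assuming a hypothetical disposition log with one record per returned result; only the totals (40 returned, 12 selected, 28 rejected) come from this audit, while the record layout, the `check_dispositions` helper, and the placeholder rationale strings are illustrative assumptions.

```python
from collections import Counter

# Hypothetical disposition log: one entry per returned search result.
# Only the totals (12 selected, 28 rejected) come from the audit notes;
# the field names and placeholder rationales are assumptions.
dispositions = (
    [{"status": "selected", "rationale": "placeholder"} for _ in range(12)]
    + [{"status": "rejected", "rationale": "placeholder"} for _ in range(28)]
)


def check_dispositions(log, expected_total):
    """Count results per status; raise if any result is undispositioned."""
    if len(log) != expected_total:
        raise ValueError(f"expected {expected_total} results, found {len(log)}")
    for i, entry in enumerate(log):
        if entry.get("status") not in {"selected", "rejected"}:
            raise ValueError(f"result {i} has no valid disposition")
        if not entry.get("rationale"):
            raise ValueError(f"result {i} has no rationale")
    return Counter(entry["status"] for entry in log)


counts = check_dispositions(dispositions, expected_total=40)
print(counts["selected"], counts["rejected"])
```

A check of this kind fails loudly if a result is dropped without a recorded rationale, which is the condition the "All results dispositioned" criterion is meant to rule out.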

Domain 3: Evaluation Consistency

Rating: Pass

Criterion | Assessment
All sources scored using same framework | Yes: GRADE reliability/relevance + 6 bias domains applied to all sources
Evidence typed consistently | Yes: Statistical, Analytical, Factual types applied consistently
ACH matrix applied | Yes: all evidence mapped to all hypotheses
Diagnosticity analysis performed | Yes: most and least diagnostic evidence identified

Notes: All 5 sources were scored using the same framework. All 8 evidence extracts were mapped to all 3 hypotheses.
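Mapping 8 evidence extracts to 3 hypotheses implies an ACH matrix of 8 × 3 = 24 cells. The sketch below shows one way such coverage could be verified. The labels `E1`..`E8` and `H1`..`H3`, the `missing_cells` helper, and the placeholder rating value are all hypothetical; the audit reports only the counts, not the matrix contents.

```python
from itertools import product

# Hypothetical labels: the audit gives only the counts (8 extracts, 3 hypotheses).
EVIDENCE = [f"E{i}" for i in range(1, 9)]
HYPOTHESES = ["H1", "H2", "H3"]


def missing_cells(matrix):
    """List (evidence, hypothesis) pairs that have no rating in the matrix."""
    return [pair for pair in product(EVIDENCE, HYPOTHESES) if pair not in matrix]


# Placeholder ratings only; "rated" stands in for whatever scale was actually used.
full_matrix = {pair: "rated" for pair in product(EVIDENCE, HYPOTHESES)}

# A deliberately incomplete copy, to show what a coverage gap looks like.
partial_matrix = {k: v for k, v in full_matrix.items() if k != ("E3", "H2")}

print(len(full_matrix))
print(missing_cells(full_matrix))
print(missing_cells(partial_matrix))
```

An empty result from `missing_cells` corresponds to the "all evidence mapped to all hypotheses" condition; any listed pair marks an extract that was never assessed against a hypothesis.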

Domain 4: Synthesis Fairness

Rating: Pass

Criterion | Assessment
All hypotheses given fair hearing | Yes: supporting evidence for H2 was actively sought; the one positive finding (Gemini 2.0 Flash) was reported
Contradictory evidence surfaced | Yes: evidence that CoT helps non-reasoning models was prominently reported
Confidence calibrated to evidence | Yes: confidence rated High based on convergence of independent studies
Gaps acknowledged | Yes: gaps identified in evidence on long-form generation tasks, real-world deployment, and few-shot prompting

Notes: The evidence was strongly one-directional. H2 was eliminated because no credible evidence supported it, not because it was unfairly treated.

Overall Assessment

Overall risk of bias: Low risk

The research process followed the methodology consistently. The primary risk is that the queries were framed to find counterproductive advice, which could introduce selection bias. However, the research also actively sought evidence supporting H2 (that the techniques generally work), and the failure to find such evidence under controlled conditions reflects the state of the literature rather than search bias.

Researcher Bias Check

  • Confirmation bias risk: The query framing ("found to be actively counterproductive") embeds an expectation. The research compensated by searching for evidence that techniques work (H2) and reporting positive findings where they exist (Gemini 2.0 Flash, non-reasoning model CoT benefits).
  • Availability bias risk: The Wharton Prompting Science Reports dominated the evidence base (3 of 5 sources). This reflects the reality that this research group has produced the most rigorous empirical work on this topic, not a failure to search broadly.
  • Anchoring risk: Low — the Wharton findings were discovered through search, not pre-known.