Q001 — Self-Audit


Research	R0023 — Counterproductive advice and prompt lifecycle
Run	2026-03-25
Query	Q001

ROBIS 4-Domain Audit

Domain 1: Eligibility Criteria

Rating: Pass

Criterion	Assessment
Evidence criteria defined before searching	Yes — searched for empirical studies with controlled methodology and measurable outcomes
Criteria applied consistently	Yes — same standard applied to supporting and contradicting evidence
Criteria did not shift after seeing results	Yes — criteria were neither expanded nor narrowed in response to initial findings

Notes: Eligibility was appropriately scoped to empirical studies. Blog posts and tutorials were consistently excluded unless they reported specific data.

Domain 2: Search Comprehensiveness

Rating: Pass

Criterion	Assessment
Multiple search strategies used	Yes — 2 search rounds with distinct query strategies
Searches designed to test each hypothesis	Yes — searched for both evidence of counterproductive effects AND evidence that techniques work
All results dispositioned	Yes — 40 results returned, all dispositioned as selected or rejected with rationale
Source diversity achieved	Yes — academic papers, technical reports, peer-reviewed venues, journalism

Notes: The 2 search rounds returned 40 results in total: 12 selected and 28 rejected, each dispositioned with a rationale.
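
The disposition counts reported for this domain can be checked mechanically. The following is a minimal sketch, assuming each search result is stored as a record with a disposition and a rationale. The `SearchResult` type, its field names, and the placeholder records are hypothetical and only reproduce the reported totals, not the actual search log.

```python
from dataclasses import dataclass


@dataclass
class SearchResult:
    result_id: int
    disposition: str  # expected to be "selected" or "rejected"
    rationale: str


def audit_dispositions(results):
    """Return (selected, rejected) counts; raise if any result lacks a valid disposition or rationale."""
    counts = {"selected": 0, "rejected": 0}
    for r in results:
        if r.disposition not in counts:
            raise ValueError(f"result {r.result_id} has no valid disposition")
        if not r.rationale.strip():
            raise ValueError(f"result {r.result_id} has no rationale")
        counts[r.disposition] += 1
    return counts["selected"], counts["rejected"]


# Placeholder records (hypothetical): 40 results, the first 12 marked as selected.
results = [
    SearchResult(i, "selected" if i < 12 else "rejected", "placeholder rationale")
    for i in range(40)
]
selected, rejected = audit_dispositions(results)

# Every result is accounted for exactly once.
assert selected + rejected == len(results)
```

A check of this kind fails loudly if a result is left undispositioned, which is the property the "All results dispositioned" criterion asserts.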

Domain 3: Evaluation Consistency

Rating: Pass

Criterion	Assessment
All sources scored using same framework	Yes — GRADE reliability/relevance + 6 bias domains applied to all sources
Evidence typed consistently	Yes — Statistical, Analytical, Factual types applied consistently
ACH matrix applied	Yes — all evidence mapped to all hypotheses
Diagnosticity analysis performed	Yes — most and least diagnostic evidence identified

Notes: All 5 sources were scored using an identical framework. All 8 evidence extracts were mapped to each of the 3 hypotheses.
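
Mapping 8 evidence extracts to 3 hypotheses implies an ACH matrix of 8 × 3 = 24 cells. The sketch below shows one way to verify that no cell is missing and to flag extracts with no diagnostic value (those rated identically against every hypothesis). It is illustrative only: the evidence IDs, the labels `H1` and `H3`, the rating code `"N"`, and the placeholder ratings are assumptions, not the audit's actual matrix.

```python
from itertools import product


def missing_cells(matrix, evidence_ids, hypotheses):
    """List (evidence, hypothesis) pairs that have no rating in the matrix."""
    return [
        (e, h) for e, h in product(evidence_ids, hypotheses) if (e, h) not in matrix
    ]


def diagnostic_evidence(matrix, evidence_ids, hypotheses):
    """Return evidence whose rating differs across hypotheses (it discriminates between them)."""
    return [
        e for e in evidence_ids if len({matrix[(e, h)] for h in hypotheses}) > 1
    ]


evidence_ids = [f"E{i}" for i in range(1, 9)]  # 8 extracts; IDs are hypothetical
hypotheses = ["H1", "H2", "H3"]  # 3 hypotheses; H1 and H3 labels are assumed

# Placeholder ratings (hypothetical): every cell neutral, so nothing is diagnostic yet.
matrix = {(e, h): "N" for e, h in product(evidence_ids, hypotheses)}

assert len(matrix) == len(evidence_ids) * len(hypotheses)  # 24 cells
assert missing_cells(matrix, evidence_ids, hypotheses) == []
```

With real ratings in place of the placeholders, `diagnostic_evidence` would return the extracts that distinguish between hypotheses; extracts rated the same everywhere are the least diagnostic.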

Domain 4: Synthesis Fairness

Rating: Pass

Criterion	Assessment
All hypotheses given fair hearing	Yes — H2 received active search for supporting evidence; the one positive finding (Gemini 2.0 Flash) was reported
Contradictory evidence surfaced	Yes — evidence that CoT helps non-reasoning models was prominently reported
Confidence calibrated to evidence	Yes — confidence rated High based on convergence of independent studies
Gaps acknowledged	Yes — gaps identified in long-form generation tasks, real-world deployment, and few-shot evidence

Notes: The evidence was strongly one-directional. H2 was eliminated because no credible evidence supported it, not because it was unfairly treated.

Overall Assessment

Overall risk of bias: Low risk

The research process followed the methodology consistently. The primary risk is that the queries themselves were framed to find counterproductive advice, which could create a selection bias. However, the research actively sought evidence supporting H2 (techniques generally work), and the failure to find such evidence under controlled conditions reflects the actual state of the literature rather than search bias.
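
The roll-up from the four domain ratings to the overall rating can be sketched as follows. The aggregation rule used here (overall risk is low only when every domain is rated Pass) is an assumption made for illustration; the audit does not state its rule, and the `"Needs review"` label is hypothetical.

```python
# Domain ratings as reported in this audit.
DOMAIN_RATINGS = {
    "Eligibility Criteria": "Pass",
    "Search Comprehensiveness": "Pass",
    "Evaluation Consistency": "Pass",
    "Synthesis Fairness": "Pass",
}


def overall_risk(ratings):
    """Assumed rule: low risk only if every domain passes; otherwise flag for review."""
    if all(rating == "Pass" for rating in ratings.values()):
        return "Low risk"
    return "Needs review"  # hypothetical label for any non-passing audit


print(overall_risk(DOMAIN_RATINGS))
```

A strict all-pass rule is the most conservative choice; a real audit might instead weigh whether a failed domain's concerns were addressed elsewhere.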

Researcher Bias Check

Confirmation bias risk: The query framing ("found to be actively counterproductive") embeds an expectation. The research compensated by searching for evidence that techniques work (H2) and reporting positive findings where they exist (Gemini 2.0 Flash, non-reasoning model CoT benefits).
Availability bias risk: The Wharton Prompting Science Reports dominated the evidence base (3 of 5 sources). This reflects the reality that this research group has produced the most rigorous empirical work on this topic, not a failure to search broadly.
Anchoring risk: Low. The Wharton findings were discovered through search and were not known in advance.
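
The source concentration noted under availability bias (3 of 5 sources from one research group) can be quantified with a short calculation. This is a minimal sketch: the group labels `other-1` and `other-2` are placeholders, since the audit does not name the remaining two sources.

```python
from collections import Counter


def dominant_share(source_groups):
    """Return the most common source group and its share of all sources."""
    counts = Counter(source_groups)
    group, n = counts.most_common(1)[0]
    return group, n / len(source_groups)


# 3 of 5 sources come from the Wharton reports; the other labels are placeholders.
groups = ["Wharton", "Wharton", "Wharton", "other-1", "other-2"]
group, share = dominant_share(groups)
print(f"{group}: {share:.0%}")  # prints "Wharton: 60%"
```

Reporting the share as a number makes it easier to compare concentration across audits than a qualitative note alone.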