Skip to content

R0040/2026-04-01/Q001 — Self-Audit

ROBIS 4-Domain Audit

Domain 1: Eligibility Criteria

Rating: Low risk

Criterion Assessment
Evidence criteria defined before searching Yes -- sought published methods, benchmarks, and production deployments
Criteria remained consistent Yes -- no shifting of inclusion criteria after results
Criteria appropriate for the query Yes -- open-ended survey question appropriately used broad criteria

Notes: Eligibility was straightforward for this query -- any published or deployed alignment method qualifies.

Domain 2: Search Comprehensiveness

Rating: Low risk

Criterion Assessment
Multiple search strategies used Yes -- 4 searches targeting different method families and a general survey
Searches designed for coverage Yes -- vocabulary exploration identified DPO, RLAIF, GRPO, KTO, IPO, ORPO, RLVR, SPIN
All results dispositioned Yes -- 50 results returned across 4 searches, all dispositioned (14 selected, 36 rejected)
Source diversity achieved Yes -- peer-reviewed papers, lab publications, technical analyses, industry overviews

Notes: 50 total results dispositioned across 4 searches. Coverage includes all major method families. Minor gap: ORPO details are thin -- fewer dedicated searches for this method.

Domain 3: Evaluation Consistency

Rating: Low risk

Criterion Assessment
All sources scored using same framework Yes -- all 7 sources have GRADE+Cochrane scorecards
Evidence typed consistently Yes -- Factual, Reported, and Analytical types applied consistently
Thematic clustering applied Yes -- 5 thematic clusters identified from evidence

Notes: Open-ended query used thematic clustering rather than ACH matrix, consistent with methodology for non-enumerable answer spaces.

Domain 4: Synthesis Fairness

Rating: Low risk

Criterion Assessment
All method families given fair coverage Yes -- no alternative dismissed without evidence
Contradictory evidence surfaced Yes -- Apple's DPO limitation finding, RLVR search-compression debate
Confidence calibrated to evidence Yes -- High confidence reflects strong convergence of independent sources
Gaps acknowledged Yes -- proprietary training details, head-to-head benchmarks, long-term stability

Notes: The assessment avoids declaring any single winner, which is appropriate given the evidence showing method selection depends on task characteristics.

Domain 5: Source-Back Verification

Rating: Low risk

Source Claim in Assessment Source Actually Says Match?
SRC02 DPO achieves 40-75% lower compute Search results report "40-75% lower compute cost compared to RLHF" Yes
SRC03 GRPO improved GSM8K from 82.9% to 88.2% Search results report these exact figures Yes
SRC04 KTO matches or exceeds DPO at 1B-30B Paper abstract states "matches or exceeds...at scales from 1B to 30B" Yes
SRC05 RLVR gains mostly from search compression Article states "Majority: Search compression" Yes
SRC06 RLAIF more harmless while maintaining helpfulness Paper states "significantly more harmless...helpfulness remains on par" Yes

Discrepancies found: 0

Corrections applied: None needed

Unresolved flags: None

Notes: All claims verified against source content. No interpretation drift detected.

Overall Assessment

Overall risk of bias: Low risk

The query was factual and open-ended (what methods exist?), making bias less likely than for evaluative queries. The evidence base is strong, with peer-reviewed papers from independent groups. The main limitation is that proprietary details from major labs are unavailable.

Researcher Bias Check

  • Confirmation bias risk: Low. The researcher's prior work on RLHF and sycophancy could lead to overemphasizing RLHF's limitations, but Q001 asks a neutral survey question (what alternatives exist?) rather than an evaluative one.
  • Availability bias risk: Low. Methods that appear frequently in search results (DPO, GRPO) received more detailed coverage, but this reflects genuine adoption rates rather than search bias.
  • Anchoring risk: Low. No prior hypothesis anchored the search -- the open-ended approach allowed all methods to emerge from the evidence.