Q001 — Self-Audit


Research	R0040 — RLHF Alternatives
Run	2026-04-01
Query	Q001

ROBIS 4-Domain Audit

Domain 1: Eligibility Criteria

Rating: Low risk

Criterion	Assessment
Evidence criteria defined before searching	Yes -- sought published methods, benchmarks, and production deployments
Criteria remained consistent	Yes -- no shifting of inclusion criteria after results
Criteria appropriate for the query	Yes -- broad criteria suit an open-ended survey question

Notes: Eligibility was straightforward for this query -- any published or deployed alignment method qualifies.

Domain 2: Search Comprehensiveness

Rating: Low risk

Criterion	Assessment
Multiple search strategies used	Yes -- 4 searches targeting different method families and a general survey
Searches designed for coverage	Yes -- vocabulary exploration identified DPO, RLAIF, GRPO, KTO, IPO, ORPO, RLVR, SPIN
All results dispositioned	Yes -- 50 results returned across 4 searches, all dispositioned (14 selected, 36 rejected)
Source diversity achieved	Yes -- peer-reviewed papers, lab publications, technical analyses, industry overviews

Notes: All 50 results across the 4 searches were dispositioned. Coverage includes all major method families. Minor gap: ORPO coverage is thin because fewer searches targeted this method specifically.
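
The disposition count above can be checked with simple arithmetic. The following is a minimal sketch, not part of the audit tooling; the function name `check_dispositions` is hypothetical, and the figures are the ones reported in the Domain 2 table.

```python
def check_dispositions(returned: int, selected: int, rejected: int) -> bool:
    """Return True when selected + rejected accounts for every returned result."""
    return selected + rejected == returned


# Figures reported in Domain 2: 50 returned, 14 selected, 36 rejected.
all_accounted = check_dispositions(returned=50, selected=14, rejected=36)
selection_rate = 14 / 50

print(all_accounted)            # True
print(f"{selection_rate:.0%}")  # 28%
```

A result of `False` would indicate that at least one search result was left without a disposition or was counted twice.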

Domain 3: Evaluation Consistency

Rating: Low risk

Criterion	Assessment
All sources scored using same framework	Yes -- all 7 sources have GRADE+Cochrane scorecards
Evidence typed consistently	Yes -- Factual, Reported, and Analytical types applied consistently
Thematic clustering applied	Yes -- 5 thematic clusters identified from evidence

Notes: Open-ended query used thematic clustering rather than ACH matrix, consistent with methodology for non-enumerable answer spaces.

Domain 4: Synthesis Fairness

Rating: Low risk

Criterion	Assessment
All method families given fair coverage	Yes -- no alternative dismissed without evidence
Contradictory evidence surfaced	Yes -- Apple's DPO limitation finding, RLVR search-compression debate
Confidence calibrated to evidence	Yes -- High confidence reflects strong convergence of independent sources
Gaps acknowledged	Yes -- proprietary training details, head-to-head benchmarks, long-term stability

Notes: The assessment avoids declaring a single winner. This is appropriate because the evidence shows that method selection depends on task characteristics.

Domain 5: Source-Back Verification

Rating: Low risk

Source	Claim in Assessment	Source Actually Says	Match?
SRC02	DPO achieves 40-75% lower compute	Search results report "40-75% lower compute cost compared to RLHF"	Yes
SRC03	GRPO improved GSM8K from 82.9% to 88.2%	Search results report these exact figures	Yes
SRC04	KTO matches or exceeds DPO at 1B-30B	Paper abstract states "matches or exceeds...at scales from 1B to 30B"	Yes
SRC05	RLVR gains mostly from search compression	Article states "Majority: Search compression"	Yes
SRC06	RLAIF more harmless while maintaining helpfulness	Paper states "significantly more harmless...helpfulness remains on par"	Yes

Discrepancies found: 0

Corrections applied: None needed

Unresolved flags: None

Notes: All claims verified against source content. No interpretation drift detected.
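
Part of a source-back check can be automated for claims that carry figures. The sketch below is an illustration under stated assumptions, not the procedure used in this audit: it only tests whether every number in a claim also appears in the quoted source text, using the two rows above that include a direct quote. The names `figures_match` and `rows` are hypothetical. A numeric match does not replace reading the source, since it cannot detect drift in wording or scope.

```python
import re

# Matches integers and decimals, e.g. "40", "82.9".
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def figures_match(claim: str, source_quote: str) -> bool:
    """True when every number in the claim also appears in the source quote."""
    return set(NUMBER.findall(claim)) <= set(NUMBER.findall(source_quote))


# Claim / quoted-source pairs copied from the table above.
rows = {
    "SRC02": ("DPO achieves 40-75% lower compute",
              "40-75% lower compute cost compared to RLHF"),
    "SRC04": ("KTO matches or exceeds DPO at 1B-30B",
              "matches or exceeds...at scales from 1B to 30B"),
}

discrepancies = [src for src, (claim, quote) in rows.items()
                 if not figures_match(claim, quote)]
print(f"Discrepancies found: {len(discrepancies)}")  # Discrepancies found: 0
```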

Overall Assessment

Overall risk of bias: Low risk

The query was factual and open-ended (what methods exist?), making bias less likely than for evaluative queries. The evidence base is strong, with peer-reviewed papers from independent groups. The main limitation is that proprietary details from major labs are unavailable.
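
The audit does not state how the overall rating is derived from the per-domain ratings. One conservative rule, assumed here purely for illustration, is to take the worst rating across domains. The severity ordering and the function name `overall_risk` are assumptions of this sketch; in practice the overall judgment also involves reviewer discretion.

```python
# Severity order is an assumption of this sketch (higher means worse).
SEVERITY = {"Low risk": 0, "Unclear risk": 1, "High risk": 2}


def overall_risk(domain_ratings: dict[str, str]) -> str:
    """Return the worst rating across all domains (conservative rule)."""
    return max(domain_ratings.values(), key=lambda rating: SEVERITY[rating])


# Ratings as reported in the five domains above.
ratings = {
    "Eligibility Criteria": "Low risk",
    "Search Comprehensiveness": "Low risk",
    "Evaluation Consistency": "Low risk",
    "Synthesis Fairness": "Low risk",
    "Source-Back Verification": "Low risk",
}

print(overall_risk(ratings))  # Low risk
```

Under this rule, a single domain rated "High risk" would raise the overall rating to "High risk" regardless of the other domains.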

Researcher Bias Check

Confirmation bias risk: Low. The researcher's prior work on RLHF and sycophancy could lead to overemphasizing RLHF's limitations, but Q001 asks a neutral survey question (what alternatives exist?) rather than an evaluative one.
Availability bias risk: Low. Methods that appear frequently in search results (DPO, GRPO) received more detailed coverage, but this reflects genuine adoption rates rather than search bias.
Anchoring risk: Low. No prior hypothesis anchored the search -- the open-ended approach allowed all methods to emerge from the evidence.