R0041/2026-04-01/Q003/H1

Research R0041 — Enterprise Sycophancy
Run 2026-04-01
Query Q003
Hypothesis H1

Statement

RLVR can broadly eliminate sycophancy across most domains by replacing subjective reward models with verifiable ones.

Status

Current: Eliminated

Supporting Evidence

Evidence Summary
SRC01-E01 RLVR eliminates the reward model as a sycophancy vector in verifiable domains

Contradicting Evidence

Evidence Summary
SRC01-E02 RLVR fails for creative writing, brand voice, and nuanced argumentation, which are precisely the areas where sycophancy matters most
SRC03-E01 RLVR "cannot be directly applied to open-ended tasks" since it "presupposes the existence of standard answers"

Reasoning

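RLVR requires a programmatic verifier, which limits it to domains with objectively correct answers (SRC03-E01). Sycophancy is most dangerous in subjective, interpersonal, and advisory contexts, and those are the contexts where RLVR cannot be applied (SRC01-E02). The supporting evidence (SRC01-E01) covers verifiable domains only, so it cannot carry H1's claim of broad elimination across most domains. H1 is eliminated.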
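
The verifier constraint can be illustrated with a minimal sketch, assuming RLVR stands for reinforcement learning with verifiable rewards and that a reward is computed by checking the model's answer against a standard answer. The function names, task labels, and exact-match check are hypothetical and chosen for illustration only; they are not taken from the cited sources.

```python
from typing import Optional


def exact_match_reward(model_answer: str, reference: str) -> float:
    """Hypothetical verifier: 1.0 if the answer matches the reference, else 0.0."""
    return 1.0 if model_answer.strip() == reference.strip() else 0.0


def verifiable_reward(
    task_type: str,
    model_answer: str,
    reference: Optional[str],
    user_opinion: Optional[str] = None,
) -> Optional[float]:
    """Return a reward only when a standard answer exists to check against.

    user_opinion is accepted but deliberately never read: in this sketch the
    reward depends only on the reference, so agreeing with the user earns
    nothing by itself.
    """
    if task_type == "math" and reference is not None:
        return exact_match_reward(model_answer, reference)
    # Open-ended tasks (assumed labels such as "brand_voice") have no
    # standard answer, so this sketch has no verifier to apply.
    return None


if __name__ == "__main__":
    # Verifiable domain: the user's wrong belief does not change the reward.
    print(verifiable_reward("math", "42", "42", user_opinion="I think it is 41"))
    # Open-ended domain: no reference, so no verifiable reward is available.
    print(verifiable_reward("brand_voice", "Our tone is bold.", None))
```

In this sketch the open-ended case returns `None` rather than a score, which mirrors the reasoning above: where no standard answer exists, a verifiable reward has nothing to check against, so it cannot replace a subjective reward model there.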

Relationship to Other Hypotheses

H1 is the strongest claim. Its elimination narrows the answer to H2 (partial applicability) or H3 (no meaningful impact).