R0041/2026-03-28/Q003/H2

Research R0041 — Enterprise Sycophancy
Run 2026-03-28
Query Q003
Hypothesis H2

Statement

Reinforcement learning with verifiable rewards (RLVR) cannot meaningfully address sycophancy: either its domain limitations are too severe, or its mechanism does not actually prevent sycophancy-related behaviors.

Status

Current: Eliminated

The evidence shows that RLVR can prevent sycophancy in verifiable domains. Its mechanism, deterministic binary rewards computed from ground truth, removes by construction the preference-based bias that causes sycophancy. DeepSeek-R1 demonstrated functional RLVR in mathematics and coding. The open question is therefore not whether RLVR can address sycophancy (it can) but how broadly, and its reach does not extend to most sycophancy-prone interactions.
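
The structural point can be illustrated with a minimal sketch. It is a toy contrast only, not how any cited system computes rewards: `verifiable_reward` stands in for a ground-truth check, and `toy_preference_reward` is a deliberately caricatured, hypothetical stand-in for a preference signal that favors agreement with the user. All names and score values are assumptions made for illustration.

```python
def verifiable_reward(answer: str, ground_truth: str) -> float:
    """Binary reward: 1.0 only when the answer matches the ground truth."""
    return 1.0 if answer.strip() == ground_truth.strip() else 0.0


def toy_preference_reward(answer: str, user_claim: str) -> float:
    """Hypothetical caricature of a preference signal.

    The scores 0.9 and 0.4 are invented; they only encode the assumption
    that approval rises when the answer agrees with the user.
    """
    return 0.9 if answer.strip() == user_claim.strip() else 0.4


# Scenario: the user insists that 2 + 2 = 5.
ground_truth = "4"
user_claim = "5"
candidates = {"correct": "4", "sycophantic": "5"}

rlvr_scores = {
    name: verifiable_reward(ans, ground_truth) for name, ans in candidates.items()
}
pref_scores = {
    name: toy_preference_reward(ans, user_claim) for name, ans in candidates.items()
}

print(rlvr_scores)  # {'correct': 1.0, 'sycophantic': 0.0}
print(pref_scores)  # {'correct': 0.4, 'sycophantic': 0.9}
```

The verifiable reward never reads `user_claim`, so agreeing with the user cannot raise the score. The same sketch also shows the scope limit: it needs a `ground_truth` value, which most sycophancy-prone interactions do not have.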

Supporting Evidence

Evidence | Summary
--- | ---
SRC01-E01 | Spurious rewards: Qwen2.5-Math-7B improved by 21.4 percentage points on MATH-500 with random rewards, nearly matching the 29.1-point gain from ground-truth rewards, which raises questions about RLVR's mechanism

Contradicting Evidence

Evidence | Summary
--- | ---
SRC01-E01 | RLVR fundamentally removes preference-based reward signals that cause sycophancy
SRC03-E01 | RLHF amplifies sycophancy through a specific mechanism RLVR does not share
SRC04-E01 | DeepSeek-R1 demonstrates functional RLVR in math/code domains

Reasoning

H2 is eliminated. Although RLVR has significant limitations (spurious reward concerns, domain constraints), it structurally avoids the preference-based mechanism that causes sycophancy. The spurious reward finding is concerning but does not negate that structural advantage: it suggests that implementation details matter, not that the approach is fundamentally flawed.
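
To put the spurious reward concern in proportion, the two figures from SRC01-E01 can be compared directly. This is a rough back-of-the-envelope sketch, assuming both numbers are gains on the same benchmark in the same unit; the variable names are illustrative only.

```python
# Gains reported in SRC01-E01 for Qwen2.5-Math-7B (assumed comparable units).
random_reward_gain = 21.4
ground_truth_gain = 29.1

# Share of the ground-truth gain that random rewards also achieved.
fraction_recovered = random_reward_gain / ground_truth_gain

# Portion of the gain attributable to ground truth beyond random rewards.
gap = ground_truth_gain - random_reward_gain

print(f"{fraction_recovered:.1%}")  # 73.5%
print(f"{gap:.1f}")  # 7.7
```

On these figures, random rewards reproduce roughly three quarters of the ground-truth gain. That is why the finding raises questions about what drives the improvement, even though it does not bear on whether a ground-truth reward can favor agreement with the user.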

Relationship to Other Hypotheses

H2 is the null hypothesis. Its elimination confirms RLVR has real anti-sycophancy properties, directing analysis toward the scope question (H1 vs. H3).