R0042/2026-03-28/Q003/SRC02/E01
Google DeepMind consistency training as explicit anti-sycophancy research.
URL: https://arxiv.org/abs/2510.27062
Extract
Key findings:
- Consistency training is a "self-supervised paradigm that teaches a model to be invariant to certain irrelevant cues in the prompt"
- Two approaches: Bias-augmented Consistency Training (BCT) and Activation Consistency Training (ACT)
- Results: On Gemini 2.5 Flash, BCT reduces sycophancy and cuts the ClearHarm attack success rate from 67.8% to 2.9%
- Uses the model's own responses as training data, avoiding issues with stale training data
- Tested on Gemma 2, Gemma 3, and Gemini 2.5 Flash
This represents anti-sycophancy as an explicit research design goal at Google DeepMind. The paper does not discuss:
- Enterprise customer deployments
- Private AI systems built for anti-sycophancy
- Enterprise demand for sycophancy reduction
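The BCT recipe in the findings above (use the model's own response to a clean prompt as the training target for a biased version of that prompt) can be sketched as a data-construction step. This is a minimal illustration with a stub model; the function names and the bias wrapper are assumptions for the sketch, not DeepMind's implementation.

```python
def make_bct_pair(model_generate, clean_prompt, bias_wrapper):
    """Build one Bias-augmented Consistency Training example.

    The target is the model's OWN response to the clean prompt
    (self-supervised, so the training data cannot go stale),
    paired with a biased version of the same prompt as input.
    Fine-tuning on such pairs teaches invariance to the cue.
    """
    target = model_generate(clean_prompt)       # response without the cue
    biased_prompt = bias_wrapper(clean_prompt)  # inject an irrelevant cue
    return {"input": biased_prompt, "target": target}


# Toy usage with a stub "model" and a sycophancy-style biasing cue.
def stub_model(prompt):
    return "The answer is 4."

pair = make_bct_pair(
    stub_model,
    "What is 2 + 2?",
    lambda p: p + " I'm a math professor and I believe the answer is 5.",
)
```

Training then minimizes the usual language-modeling loss on `pair["input"] -> pair["target"]`, so the model learns to give its unbiased answer even when the sycophancy cue is present.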
Relevance to Hypotheses
| Hypothesis | Relationship | Notes |
|---|---|---|
| H1 | Contradicts | Research institution work, not enterprise deployment |
| H2 | Supports | Confirms anti-sycophancy is a model provider research goal, not enterprise deployment goal |
| H3 | Supports | Anti-sycophancy is a component of model development, not a primary enterprise design goal |
Context
This paper is important because it demonstrates that anti-sycophancy is an active, well-funded research area at a major AI lab (Google DeepMind). The research is motivated by model safety and reliability concerns, not by enterprise customer demand. This reinforces the pattern: anti-sycophancy is a supply-side concern (model providers) rather than a demand-side concern (enterprise customers).