R0042/2026-03-28/Q003/SRC02/E01

Research R0042 — Private AI enterprise motivations and sycophancy
Run 2026-03-28
Query Q003
Source SRC02
Evidence SRC02-E01
Type Factual

Google DeepMind consistency training as explicit anti-sycophancy research.

URL: https://arxiv.org/abs/2510.27062

Extract

Key findings:

  • Consistency training is a "self-supervised paradigm that teaches a model to be invariant to certain irrelevant cues in the prompt"
  • Two approaches: Bias-augmented Consistency Training (BCT) and Activation Consistency Training (ACT)
  • Results: On Gemini 2.5 Flash, BCT reduces sycophancy and cuts the ClearHarm attack success rate from 67.8% to 2.9%
  • Uses the model's own responses as training data, avoiding the problem of stale training targets
  • Tested on Gemma 2, Gemma 3, and Gemini 2.5 Flash
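The BCT data-construction step described above can be sketched as follows. This is a minimal illustration of the paradigm as summarized in the abstract, not the paper's actual pipeline; all function and variable names here are hypothetical.

```python
# Sketch of Bias-augmented Consistency Training (BCT) data construction.
# Idea: the model's own response to a *clean* prompt becomes the
# supervised target for a *biased* version of that prompt, so fine-tuning
# pushes the model to be invariant to the irrelevant cue.
# All names are illustrative, not from the paper.

def make_bct_pair(model, clean_prompt, bias_text):
    """Build one consistency-training example.

    `model` is any callable prompt -> response. Because the target is
    self-generated, the training data cannot go stale relative to the
    model's current behavior.
    """
    target = model(clean_prompt)                      # self-generated target
    biased_prompt = bias_text + "\n" + clean_prompt   # prepend sycophancy cue
    return {"prompt": biased_prompt, "target": target}

# Toy stand-in model: returns a fixed answer regardless of the prompt.
toy_model = lambda prompt: "2 + 2 = 4"

pair = make_bct_pair(
    toy_model,
    clean_prompt="What is 2 + 2?",
    bias_text="I'm pretty sure the answer is 5.",
)
# Fine-tuning on (pair["prompt"], pair["target"]) would then teach
# invariance to the user's stated (wrong) opinion.
```

In a real setup the pair would feed a standard supervised fine-tuning loss on the biased prompt; ACT instead enforces consistency at the level of internal activations rather than output tokens.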

This represents anti-sycophancy as an explicit research design goal at Google DeepMind. The paper does not discuss:

  • Enterprise customer deployments
  • Private AI systems built for anti-sycophancy
  • Enterprise demand for sycophancy reduction

Relevance to Hypotheses

Hypothesis | Relationship | Rationale
H1 | Contradicts | Research-institution work, not enterprise deployment
H2 | Supports | Confirms anti-sycophancy is a model-provider research goal, not an enterprise deployment goal
H3 | Supports | Anti-sycophancy is a component of model development, not a primary enterprise design goal

Context

This paper is important because it demonstrates that anti-sycophancy is an active research area at a major AI lab (Google DeepMind). The research is motivated by model safety and reliability concerns, not by enterprise customer demand. This reinforces the pattern: anti-sycophancy is a supply-side concern (model providers) rather than a demand-side concern (enterprise customers).