R0042/2026-03-28/Q003/SRC02

Google DeepMind — Consistency Training for Anti-Sycophancy

Source

| Field | Value |
| --- | --- |
| Title | Consistency Training Helps Stop Sycophancy and Jailbreaks |
| Publisher | arXiv (Google DeepMind) |
| Author(s) | Alex Irpan, Alexander Matt Turner, Mark Kurzeja, David K. Elson, Rohin Shah |
| Date | 2025-10-31 |
| URL | https://arxiv.org/abs/2510.27062 |
| Type | Academic research paper |

Summary

| Dimension | Rating |
| --- | --- |
| Reliability | High |
| Relevance | Medium |
| Bias: Missing data | Low risk |
| Bias: Measurement | Low risk |
| Bias: Selective reporting | Low risk |
| Bias: Randomization | N/A (not an RCT) |
| Bias: Protocol deviation | Low risk |
| Bias: COI/Funding | Low risk |

Rationale

| Dimension | Rationale |
| --- | --- |
| Reliability | Research preprint from Google DeepMind, posted on arXiv, with reproducible methodology. |
| Relevance | Shows that anti-sycophancy is an explicit research design goal at a major AI lab. However, this is model-provider research, not evidence from an enterprise customer deployment. |
| Bias flags | Low risk: academic research with clear methodology. The authors are affiliated with Google, but the research goals are transparent. |

Evidence Extracts

| Evidence ID | Summary |
| --- | --- |
| SRC02-E01 | Consistency training as anti-sycophancy method from Google DeepMind |