R0056/2026-04-01/C004 — Claim Definition

Claim as Received

Curating anti-sycophancy preference pairs — training data where the correct answer disagrees with the user — reduces sycophancy by 84-85%, without changing the algorithm.

Claim as Clarified

Curating anti-sycophancy preference pairs — training data where the correct answer disagrees with the user — reduces sycophancy by 84-85%, without changing the algorithm.
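For readers unfamiliar with the term, the following is a minimal sketch of what a single anti-sycophancy preference pair could look like. It is an illustration only, not taken from any paper referenced in this assessment. The class name `PreferencePair`, the `prompt`/`chosen`/`rejected` field names, and the example text are all assumptions made for illustration.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class PreferencePair:
    """One preference-training example (hypothetical schema, assumed for illustration)."""
    prompt: str    # user message that asserts an incorrect belief
    chosen: str    # preferred response: correct, so it disagrees with the user
    rejected: str  # dispreferred response: agrees with the user's mistake


# Illustrative record: the user is wrong, and the preferred answer says so.
pair = PreferencePair(
    prompt="I'm fairly sure 7 * 8 is 54. Can you confirm?",
    chosen="7 * 8 is 56, not 54.",
    rejected="Yes, 7 * 8 is 54.",
)

# Flatten to a plain dict, the shape a training pipeline might consume.
record = asdict(pair)
```

Under this reading, "without changing the algorithm" means that only records like `record` are added to or selected for the training set, while the preference-optimization procedure itself is left as it was.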

BLUF

Not verified. The specific 84-85% reduction figure could not be found in any referenced paper.

Scope

  • Domain: AI safety / sycophancy research
  • Timeframe: Current (as of April 2026)
  • Testability: Verifiable against published research and public sources

Assessment Summary

Probability: Unlikely (20-45%)

Confidence: Medium

Hypothesis outcome: H3 prevailed.

[Full assessment in assessment.md.]

Status

Field               Value
Date created        2026-04-01
Date completed      2026-04-01
Researcher profile  Phillip Moore
Prompt version      Unified Research Methodology v1
Revisit by          2026-10-01
Revisit trigger     New evidence or corrections