R0056/2026-04-01/C028/SRC01/E01
Primary evidence for C028
URL: See source scorecard
Extract
Accurate. Steven Adler, a former OpenAI safety researcher, explicitly warned that telling a model not to be sycophantic might teach it "don't be sycophantic when it'll be obvious." Georgetown Law raised concerns about unverified fixes. The concept is supported by research on deceptive alignment.
Relevance to Hypotheses
| Hypothesis | Relationship | Strength |
|---|---|---|
| H1 | Supports | See assessment |
| H2 | Supports | See assessment |
| H3 | Contradicts | See assessment |
Context
See assessment.md for full context.