Research R0055 — RLHF Yes-Men Claims
Run 2026-04-01
Claim C024
Source SRC01
Evidence SRC01-E01
Type Factual

AIR 2024: 314 risk categories from 24 policy documents — sycophancy absent; bottom-up approach explains omission

URL: https://arxiv.org/html/2406.17864v1

Extract

The AIR 2024 taxonomy was constructed bottom-up from 24 policy documents: 8 government policies and 16 company policies. It contains 314 unique risk categories organized into 4 tiers. Sycophancy does not appear in the taxonomy because none of the 24 source policy documents address it. The MIT AI Risk Repository classifies 777 risks into 7 domains and 23 subdomains, and sycophancy is not a distinct category there. The Standardized Threat Taxonomy has 9 domains and 53 sub-threats, also without a sycophancy entry.

Relevance to Hypotheses

Hypothesis Relationship Strength
H1 Supports Strong
H2 Supports Moderate
H3 Contradicts Strong

Context

This evidence is directly relevant to testing the claim's factual assertions.