C002 — OpenSSF Scorecard Average for Critical Projects — Assessment

The 5.4 average Scorecard score is confirmed from Chainguard's analysis of 1,500 Wolfi upstream repos, not from OpenSSF's full 1 million critical projects. The 1 million project scanning program exists and publishes results to BigQuery. The 0-10 scoring methodology works as described. The specific 5.4 figure for the full population is unverified, though Chainguard notes that scores in the 4-6 range are 'typical' for open source projects. The popularity-score correlation suggests the critical projects set (selected for importance/popularity) might score higher than 5.4.

Evidence Synthesis

Evidence quality: Medium — The 5.4 figure is confirmed from a specific source (Chainguard analysis of Wolfi upstream repos), and the Scorecard methodology is well-documented by OpenSSF. However, the critical gap is that the 5.4 applies to 1,500 Wolfi repos, not the full 1 million critical projects. No published mean score for the full 1 million population was found. The source noting that 'past research suggests these scores are typical' provides directional support but is not a rigorous citation.

Source agreement: Medium — The Chainguard blog and the OpenSSF Scorecard project documentation agree on methodology (0-10 scale, weighted aggregate, 1 million critical projects scanned weekly). They agree directionally that OSS projects score in the 4-6 range. But they do not agree on whether the 5.4 figure applies to the full 1 million population — the Chainguard blog explicitly states it applies to Wolfi upstream repos only.

Independence: Partially independent. Chainguard's analysis used the OpenSSF Scorecard tool (same tool, same scoring), but applied it to their own Wolfi upstream package set rather than the OpenSSF critical projects set. The 5.4 finding is from an independent sample but using dependent methodology.
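The "weighted aggregate" on a 0-10 scale mentioned above can be illustrated with a minimal sketch. The risk-level weights, the check values, and the function name below are assumptions chosen for illustration, not taken from the sources reviewed here; the actual weights should be checked against the Scorecard documentation.

```python
# Illustrative risk weights (an assumption for this sketch; verify against
# the Scorecard documentation before relying on them).
RISK_WEIGHTS = {"critical": 10.0, "high": 7.5, "medium": 5.0, "low": 2.5}


def aggregate_score(checks):
    """Weighted average of per-check scores on a 0-10 scale.

    `checks` is a list of (score, risk) tuples. A score of None marks an
    inconclusive check, which is skipped rather than counted as zero.
    Returns None when no check produced a usable score.
    """
    weighted_total = 0.0
    weight_sum = 0.0
    for score, risk in checks:
        if score is None:
            continue
        weight = RISK_WEIGHTS[risk]
        weighted_total += weight * score
        weight_sum += weight
    if weight_sum == 0:
        return None
    return round(weighted_total / weight_sum, 1)


# Hypothetical check results for a single repository.
example_checks = [(10, "critical"), (0, "high"), (5, "medium"), (None, "low")]
print(aggregate_score(example_checks))
```

Because each repository's score is itself an average over checks, a population mean such as 5.4 is an average of averages, and it depends heavily on which repositories are included.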

Probability Assessment

C002-H1: Roughly even chance (40-55%)
The 5.4 figure is confirmed but for the wrong population. The Chainguard blog reports 5.4 for 1,500 Wolfi upstream repos, not the 1 million critical projects. The claim conflates these two populations. Directionally, projects scoring 4-6 is described as 'typical,' so the 5.4 figure may be a reasonable estimate for the broader population, but it is not a verified measurement of the full 1 million critical projects. The finding that more popular projects score higher suggests the critical projects set (selected for importance) might actually score higher than 5.4.
C002-H2: Very unlikely (10-20%)
The population size (1 million), the scoring methodology (0-10, weighted aggregate), and the weekly scanning are all confirmed directly from OpenSSF's own documentation. The scoring methodology works exactly as described. The only incorrect element is the attribution of the 5.4 figure to this population.
C002-H3: Likely (60-75%)
Most likely hypothesis. The claim is directionally correct (critical OSS projects score modestly on Scorecard, in the 4-6 range), but the specific 5.4 number comes from 1,500 Wolfi upstream repos, not the full 1 million critical projects. The actual mean for the 1 million critical projects is unknown from our evidence but could be higher, because critical projects are more popular and popularity correlates with higher scores.

Verdict: In its specific form, the claim has a roughly even chance of being correct (40-55%); directionally, it is correct with higher confidence (75-85%). The 5.4 figure comes from Chainguard's Wolfi analysis, not OpenSSF's 1 million critical projects. The population, methodology, and general score range are confirmed, but the specific average for the full population is unverified.

Evidence Gaps

Expected but not found:
- Published mean or median Scorecard score for the full 1 million critical projects population
- Distribution statistics (mean, median, standard deviation) from the OpenSSF BigQuery dataset
- Academic analysis of the 1 million critical projects Scorecard scores

Unanswered questions:
- What is the actual mean Scorecard score for the full 1 million critical projects?
- How does the Wolfi upstream population compare to the OpenSSF critical projects population in composition?
- Has OpenSSF published any aggregate statistics from their weekly scans?

Impact on confidence: The gaps significantly reduce confidence in the specific 5.4 figure as applied to the full 1 million projects. The directional claim (critical projects score modestly, in the 4-6 range) is better supported. Querying the public BigQuery dataset would resolve this definitively.
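As a rough sketch of what that resolution step could look like, the snippet below computes the missing distribution statistics from a list of aggregate scores. It assumes the scores have already been exported from the BigQuery dataset into a plain Python list. The function name, the treatment of out-of-range values as invalid, and the sample numbers are all hypothetical and are not real Scorecard results.

```python
import statistics


def summarize_scores(scores):
    """Summary statistics for a list of aggregate Scorecard scores.

    Values outside the 0-10 range (for example a -1 placeholder for a
    failed scan, which is an assumption of this sketch) are dropped.
    """
    valid = [s for s in scores if 0 <= s <= 10]
    if not valid:
        raise ValueError("no valid scores to summarize")
    return {
        "n": len(valid),
        "mean": statistics.mean(valid),
        "median": statistics.median(valid),
        # Sample standard deviation needs at least two data points.
        "stdev": statistics.stdev(valid) if len(valid) > 1 else 0.0,
        # Share of projects in the 4-6 band described as typical.
        "share_4_to_6": sum(4 <= s <= 6 for s in valid) / len(valid),
    }


# Made-up scores for illustration only.
sample_scores = [2.0, 4.5, 5.0, 6.5, 8.0, -1]
summary = summarize_scores(sample_scores)
print(summary)
```

Running the same summary on the Wolfi upstream repos and on the critical projects set would show directly whether 5.4 transfers between the two populations.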
