R0023/2026-03-25/Q003/SRC02

Wharton GAIL Report 1: demonstrates inherent variability that complicates degradation detection

Source

| Field | Value |
| --- | --- |
| Title | Prompting Science Report 1: Prompt Engineering is Complicated and Contingent |
| Publisher | SSRN / Wharton Generative AI Labs |
| Author(s) | Lennart Meincke, Ethan Mollick, Lilach Mollick, Dan Shapiro |
| Date | 2025-03-04 |
| URL | https://gail.wharton.upenn.edu/research-and-insights/tech-report-prompt-engineering-is-complicated-and-contingent/ |
| Type | Research paper (technical report) |

Summary

| Dimension | Rating |
| --- | --- |
| Reliability | High |
| Relevance | Medium |
| Bias: Missing data | Low risk |
| Bias: Measurement | Low risk |
| Bias: Selective reporting | Low risk |
| Bias: Randomization | N/A — not an RCT |
| Bias: Protocol deviation | N/A — not an RCT |
| Bias: COI/Funding | Low risk |

Rationale

| Dimension | Rationale |
| --- | --- |
| Reliability | Same rigorous methodology as in the Q001 assessment. |
| Relevance | Medium for Q003 specifically — the report demonstrates that even within a single model version, identical prompts produce inconsistent results. This matters because detecting "degradation" requires distinguishing a genuine performance drop from this baseline run-to-run variability. |
| Bias flags | Low risk across the board. |
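The relevance rationale above implies that a degradation claim should be tested against baseline run-to-run variance, not inferred from a single before/after comparison. A minimal sketch of one way to do this is a two-sample permutation test on per-run scores (pure Python; the function name, score values, and test choice are illustrative assumptions, not taken from the source report):

```python
import random
import statistics


def permutation_test(baseline, current, n_perm=10_000, seed=0):
    """Permutation test for a drop in mean score between two sets of runs.

    Null hypothesis: baseline and current runs come from the same (noisy)
    distribution, i.e. any apparent drop is just run-to-run variability.
    Returns the p-value of seeing a mean drop at least as large as observed.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    observed = statistics.mean(baseline) - statistics.mean(current)
    pooled = list(baseline) + list(current)
    k = len(baseline)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # relabel runs at random under the null
        diff = statistics.mean(pooled[:k]) - statistics.mean(pooled[k:])
        if diff >= observed:
            hits += 1
    return hits / n_perm


# Identical score distributions: the apparent drop is zero, so the
# p-value is 1.0 and no degradation can be claimed.
print(permutation_test([0.7] * 10, [0.7] * 10))  # → 1.0

# A large, consistent drop far outside baseline variability yields a
# very small p-value, supporting a genuine-degradation claim.
print(permutation_test([0.9] * 10, [0.1] * 10))
```

The design point is simply that the baseline sample must capture the same-model, same-prompt variability the report documents; with only one baseline run, the test above cannot be formed at all.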

Evidence Extracts

| Evidence ID | Summary |
| --- | --- |
| SRC02-E01 | Same model, same prompt produces inconsistent results — baseline variability complicates degradation detection. |