R0023/2026-03-25/Q001/SRC05

Wharton GAIL foundational study: prompt engineering is complicated and contingent

Source

Field Value
Title Prompting Science Report 1: Prompt Engineering is Complicated and Contingent
Publisher SSRN / Wharton Generative AI Labs
Author(s) Lennart Meincke, Ethan Mollick, Lilach Mollick, Dan Shapiro
Date 2025-03-04
URL https://gail.wharton.upenn.edu/research-and-insights/tech-report-prompt-engineering-is-complicated-and-contingent/
Type Research paper (technical report)

Summary

Dimension Rating
Reliability High
Relevance High
Bias: Missing data Low risk
Bias: Measurement Low risk
Bias: Selective reporting Low risk
Bias: Randomization N/A — not an RCT
Bias: Protocol deviation N/A — not an RCT
Bias: COI/Funding Low risk

Rationale

Dimension Rationale
Reliability 100 repetitions per condition, GPQA Diamond benchmark, multiple correctness thresholds. Foundational methodology paper for the series.
Relevance Establishes that prompt engineering effects are measurement-dependent and highly variable — the meta-finding that explains why popular advice appears to work in demos but fails in practice.
Bias flags Low risk. Academic institution, no vendor affiliation, transparent methodology.

Evidence Extracts

Evidence ID Summary
SRC05-E01 Prompt tweaks can swing accuracy on individual questions by up to 60 percentage points, yet these swings average out at the dataset level, masking critical per-question variability
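The methodology noted above (repeated trials per question plus multiple correctness thresholds) can be illustrated with a small simulation. This is a hypothetical sketch, not the paper's actual code: question counts, probabilities, and thresholds are assumptions chosen only to show how large per-question swings can coexist with similar dataset-level averages, and how a stricter threshold ("correct in at least t of the trials") changes the reported score.

```python
import random

random.seed(0)

N_QUESTIONS = 50   # assumption for illustration; GPQA Diamond is larger
N_TRIALS = 100     # the paper runs 100 repetitions per condition

# Hypothetical per-question success probabilities for two prompt variants.
base = [random.uniform(0.2, 0.9) for _ in range(N_QUESTIONS)]
shift = [random.uniform(-0.3, 0.3) for _ in range(N_QUESTIONS)]
prompt_a = base
prompt_b = [min(1.0, max(0.0, p + s)) for p, s in zip(base, shift)]

def run(probs):
    """Per-question accuracy over repeated trials."""
    return [sum(random.random() < p for _ in range(N_TRIALS)) / N_TRIALS
            for p in probs]

acc_a, acc_b = run(prompt_a), run(prompt_b)

# Dataset-level averages can look similar...
print(f"mean accuracy A: {sum(acc_a) / N_QUESTIONS:.2f}")
print(f"mean accuracy B: {sum(acc_b) / N_QUESTIONS:.2f}")

# ...while individual questions swing by tens of percentage points.
deltas = [100 * (b - a) for a, b in zip(acc_a, acc_b)]
print(f"largest per-question swing: {max(abs(d) for d in deltas):.0f} points")

# Threshold scoring: a question counts as correct only if the model
# answers it correctly in at least `t` of the repeated trials.
for t in (0.51, 0.90, 1.00):
    score_a = sum(a >= t for a in acc_a) / N_QUESTIONS
    print(f"threshold {t:.0%}: prompt A scores {score_a:.2f}")
```

Note how the reported score depends on the chosen threshold even though the underlying trial data is unchanged; this is the sense in which the paper's findings are measurement-dependent.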