R0023/2026-03-25/Q001/SRC05
Wharton GAIL foundational study: prompt engineering is complicated and contingent
Source
| Field | Value |
| --- | --- |
| Title | Prompting Science Report 1: Prompt Engineering is Complicated and Contingent |
| Publisher | SSRN / Wharton Generative AI Labs |
| Author(s) | Lennart Meincke, Ethan Mollick, Lilach Mollick, Dan Shapiro |
| Date | 2025-03-04 |
| URL | https://gail.wharton.upenn.edu/research-and-insights/tech-report-prompt-engineering-is-complicated-and-contingent/ |
| Type | Research paper (technical report) |
Summary
| Dimension | Rating |
| --- | --- |
| Reliability | High |
| Relevance | High |
| Bias: Missing data | Low risk |
| Bias: Measurement | Low risk |
| Bias: Selective reporting | Low risk |
| Bias: Randomization | N/A (not an RCT) |
| Bias: Protocol deviation | N/A (not an RCT) |
| Bias: COI/Funding | Low risk |
Rationale
| Dimension | Rationale |
| --- | --- |
| Reliability | 100 repetitions per condition, GPQA Diamond benchmark, multiple correctness thresholds. Foundational methodology paper for the series. |
| Relevance | Establishes that prompt engineering effects are measurement-dependent and highly variable: the meta-finding that explains why popular advice appears to work in demos but fails in practice. |
| Bias flags | Low risk. Academic institution, no vendor affiliation, transparent methodology. |
| Evidence ID | Summary |
| --- | --- |
| SRC05-E01 | Prompt tweaks produce 60-point swings on individual questions that average out across datasets, masking critical variability |