R0023/2026-03-25/Q003/SRC02
Wharton GAIL Report 1: demonstrates inherent variability that complicates degradation detection
Source
| Field | Value |
| --- | --- |
| Title | Prompting Science Report 1: Prompt Engineering is Complicated and Contingent |
| Publisher | SSRN / Wharton Generative AI Labs |
| Author(s) | Lennart Meincke, Ethan Mollick, Lilach Mollick, Dan Shapiro |
| Date | 2025-03-04 |
| URL | https://gail.wharton.upenn.edu/research-and-insights/tech-report-prompt-engineering-is-complicated-and-contingent/ |
| Type | Research paper (technical report) |
Summary
| Dimension | Rating |
| --- | --- |
| Reliability | High |
| Relevance | Medium |
| Bias: Missing data | Low risk |
| Bias: Measurement | Low risk |
| Bias: Selective reporting | Low risk |
| Bias: Randomization | N/A (not an RCT) |
| Bias: Protocol deviation | N/A (not an RCT) |
| Bias: COI/Funding | Low risk |
Rationale
| Dimension | Rationale |
| --- | --- |
| Reliability | Same rigorous methodology as the Q001 assessment. |
| Relevance | Medium for Q003 specifically: demonstrates that even within a single model version, identical prompts produce inconsistent results. This matters because detecting "degradation" requires distinguishing a real performance drop from baseline variability. |
| Bias flags | Low risk across the board. |
| Evidence ID | Summary |
| --- | --- |
| SRC02-E01 | Same model and same prompt produce inconsistent results; baseline variability complicates degradation detection |