Growth Experiment Prompts for Designing and Reading Tests

12 prompts to design growth experiments that won't waste a quarter — falsifiable hypothesis, sample-size math, guardrail metrics, and honest reads of flat or negative results.

Growth experiments quietly waste quarters: a team ships a “winning” test that was called at 60% confidence, or runs a multi-variable variant with no falsifiable hypothesis and then picks the metric that looks best post-hoc. These prompts force a clean design before the test goes live — and an honest read after, including the case the team flinches from most: a flat or significantly negative result. Pair with the feature prioritization prompts to decide what’s worth testing in the first place.

Best for

  • SaaS pricing experiments
  • Onboarding flow tests
  • Landing page A/B tests
  • Email subject-line tests
  • Paid-ad creative tests

1. Experiment hypothesis writer

For experiment idea {paste}, write a falsifiable hypothesis: "Changing {X} will cause {metric} to {direction + magnitude} because {mechanism}." Then state the null hypothesis explicitly. Output: how we will know we are wrong.

2. Sample-size & duration estimator

For test {paste hypothesis}, estimate sample-size and duration needed. Inputs: baseline metric {value}, expected lift {%}, traffic per day {N}. Output: sample needed, duration at current traffic, MDE achievable, when to call it.

3. Single-variable test isolation

Below: my proposed A/B test. Audit whether the variant changes ONLY 1 variable from control. If multiple variables change, propose how to split into separate tests.

{paste}

4. Pre-test guardrail metrics

For test {paste hypothesis}, identify 5 guardrail metrics that should NOT move (or only move within bounds): churn, customer-support load, page-load time, downstream conversion, error rate. Define the breach thresholds.

5. Post-test result reader

Below: experiment results. Read honestly: (a) is the lift statistically significant, (b) is it practically significant, (c) did any guardrail break, (d) did the segment effect differ. Output a ship / kill / iterate verdict.

{paste results}

6. No-result test interpretation

My A/B test ran to full sample and showed no significant lift. Help me interpret honestly: was the hypothesis wrong, was the test underpowered, was the variant too small, was the metric wrong. Suggest next test.

7. Negative-result decision

My test result was significantly negative — variant performed worse than control. Below: details. Help me extract: (a) is this learning valuable, (b) what does it suggest about the underlying belief, (c) what else needs re-testing.

{paste}

8. Pricing A/B test design

I want to test {old price} vs {new price} for {product}. Output: hypothesis, sample plan, ethical considerations (existing customers, segment isolation), the metrics that decide ship, the unwind path if it kills LTV.

9. Onboarding-flow test design

I want to test a {variant} of my onboarding. Output: hypothesis, sample plan, the activation metric, the latency considerations (activation can be measured at 7 / 14 / 28 days), how to avoid biasing the cohort.

10. Ad-creative test design

I want to test 4 ad creatives for {product}. Output: hypothesis per creative, sample plan, the primary metric (CTR / CVR / CPA), the secondary metrics that distinguish "click magnet" from "conversion driver".

11. Multi-arm prioritization

I have 8 experiment ideas and 1 traffic source. Below: each idea. Rank by ICE (impact / confidence / ease). Identify which 2 to run first and why. Note any that can run in parallel without interference.

{paste}

12. Experiment write-up template

My test just ended. Generate a 1-page write-up: hypothesis, design, sample, metrics, result, decision, the 1 surprise, what we test next. Audience: company-wide. Keep it readable for non-DS readers.

Common mistakes

  • Calling tests early because the dashboard “looks decided” — peeking inflates false positives
  • Changing more than one variable per arm so you can’t attribute the lift to anything
  • No null hypothesis stated, so any result feels like confirmation
  • Skipping guardrail metrics — variant ships, support load doubles, nobody notices for a week
  • Treating “no lift” as “no learning” instead of asking whether the hypothesis, power, or metric was wrong
  • Choosing the success metric after seeing the data, instead of locking it before the test starts

Tags: #Prompt #Product startup #KPI