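Retention experiments are where most teams mistake busyness for progress: six changes ship at once, D7 rises 2 points, and nobody can say which change moved it. The 15 prompts below design single-variable retention tests, define the correct cohort window, set the minimum detectable effect, and read out results with statistical honesty. They cover D1 activation, D7 habit formation, D30 sustained usage, segment-specific rescue, and the underrated "remove a feature" experiment.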
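Who this is for
Growth PMs, retention squad leads, consumer app founders, and lifecycle marketers running in-app or email experiments.
When not to write prompts this way
Skip these prompts if you have fewer than 1,000 DAU: the sample is too small to give a test statistical power. Skip them too for one-time-purchase or purely transactional products, where retention is not the goal.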
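Prompt structure formula
A retention experiment prompt always carries these six elements:
- Role: who the AI plays (senior PM / solo founder / product designer / indie developer / head of growth).
- Context: stage (idea / MVP / growth / scale), team size, traffic or ARR, platform (web / iOS / Android), audience, constraints.
- Goal: one concrete deliverable, such as a PRD section, a set of user stories, an experiment design, or a launch announcement.
- Constraints: timeline (this sprint / this quarter), scope to cut, things that cannot change (existing flows, billing, compliance).
- Output format: table, checklist, ticket-ready JSON, or labeled paragraphs that paste straight into Linear / Notion / Jira.
- Examples / signals: 1-2 references or competitors you admire, plus 1 counterexample to avoid.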
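Where these prompts fit
- D1 activation lift design
- D7 habit loop experiments
- D30 sustained-engagement bets
- Segment-specific retention rescue
- Quarterly retention bet planning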
15 copy-ready prompt templates
1. Single-variable D1 lift
The default. Enforces single-variable discipline.
You are a growth PM. Design a D1 retention experiment for {product}: (1) hypothesis (specific behavior change), (2) single variable manipulated, (3) control vs variant, (4) target lift + minimum detectable effect, (5) sample size, (6) duration, (7) primary metric (D1 retention), (8) 3 guardrail metrics, (9) kill criteria. Banned: bundling multiple changes.
Context: {product, current D1, segment, hypothesized cause}
Swappable variables: product, current D1, segment, hypothesized cause
Refinement tip: if the hypothesis is vague, append: "Rewrite the hypothesis in the form: 'If we change X for users who Y, D1 retention will increase from A% to B% because Z.'"
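Items (4) and (5) of the D1 prompt ask for a minimum detectable effect and a sample size. It can help to check the numbers the model returns yourself. The sketch below assumes a two-sided two-proportion z-test with α = 0.05 and 80% power; those defaults, and the function name `sample_size_per_group`, are illustrative choices rather than anything the templates prescribe.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_group(baseline, mde, alpha=0.05, power=0.80):
    """Users needed in each group to detect an absolute lift of `mde`.

    Assumes a two-sided z-test comparing two proportions with equal
    group sizes. `baseline` and `mde` are fractions (0.40 means 40%).
    """
    p1 = baseline
    p2 = baseline + mde
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / mde ** 2)


# Hypothetical inputs: 40% baseline D1, 5-point minimum detectable effect.
n = sample_size_per_group(baseline=0.40, mde=0.05)
print(f"users per group: {n}")
```

With these assumptions the example lands at roughly 1,500 users per group. Halving the minimum detectable effect roughly quadruples the requirement, which is why the MDE should be fixed before the test starts.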
2. D7 habit loop experiment
Design a D7 retention experiment focused on habit formation. Hypothesis must name: trigger (what brings them back), action (what they do), reward (what they get), investment (what makes the loop sticky). Specify the variable changed in one layer of the loop, with metric definition and guardrails. Duration: at least 21 days.
3. D30 sustained engagement
Design a D30 retention experiment. Hypothesis: which user behavior in week 1 predicts D30 retention, and what nudge increases that behavior. Specify the cohort definition, the predictor metric, the intervention, the success threshold. Note: D30 tests need at least 6 weeks of data and large samples.
4. Cohort definition audit
Below is my proposed cohort for a retention test. Audit it: (1) is the cohort window correct (e.g., new users in week of Aug 5), (2) is the comparison cohort matched, (3) are external factors controlled (release dates, marketing campaigns), (4) is the cohort size sufficient. Recommend the smallest fix.
Cohort def: {paste}
5. Activation event redefinition
For {product}, define the activation event that best predicts D7 retention. Steps: (1) list 5 candidate events, (2) describe how to test each as predictor, (3) recommend the most predictive one with reasoning. End with the cohort split for the next test.
6. Segment-specific retention rescue
D7 retention for {segment} is 30% below the global average. Design 3 segment-specific experiments to close the gap. For each: hypothesis, variable, expected lift, why this works for this segment specifically. Mark which one to run first.
7. Push notification cadence test
Design a notification-cadence experiment. Variants: 0 / 1 / 3 / 7 push notifications per week in the first 14 days. Define: variant assignment, primary metric (D14 retention), guardrails (opt-out rate, complaint volume, app rating), winner-call criteria.
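The prompt above asks for a variant assignment rule. One common approach, sketched here as an assumption and not as the method these templates require, is deterministic hashing: the same user always lands in the same arm, and salting the hash with the experiment name keeps assignments independent across experiments. The variant labels and the `assign_variant` helper are hypothetical.

```python
import hashlib

# Hypothetical labels for the four cadence arms named in the prompt.
VARIANTS = ["0_per_week", "1_per_week", "3_per_week", "7_per_week"]


def assign_variant(user_id, experiment, variants=VARIANTS):
    """Map a user to one variant, stably, using a salted SHA-256 hash."""
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    bucket = int(digest, 16) % len(variants)  # near-uniform bucket index
    return variants[bucket]


# The same user gets the same arm on every call.
print(assign_variant("user-123", "push_cadence_v1"))
```

Because the assignment depends only on the user ID and the experiment name, it needs no lookup table and can be recomputed at read-out time to verify that logged exposures match the intended split.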
8. Counter-test: remove an onboarding step
Design a counter-experiment where we REMOVE an onboarding step ({specific step}) for half the users. Hypothesis: completion rate rises, D1 retention rises, but {feature adoption} drops. Define how to measure each, and how long to wait before calling the result.
9. Read-out template (statistical honesty)
Below is the result of a retention experiment. Write the read-out: (1) hypothesis tested, (2) sample size achieved, (3) result with confidence interval, (4) whether it crossed the minimum detectable effect, (5) guardrail movement, (6) ship / kill / iterate decision, (7) what we learned even if it failed.
Result data: {paste}
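Items (3) and (4) of the read-out prompt depend on a confidence interval for the lift. A minimal sketch of that calculation follows, assuming a normal-approximation (Wald) interval for the difference of two retention rates. The counts and the 5-point MDE are made up for illustration, and `lift_with_ci` is a hypothetical helper.

```python
from math import sqrt
from statistics import NormalDist


def lift_with_ci(control_retained, control_n, variant_retained, variant_n,
                 confidence=0.95):
    """Absolute lift (variant minus control) with a Wald confidence interval."""
    p_c = control_retained / control_n
    p_v = variant_retained / variant_n
    lift = p_v - p_c
    se = sqrt(p_c * (1 - p_c) / control_n + p_v * (1 - p_v) / variant_n)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return lift, lift - z * se, lift + z * se


# Hypothetical result: 800 of 2,000 retained in control, 900 of 2,000 in variant.
lift, low, high = lift_with_ci(800, 2000, 900, 2000)
mde = 0.05  # assumed to have been fixed before the test started

print(f"lift: {lift:+.1%}, 95% CI: [{low:+.1%}, {high:+.1%}]")
print("interval excludes zero:", low > 0 or high < 0)
print("point estimate reached the MDE:", abs(lift) >= mde - 1e-12)
```

Reporting the interval alongside the point estimate makes it visible when a "win" is compatible with a much smaller true effect than the headline number.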
10. Experiment pre-mortem
Before launching this experiment, run a pre-mortem: 5 reasons it could produce a misleading result (selection bias, seasonality, contamination, ceiling effect, novelty effect). For each: how to detect, how to mitigate. End with the kill criterion that should force an immediate stop.
11. Quarterly retention bet backlog
For {product} with current retention curve {paste}, produce a backlog of 12 retention experiments for next quarter. For each: hypothesis, target metric (D1/D7/D30), estimated effort (S/M/L), expected lift (small/medium/large). Sort by impact / effort.
12. Email retention test
Design an email-based retention experiment for {product}: (1) trigger condition (e.g., 3 days since last login), (2) email variants (control = no email, variant A = soft nudge, variant B = personalized recommendation), (3) success metric (return rate within 7 days), (4) sample size, (5) what would invalidate the result.
13. "Remove a feature" retention test
We suspect {feature} is hurting retention. Design a counter-test: for a random 5% of users, hide the feature entirely. Measure D7 / D14 retention vs control. Define the threshold at which we kill the feature for everyone.
14. Retention curve diagnosis
Below is our retention curve (D0 to D60). Diagnose: where does the steepest drop happen, what behavior change correlates with the cliff, what segment is most affected. Recommend the next experiment to test the diagnosis.
Curve: {paste}
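Before pasting a curve into the diagnosis prompt, it can be useful to locate the steepest drop yourself so you can check the model's answer. The sketch below assumes the curve is a mapping from day number to retained fraction and measures the drop in absolute points lost per day between consecutive measurements. Both the data and the `steepest_drop` helper are hypothetical.

```python
def steepest_drop(curve):
    """Return (from_day, to_day, points_lost_per_day) for the sharpest decline.

    `curve` maps day number to retained fraction. Days do not need to be
    evenly spaced, so the drop is normalised by the gap between them.
    """
    days = sorted(curve)
    worst = None
    for prev_day, next_day in zip(days, days[1:]):
        per_day = (curve[prev_day] - curve[next_day]) / (next_day - prev_day)
        if worst is None or per_day > worst[2]:
            worst = (prev_day, next_day, per_day)
    return worst


# Hypothetical curve starting at D1 (the D0 to D1 fall is usually the largest
# by construction, so it is left out here to surface later cliffs).
curve = {1: 0.42, 3: 0.40, 7: 0.27, 14: 0.24, 30: 0.20, 60: 0.17}
start, end, rate = steepest_drop(curve)
print(f"steepest drop: D{start} to D{end}, {rate:.2%} of users lost per day")
```

In this made-up curve the cliff sits between D3 and D7, which would point the next experiment at what happens to users in their first week after initial activation.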
15. Multi-experiment dependency graph
We have 5 retention experiments in flight. Identify: which can run in parallel safely, which contaminate each other, which require sequencing. Output a dependency graph and a recommended schedule for the next 8 weeks.
Experiments: {paste}
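The dependency question can also be roughed out in code before asking the model. This sketch makes one simplifying assumption: two experiments contaminate each other if they touch the same product surface. It then greedily packs non-conflicting experiments into waves that can run in parallel. The experiment names, surfaces, and helper functions are all hypothetical.

```python
from itertools import combinations


def find_conflicts(experiments):
    """Pairs of experiments that share at least one surface."""
    return {
        frozenset((a, b))
        for a, b in combinations(experiments, 2)
        if experiments[a] & experiments[b]
    }


def schedule_waves(experiments):
    """Greedily group experiments into waves with no shared surfaces.

    Experiments are considered in dict order, so list them by priority.
    """
    clashes = find_conflicts(experiments)
    waves = []
    for name in experiments:
        for wave in waves:
            if all(frozenset((name, other)) not in clashes for other in wave):
                wave.append(name)
                break
        else:
            waves.append([name])  # conflicts with every existing wave
    return waves


# Hypothetical in-flight experiments mapped to the surfaces they change.
experiments = {
    "onboarding_checklist": {"onboarding"},
    "push_cadence": {"notifications"},
    "skip_signup_step": {"onboarding"},
    "winback_email": {"email"},
    "streak_reminder": {"notifications", "home"},
}

for number, wave in enumerate(schedule_waves(experiments), start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

Shared surfaces are only one source of contamination. Experiments on different surfaces can still interact through a common metric, so the output is a starting point for the schedule, not a guarantee of independence.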
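Common pitfalls
- Changing three or more variables in one experiment: the result cannot be attributed.
- Declaring victory on a one-point lift without checking significance.
- Forgetting guardrails: a D1 gain that comes with a spike in churn is a net loss.
- Cohort window too short: D30 needs at least 6 weeks, and there is no shortcut.
- Mixing new-user and existing-user cohorts in the same read-out.
- Ignoring the novelty effect: a 4-week experiment can hide a drop-off in week 6.
- Making the call on a single experiment without replicating it.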
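Optimization tips
- Before launch, fix the hypothesis in the form "If X, then Y by Z%, because W."
- Keep at least 30% of users as a never-touched control group to detect cross-contamination.
- Report the confidence interval in the read-out, not just the point estimate.
- Give every experiment a kill criterion; otherwise it runs indefinitely.
- Replicate any "win" on a fresh cohort before rolling out to 100%.
- Set the minimum detectable effect in advance; setting it after the fact is statistical cheating.
- Accumulate learnings in a /retention-experiments doc; most teams forget what they tested last quarter.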
FAQ
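- How large a sample do I need?: It depends on the baseline and the minimum detectable effect. A common rule of thumb: to detect a 5-point lift on a 40% baseline, start at 1,000+ per group. A standard two-proportion power calculation at 80% power and two-sided α = 0.05 puts the figure closer to 1,500 per group.
- Can I run several retention tests at once?: Only when they sit on different surfaces and each is randomized independently. Use template 15 to draw the dependency graph.
- Which matters more, D7 or D30?: After PMF, D7 predicts D30. Before that, D1 is the most useful.
- How do I tell a novelty effect from a real lift?: Double the test duration; if the lift fades, it was novelty.
- What if the sample is too small?: Pool cohorts across time (risky when the product is changing quickly), or pre-commit to a directional read only and document the caveats.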