AI Retention Cohort Analysis: Read the Curve, Not the Number

Updated for 2026 — use AI to read your cohort retention curve as a story (drop-off shape, week boundaries, segment splits) instead of staring at a single D30 number.

You have a retention cohort table open. Your eyes go straight to the D30 number, you compare it to last month, you feel good or bad, you close the tab. That is not analysis. The shape of the curve — where it bends, where it flattens, which cohort behaves differently — is where the actual product story lives. AI is genuinely useful at turning a wall of percentages into that story, as long as you feed it the numbers and not your conclusion.

The task

You have weekly cohort retention data (D1, D7, D14, D30, D60, D90 for the last 6-12 acquisition cohorts). You want a short written read of the curve — what changed, where the drop-off shifted, which cohort is the outlier — that you can drop into a Notion doc or share in standup without lying about what the data says.

When this is the right job for AI

  • You have the raw cohort table, not just summary numbers. AI is bad at curves it cannot see.
  • You can tell AI what shipped in each cohort window (onboarding change, pricing test, push-notification policy).
  • You will not let AI invent causes. You will ask it to flag patterns and you will assign causes from your own context.
  • The team genuinely confuses “D30 went up” with “retention improved.” A written narrative forces the distinction.

What to feed the AI

  • The cohort table as CSV or a pasted markdown table — every cell, not a summary
  • Acquisition channel mix per cohort if it varies (paid vs organic mixes break naive comparisons)
  • A timeline of what shipped per cohort week (onboarding v3, paywall test, push opt-in change)
  • The metric definition you use for “retained” (opened the app? completed core action? do not assume AI knows)
  • The 1-2 cohorts you already suspect are outliers and why
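
If the cohort table lives in a spreadsheet export or a query result, a few lines of Python can render it in the row format used in the prompt below, so that every matured cell is pasted and nothing is summarized. This is a minimal sketch: the function name `format_cohort_rows` and the dictionary layout are assumptions for illustration, not part of any analytics tool.

```python
# Hypothetical helper; the name and data layout are illustrative assumptions.
WINDOWS = ["D1", "D7", "D14", "D30", "D60", "D90"]


def format_cohort_rows(cohorts: dict[str, dict[str, int]]) -> str:
    """Render one line per cohort, listing every day window that has data."""
    lines = []
    for week, cells in cohorts.items():
        # Windows that have not matured are omitted, never zero-filled.
        parts = [f"{w} {cells[w]}%" for w in WINDOWS if w in cells]
        lines.append(" | ".join([f"Week {week}"] + parts))
    return "\n".join(lines)


# Three rows taken from the example table in the prompt.
cohorts = {
    "2026-W10": {"D1": 62, "D7": 38, "D14": 29, "D30": 22, "D60": 18, "D90": 16},
    "2026-W15": {"D1": 70, "D7": 41},
    "2026-W16": {"D1": 68},
}
table_text = format_cohort_rows(cohorts)
print(table_text)
```

Leaving immature windows out, rather than writing 0%, matters: a zero would read as a real collapse in retention.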

Copy-ready prompt

You are reading a weekly cohort retention table for an indie iOS app.

"Retained" = opened the app AND completed at least one core action (logging a habit) in the day window.

Cohort table (rows = acquisition week, columns = day window):
Week 2026-W10 | D1 62% | D7 38% | D14 29% | D30 22% | D60 18% | D90 16%
Week 2026-W11 | D1 64% | D7 39% | D14 28% | D30 21% | D60 17%
Week 2026-W12 | D1 71% | D7 44% | D14 31% | D30 22%
Week 2026-W13 | D1 73% | D7 46% | D14 34% | D30 24%
Week 2026-W14 | D1 72% | D7 43% | D14 30%
Week 2026-W15 | D1 70% | D7 41%
Week 2026-W16 | D1 68%

Channel mix is stable across all cohorts (70% organic search, 20% referral, 10% paid).

What shipped, by cohort:
- W10-W11: baseline.
- W12: onboarding rewrite (3 fewer steps, added "first habit in 30 seconds" path).
- W13: same onboarding + first reminder enabled by default.
- W14: same + premium paywall moved from D3 to D7.
- W15-W16: no product change.

Write a short narrative read of the curve. Structure:

1. The shape change. What changed in the curve between W10-W11 and W12+ — be specific about which day windows moved and by how many points.
2. Where the curve is flattening. The interesting question is not D1; it is where the slope levels off. Call out the day window where flattening seems to start, and name the cohorts you read it from.
3. The cohort that looks like an outlier and why. If none, say so.
4. Two hypotheses worth testing next, each tied to a specific cohort and a specific day window.
5. One thing the data does NOT tell you that you would need to answer the "why."

Rules:
- Do not invent numbers. Quote what is in the table.
- Do not assign causes the data cannot support — flag them as hypotheses.
- No "engagement improved." Name the day window and the points.
- Three to five short paragraphs.

Sample output structure

The shape of the curve shifted clearly at W12. D1 moved from 62-64% to 71-73% across W12-W14, then eased to 70% in W15 and 68% in W16. D7 moved from 38-39% to 44-46% in W12-W13, then drifted back to 43% in W14 and 41% in W15. So the onboarding rewrite landed an early-window lift that is partly persisting and partly decaying: in the latest cohorts D1 is still 4-6 points above baseline, while D7 is only 2-3 points above it.

Where the curve is flattening: in the W10-W11 cohorts the drop from D14 to D30 is 7 points (29% to 22% in W10, 28% to 21% in W11), then 4 points from D30 to D60, and 2 points from D60 to D90 in W10. The slope starts to level off after D30. The W12-W13 cohorts have not yet hit D60, but their D14 to D30 drop is steeper at 9-10 points (31% to 22%, 34% to 24%), so D30 lands at 22-24% against a 21-22% baseline. That suggests most of the lift you bought in early windows is gone by D30 and may not propagate to D60+ without a separate intervention aimed at the D14-D30 trough.

W13 looks like the standout cohort: it is the only cohort where both the onboarding rewrite and default-reminder-on were live while the paywall was still at D3, and its D14 (34%) is the highest in the table. W14 added the paywall shift and D14 fell back to 30%, which is suggestive but not conclusive given cohort-to-cohort noise.

Two hypotheses worth testing: (1) Default reminder on is doing more work than the onboarding rewrite — isolate by re-running with reminder-off for half the W17 cohort, watch D14. (2) The W14 paywall shift to D7 is suppressing D14 retention — split-test paywall position next sprint and read D14, not just revenue.

What the data does not tell you: whether the D7 decay between W13 and W15 is reminder fatigue (users muting), iOS notification permission changes mid-window, or external seasonality. You would need notification-permission and mute-rate cohorts to separate those.
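
Before a narrative like this goes anywhere, it is worth re-deriving every point delta it quotes. The sketch below computes the points lost between adjacent day windows for each cohort in the example table. The names `TABLE` and `window_drops` are illustrative assumptions, and the cells are copied from the prompt.

```python
WINDOWS = ["D1", "D7", "D14", "D30", "D60", "D90"]

# Cells copied from the cohort table in the prompt.
# Shorter rows mean the later windows have not matured yet.
TABLE = {
    "W10": [62, 38, 29, 22, 18, 16],
    "W11": [64, 39, 28, 21, 17],
    "W12": [71, 44, 31, 22],
    "W13": [73, 46, 34, 24],
    "W14": [72, 43, 30],
    "W15": [70, 41],
    "W16": [68],
}


def window_drops(values: list[int]) -> dict[str, int]:
    """Points lost between each pair of adjacent day windows."""
    return {
        f"{WINDOWS[i]}->{WINDOWS[i + 1]}": values[i] - values[i + 1]
        for i in range(len(values) - 1)
    }


drops = {week: window_drops(vals) for week, vals in TABLE.items()}

for week, d in drops.items():
    print(week, d)
```

On this table the D14 to D30 drop is 7 points for W10 and W11 but 9 and 10 points for W12 and W13, which is exactly the kind of difference a prose summary can gloss over.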

How to refine

  • Output too vague (“retention improved”) → strict rule: “every claim names a day window and a point delta, e.g., D14 +6pts W12 vs W10.”
  • AI invents a number that is not in the table → paste the table again with explicit instruction: “quote only cells present below.”
  • Causes asserted as fact → re-prompt: “rewrite causes as hypotheses tied to specific cohorts and a falsification test.”
  • Misses the flattening question and just reports highs → require “section 2 must name where the slope levels off, by week and day window.”
  • Reads the cohorts in isolation → ask: “compare W12-W13 as a block against W10-W11 as a block, then call out within-block outliers.”
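
The "invented number" failure can be caught mechanically before you read the narrative closely. The following is a rough sketch, assuming percentages are written like `22%` or as ranges like `62-64%`. The function name `unquoted_percentages` is hypothetical. It only checks that a value appears somewhere in the table, not that it is attributed to the right cohort and day window, so it is a first filter and not a substitute for checking cells by hand.

```python
import re


def unquoted_percentages(narrative: str, table_text: str) -> list[int]:
    """Return percentages the narrative mentions that appear nowhere in the table."""
    table_values = {int(m) for m in re.findall(r"(\d+)%", table_text)}
    mentioned = set()
    # Expand ranges such as "62-64%" so that both ends are checked.
    for low, high in re.findall(r"(\d+)-(\d+)%", narrative):
        mentioned.update((int(low), int(high)))
    mentioned.update(int(m) for m in re.findall(r"(\d+)%", narrative))
    return sorted(mentioned - table_values)


table_text = """Week 2026-W10 | D1 62% | D7 38% | D14 29% | D30 22%
Week 2026-W11 | D1 64% | D7 39% | D14 28% | D30 21%"""

# Made-up narrative for the demo: "30%" is not a cell in the table above.
narrative = "D1 sat at 62-64% and D30 landed at 21-22%, with D14 near 30%."

suspects = unquoted_percentages(narrative, table_text)
print(suspects)  # [30]
```

Point deltas ("about 7 points") are not percentages in the table and are deliberately ignored here; those still need to be recomputed from the cells.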

Common mistakes

  • Reading D30 in isolation and ignoring the slope. A cohort with high D1 and a steep drop is often worse than a cohort with lower D1 and a gentle slope.
  • Comparing cohorts with different channel mixes as if they are equivalent. Paid-heavy cohorts typically have weaker D30, so a mix shift can look like a product regression.
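  • Calling a 1-2 point cohort delta a “win.” On a 5,000-user-per-week app, sampling noise alone can move the gap between two cohorts by about 1.6 points at a 22% rate, before seasonality or mix drift add anything.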
  • Letting the team treat the AI narrative as ground truth. The narrative is a prompt for the team conversation, not the answer.
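
To put a rough size on cohort noise, here is a back-of-the-envelope sketch using the normal approximation for the difference between two proportions. It assumes equal-sized cohorts, independent users, and the same true retention rate in both cohorts. The function name `diff_noise_band` and the 500-user comparison are illustrative assumptions. It covers sampling noise only, so treat the result as a floor.

```python
import math


def diff_noise_band(rate: float, users_per_cohort: int, z: float = 1.96) -> float:
    """Approximate 95% band, in percentage points, for the gap between two
    equal-sized cohorts that share the same true retention rate."""
    # Variance of the difference of two independent binomial proportions.
    variance = 2 * rate * (1 - rate) / users_per_cohort
    return 100 * z * math.sqrt(variance)


# D30 around 22% with 5,000 users per weekly cohort.
band_d30 = diff_noise_band(rate=0.22, users_per_cohort=5000)
# Hypothetical smaller app with 500 users per weekly cohort.
band_small = diff_noise_band(rate=0.22, users_per_cohort=500)

print(round(band_d30, 1), round(band_small, 1))  # 1.6 5.1
```

At 5,000 users per cohort, a gap under roughly 1.6 points at D30 is indistinguishable from chance. At 500 users the band widens to about 5 points, which swallows most of the week-to-week movement in the example table.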

FAQ

  • How many cohorts do I need before this is useful? Six gives you a baseline and three change-windows; fewer and you cannot tell signal from cohort noise.
  • What if my D1 is below 30%? Then the curve story is irrelevant — fix activation first. AI reading a broken curve will still produce confident text.
  • Should I share the AI narrative with the team directly? No. Read it, sanity-check every cell it quoted, rewrite it in your own voice. The thinking is the value, not the prose.
  • Can AI segment the cohort for me? Only if you feed it the segmented tables. It cannot split a cohort it has not seen.
  • What about LTV cohorts? Same template — swap day-window retention for revenue per active user, but keep the “shape, flattening, outlier” structure.
