Write User Stories With AI: Feature Idea to INVEST-Ready Stories

Turn a PM's one-paragraph feature idea into estimable user stories with Given/When/Then acceptance criteria, using a tested prompt and the right model (as of June 2026).

Published: May 17, 2026 Updated: Jun 09, 2026 Author: AI Productivity Guide Team

TL;DR

A PM hands you a paragraph. Engineering wants user stories with acceptance criteria before they will estimate. Feed the paragraph plus 1-3 personas into Claude Opus 4.7 or GPT-5.5 with the prompt below, and you get a first draft of 5-9 stories with Given/When/Then criteria in under a minute. Then you do the part the model cannot: check each story against INVEST, strike invented personas, and split anything over a ~3-day estimate. The model removes the blank-page tax; the judgment stays yours.

Why the blank page is the expensive part

Writing the 30th story set for a CRUD screen is not hard. It is tedious, and tedium makes you start late and skim edge cases. That is exactly the shape of work current LLMs handle well: a recognizable format, a clear structure, and a draft you react to rather than invent.

Use AI for this when:

You write 3+ story sets a week and the format repeats.
The feature is a known pattern (CRUD, notifications, search, payments, settings, onboarding).
You want a complete first draft to critique, not a from-scratch brainstorm.
You need consistent story format across squads or contractors.

Skip AI, or use it only as a sketch pad, when:

The UX is novel and the story emerges from prototyping. Write the story after the prototype, not before.
The feature is regulated (medical, financial) and acceptance criteria carry legal weight. A domain SME reviews every line.
The “user” is another engineer (platform, tech debt). The persona-driven story format does not fit, and forcing it produces noise.

What good looks like: INVEST and Given/When/Then

Before you judge any draft, hold it against the two standards your team already grades stories on.

INVEST is the checklist for a story that survives planning:

Letter	Test	Fast failure signal
Independent	Can ship without waiting on a sibling story	“Story 4 needs story 7 first”
Negotiable	Describes the need, not a locked solution	Implementation details baked into the title
Valuable	A real user gets something	“As a system, I want…”
Estimable	Team can size it	Too vague to point
Small	Fits one sprint, ideally ≤ 3 days	One giant story
Testable	Has clear pass/fail criteria	No acceptance criteria at all

Given/When/Then (the BDD format from Cucumber/Gherkin) is how you make a story testable. Aim for 3-5 criteria per story:

Given I am a logged-in subscriber on the billing page,
When I click "Pause Subscription" and confirm,
Then my status changes to "Paused",
And I keep access until the current period ends,
And I receive a confirmation email with reactivation steps.

If you cannot picture clicking through a criterion, it is too vague to ship.
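The shape of a criterion can be checked mechanically before a human reads it for vagueness. The sketch below is one possible approach, not part of Cucumber or any Gherkin tooling: `check_criterion` is a hypothetical helper that only confirms each line starts with a step keyword and that Given, When, and Then all appear in that order. It cannot tell whether you can picture the click path.

```python
PRIMARY = ("Given", "When", "Then")
CONTINUATIONS = ("And", "But")


def check_criterion(text: str) -> list[str]:
    """Return structural problems found in one Given/When/Then criterion."""
    problems = []
    seen = []  # primary keywords in the order they appear
    for raw in text.strip().splitlines():
        line = raw.strip()
        if not line:
            continue
        keyword = line.split(maxsplit=1)[0]
        if keyword in PRIMARY:
            seen.append(keyword)
        elif keyword not in CONTINUATIONS:
            problems.append(f"line does not start with a step keyword: {line!r}")
    # Every criterion needs all three primary steps.
    for required in PRIMARY:
        if required not in seen:
            problems.append(f"missing {required} step")
    # The primary steps must appear in Given, When, Then order.
    if seen != sorted(seen, key=PRIMARY.index):
        problems.append("steps are not in Given, When, Then order")
    return problems


print(check_criterion("Given a cart\nThen it is empty"))
# ['missing When step']
```

The pause-subscription criterion above passes this check with an empty list. A pass only means the format is right, so the read-aloud test still applies.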

What to feed the model

Garbage in, generic out. The draft is only as specific as your input packet:

The feature idea in 3-5 sentences.
1-3 real personas: role, what they care about, what tool they use today.
Known edge cases and constraints (auth, permissions, data sources, rate limits).
Your “done” definition: what “shipped” means for your team.
1-2 of your own past stories as a style sample, so the output matches house format.

That last item matters most. Paste a real story your team wrote and the model mirrors your voice, granularity, and criteria style far better than any instruction can describe.
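If you assemble packets often, it can help to treat the packet as a small data structure and check it for gaps before anything reaches the model. The following is a minimal sketch under that assumption; the class names, field names, and the sample feature are all made up for illustration and are not part of any tool mentioned here.

```python
from dataclasses import dataclass, field


@dataclass
class Persona:
    role: str
    cares_about: str
    current_tool: str


@dataclass
class InputPacket:
    feature_idea: str
    personas: list[Persona]
    edge_cases: list[str] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)
    done_definition: str = ""
    style_samples: list[str] = field(default_factory=list)

    def gaps(self) -> list[str]:
        """List what is still missing before the packet goes to the model."""
        gaps = []
        if not self.feature_idea.strip():
            gaps.append("feature idea is empty")
        if not 1 <= len(self.personas) <= 3:
            gaps.append("provide 1-3 real personas")
        if not self.edge_cases and not self.constraints:
            gaps.append("list known edge cases or constraints")
        if not self.done_definition.strip():
            gaps.append("state your definition of done")
        if not 1 <= len(self.style_samples) <= 2:
            gaps.append("paste 1-2 past stories as a style sample")
        return gaps


# Hypothetical packet: everything filled in except the style sample.
packet = InputPacket(
    feature_idea="Let subscribers pause billing instead of cancelling.",
    personas=[Persona("subscriber", "not paying while away", "billing page")],
    edge_cases=["pause requested on the renewal day"],
    done_definition="Merged, behind a flag, QA signed off.",
)
print(packet.gaps())
# ['paste 1-2 past stories as a style sample']
```

The check is deliberately shallow: it catches an empty slot, not a weak persona or a vague feature idea.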

The prompt

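Tested on Claude Opus 4.7 and GPT-5.5 (June 2026). Replace each [bracket] in the input block (Feature through house style) with your specifics. Leave the [persona], [action], and [outcome] placeholders in the story template as written, since they tell the model what format to produce.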

You are a senior product manager writing user stories an engineering
team can estimate. Use INVEST and Given/When/Then.

Feature: [3-5 sentence description]
Personas: [list, each with their goal]
Edge cases to cover: [list]
Constraints: [auth, permissions, data sources, etc.]
Our house style (match this): [paste 1 real past story]

For each story output:
- Title: short, action-oriented.
- "As a [persona], I want [action], so that [outcome]."
- 3-5 acceptance criteria as Given/When/Then. Cover the happy path
  plus at least 2 edge cases (e.g. empty state, permission denied).
- Out of scope: one line on what this story does NOT cover.

Then output:
- 5-9 stories ordered by user value, each independently shippable.
- A "Not covered" list at the end: things you noticed but excluded,
  so the team can decide whether to add them.

Do not invent personas or features I did not give you. If you need a
detail, list it under "Open questions" instead of guessing.

That last paragraph carries the most weight. Without an explicit “do not invent, ask instead” instruction, models pad missing context with plausible fiction, and you spend your review time deleting personas that do not exist.
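If you fill the prompt from a script rather than by hand, a blank field is the easy mistake to make, and it invites exactly the guessing the prompt forbids. Below is a rough sketch that fills only the input block of the prompt and refuses to continue when a field is empty. `fill_inputs` and the sample values are hypothetical; the standard-library `string.Template` is used so that square brackets elsewhere in the prompt are left untouched.

```python
from string import Template

# Only the input block of the prompt; the output instructions stay fixed text.
PROMPT_INPUTS = Template(
    "Feature: $feature\n"
    "Personas: $personas\n"
    "Edge cases to cover: $edge_cases\n"
    "Constraints: $constraints\n"
    "Our house style (match this): $style_sample\n"
)


def fill_inputs(feature, personas, edge_cases, constraints, style_sample):
    """Fill the input block; raise ValueError if any field is blank."""
    fields = {
        "feature": feature,
        "personas": "; ".join(personas),
        "edge_cases": "; ".join(edge_cases),
        "constraints": "; ".join(constraints),
        "style_sample": style_sample,
    }
    blank = [name for name, value in fields.items() if not value.strip()]
    if blank:
        raise ValueError(f"blank prompt fields: {', '.join(blank)}")
    return PROMPT_INPUTS.substitute(fields)


# Hypothetical values for illustration only.
block = fill_inputs(
    feature="Let subscribers pause billing instead of cancelling.",
    personas=["Subscriber: wants to stop charges while travelling"],
    edge_cases=["empty state", "permission denied"],
    constraints=["must be logged in"],
    style_sample="(paste one real past story here)",
)
print(block)
```

Failing loudly on a blank field is a design choice: an error at fill time costs less than reviewing a draft built on a missing persona list.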

Which model to use

Task	Best fit (June 2026)	Why
Structured story drafts, strict format	Claude Opus 4.7 or Sonnet 4.6	Treats “exactly 5 criteria, no pricing” as binding constraints; 1M-token context swallows a full PRD
Inside Jira, pulling from your Confluence rules	Atlassian Rovo / Intelligence	Drafts stories and expands criteria from docs you already wrote; bundled with Jira Premium (~$13.53/user/mo annual)
Quick draft you will paste elsewhere	GPT-5.5 (ChatGPT)	Fast, strong general drafting; Plus is $20/mo

For pure document generation that must hold a long PRD in context and follow format rules precisely, Claude is the steadier choice. Sonnet 4.6 costs $3/$15 per 1M tokens (input/output) versus $5/$25 for Opus 4.7, and it is usually enough for story drafts. If your stories live in Jira, having Rovo write directly against your Confluence business rules saves a copy-paste round trip.
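To see what those per-token prices mean for a single draft, here is a small worked calculation. The prices are the ones quoted above; the token counts (a 20,000-token input packet and a 3,000-token draft) are assumptions chosen for illustration, so substitute your own sizes.

```python
# USD per 1M tokens (input, output), as quoted in the paragraph above.
PRICES = {
    "Sonnet 4.6": (3, 15),
    "Opus 4.7": (5, 25),
}


def draft_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one API call at the listed per-token prices."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Assumed sizes for illustration: a 20,000-token packet, a 3,000-token draft.
for model in PRICES:
    print(f"{model}: ${draft_cost(model, 20_000, 3_000):.3f}")
# Sonnet 4.6: $0.105
# Opus 4.7: $0.175
```

At these assumed sizes either model costs well under a dollar per draft, so the choice rests more on how well the output follows your format than on price.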

Splitting a story that is too big

The single most common AI failure here is one giant story that should have been three. When a draft story will not fit in ~3 days, split it with SPIDR (Mike Cohn):

Spike: carve out a research task first if the team does not yet know how to build it.
Paths: different routes through the feature (pay by card vs. PayPal vs. stored credit).
Interfaces: one platform or input method at a time (web first, mobile next).
Data: support a subset of data first (US addresses now, international later).
Rules: relax a business rule for v1 (skip bulk-discount logic, add it later).

Ask the model directly: “Story 3 is larger than 3 days. Split it using SPIDR into vertical slices, each independently shippable.” Vertical slices (UI through to data) ship value; horizontal slices (“build the API layer”) do not.

How to check the draft in five minutes

Read each criterion aloud. Cannot picture the click path? Too vague.
Run INVEST on each story using the table above. The two tests AI drafts fail most often: Independent and Small.
Hunt for invented personas or features. Strike anything not in your input.
Check the “Not covered” and “Open questions” lists. This is where the real planning conversation lives.
Confirm every story has 3-5 criteria with at least one unhappy path.
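Two of these checks are mechanical enough to script: the persona check and the criteria count. The sketch below assumes you have already split the model's output into stories; the `Story` class and `review_flags` function are hypothetical names, and the regular expression only recognizes the exact "As a ..., I want ..., so that ..." form. Everything else on the list still needs a human read.

```python
import re
from dataclasses import dataclass

STORY_LINE = re.compile(
    r"^As an? (?P<persona>.+?), I want (?P<action>.+?), so that (?P<outcome>.+)$"
)


@dataclass
class Story:
    title: str
    statement: str       # "As a ..., I want ..., so that ..."
    criteria: list[str]  # one Given/When/Then block per entry


def review_flags(story: Story, known_personas: list[str]) -> list[str]:
    """Flag invented personas and criteria counts outside the 3-5 range."""
    flags = []
    match = STORY_LINE.match(story.statement.strip())
    if match is None:
        flags.append("statement is not in 'As a ..., I want ..., so that ...' form")
    elif match["persona"].lower() not in {p.lower() for p in known_personas}:
        flags.append(f"persona not in input: {match['persona']}")
    if not 3 <= len(story.criteria) <= 5:
        flags.append(f"{len(story.criteria)} criteria, expected 3-5")
    return flags


# Hypothetical draft story with an invented persona and too few criteria.
draft = Story(
    title="Audit pause events",
    statement="As a system, I want to log pause events, so that audits pass.",
    criteria=["Given ... When ... Then ..."] * 2,
)
print(review_flags(draft, known_personas=["subscriber"]))
# ['persona not in input: system', '2 criteria, expected 3-5']
```

A clean result here does not mean the story is Independent or Small; it only clears the two checks that can be read off the text.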

Common mistakes

Stories with no acceptance criteria. They bounce at planning. Make G/W/T non-negotiable in the prompt.
One oversized story. Cap at ~3 days and split with SPIDR.
Skipping edge cases. The model only covers the ones you name, so list empty states, permission-denied, and timeout paths explicitly.
Trusting AI story points. It will guess; estimation is your team’s calibration, not the model’s.

Keep the prompt sharp

After each sprint, note which stories caused rework or a QA escape, then feed the pattern back: “Stories like X kept missing the permission-denied path. Always add a Given/When/Then for unauthorized access.” Over a few sprints your prompt encodes your team’s hard-won lessons, and the drafts need less surgery.
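One way to keep those lessons from getting lost is to store them as a list and append them to the prompt each time you run it. This is a minimal sketch of that idea; `with_lessons` and the header wording are assumptions, not something the tested prompt above includes.

```python
LESSONS_HEADER = "Lessons from past sprints (always apply):"


def with_lessons(prompt: str, lessons: list[str]) -> str:
    """Append deduplicated lessons to the prompt as a bulleted rule list."""
    # dict.fromkeys keeps first-seen order while dropping repeats.
    unique = list(dict.fromkeys(item.strip() for item in lessons if item.strip()))
    if not unique:
        return prompt
    rules = "\n".join(f"- {lesson}" for lesson in unique)
    return f"{prompt.rstrip()}\n\n{LESSONS_HEADER}\n{rules}\n"


# Hypothetical usage: the same lesson recorded twice is added once.
lessons = [
    "Always add a Given/When/Then for unauthorized access.",
    "Always add a Given/When/Then for unauthorized access.",
]
print(with_lessons("...your base prompt...", lessons))
```

Keeping the lessons in a separate list also makes them easy to prune once one stops earning its place.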

FAQ

Should the AI write tests too? It can turn acceptance criteria into test outlines or Gherkin scenarios, which is useful QA scaffolding. It is not a substitute for tests an engineer writes and runs.

How many stories per feature? Usually 5-9. If you get 15, the feature is an epic; split it before estimating.

Can it estimate story points? It can guess, but do not trust it. Points reflect your team’s velocity and calibration, which the model has no access to.

Claude or ChatGPT for this? For strict-format, long-context story drafting (June 2026), Claude Opus 4.7 or Sonnet 4.6 follows constraints more reliably. If your stories already live in Jira, Atlassian Rovo drafting from your Confluence rules saves the most steps.

Will engineers trust AI-drafted stories? Only if you edit them. Ship a raw draft and you lose credibility fast. Treat the output as a first draft you own, not a finished artifact.

