Write Customer Discovery Interview Questions With AI

Generate Mom-Test-style interview questions that surface real past behavior — not 'would you use a tool that...' opinion bait that produces interview transcripts full of polite lies.

Published: May 17, 2026 Updated: Jun 04, 2026 Author: AI Productivity Guide Team

TL;DR: Feed an AI model your validated hypothesis, your interview segment, and one hard rule — every question must be answerable with a past event, never a hypothetical — and it writes a Mom Test-style discovery script with built-in “why” probes in about 30 seconds. The copy-ready prompt below does exactly this. AI writes the script; you still run the interview. Use any frontier model (GPT-5.5, Claude Opus 4.7, or Gemini 3.1 Pro, as of June 2026) for the script, then a transcription tool like Otter.ai (free tier, paid from $8.33/mo) to record so you can pattern-match across interviews.

The task

You have 8 customer discovery interviews lined up next week. The product idea has been bouncing in your head for two months; you finally talked your co-founder into validating before building. Your first draft of the question list is a disaster — half of it is “would you pay for X?” and “do you think Y would help?” You already know how these go: interviewees politely confirm whatever you suggest, you walk away with eight “yes” votes, you build the thing, and nobody uses it. You want questions that surface what people actually did the last time they faced this problem, because past behavior is the only signal that predicts future buying.

This is the core lesson of Rob Fitzpatrick’s The Mom Test: if you ask your mom whether your business is a good idea, she says yes — because she loves you. The fix is three rules: talk about their life, not your idea; ask about specifics in the past, not hypotheticals about the future; listen more than you talk. The “yes” you want to avoid has a name in research — acquiescence bias, the tendency of people to agree with whatever you suggest to be cooperative. Hypothetical questions (“would you use X?”) trigger it almost every time.

Where AI helps — and where it does not

AI is genuinely good at writing Mom Test-style questions — anchored in specific past events, not hypotheticals — once you tell it the rule. It is also good at producing the 2-3 follow-up “why” probes per question so you do not run dry at minute 12. Where AI fails: running the interview. The model writes the script; you still have to ask “why” three times in a row when the interviewee gives you a surface answer, sit through the silence after a hard question, and resist pitching your product when they describe pain you can solve. The script is half the battle; the discipline in the room is the other half.

A common failure mode: even with the rule in the prompt, the model occasionally slips a “would you” question in — usually disguised as “how important would it be to you if…” Audit each question after generation. Any sentence containing “would,” “could,” “if you had,” or “do you think” is opinion-bait and gets rewritten.
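
The audit described above can be partly automated. The following is a minimal sketch, assuming the generated script is available as a list of question strings; the function name `flag_hypotheticals` and the regex approach are illustrative assumptions, and the phrase list is taken from the paragraph above.

```python
import re

# Phrases the article treats as opinion-bait. Matching them with word-boundary
# regexes is an assumption of this sketch, not part of the original workflow.
HYPOTHETICAL_PATTERNS = [
    r"\bwould\b",
    r"\bcould\b",
    r"\bif you had\b",
    r"\bdo you think\b",
]


def flag_hypotheticals(questions):
    """Return (question, matched_phrases) pairs for questions that need a rewrite."""
    flagged = []
    for question in questions:
        hits = []
        for pattern in HYPOTHETICAL_PATTERNS:
            match = re.search(pattern, question, flags=re.IGNORECASE)
            if match:
                hits.append(match.group(0).lower())
        if hits:
            flagged.append((question, hits))
    return flagged


# Hypothetical example script: one good question and two that should be caught.
script = [
    "Walk me through the last time you wrote a status update for your manager.",
    "How important would it be to you if updates were drafted automatically?",
    "Do you think a dashboard helps your team?",
]

for question, hits in flag_hypotheticals(script):
    print(f"REWRITE ({', '.join(hits)}): {question}")
```

A phrase match is only a first pass. It will also flag harmless past-tense uses of "could," so treat the output as a list of questions to reread, not as an automatic verdict.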

Which model to use

Any current frontier model writes a solid script. As of June 2026, Claude Opus 4.7 (in the $20 Claude Pro plan) tends to follow the “no hypotheticals” rule most consistently and flags its own slips when asked. GPT-5.5 (ChatGPT Plus, $20/mo) is just as capable and slightly faster on the Instant setting. Gemini 3.1 Pro (Google AI Pro, $19.99/mo) works fine and is convenient if your interview notes already live in Google Docs. The free tier of any of these is enough for a one-off script. For the part that comes after the interview — transcribing and clustering eight recordings into patterns — a dedicated tool earns its keep: Otter.ai for transcription (free plan, paid from $8.33/mo), and Dovetail or Insight7 for tagging themes at scale.

What to feed the AI

The product idea or hypothesis you are validating — not a description of the product, but the specific belief you are testing
The specific decision you would change based on the interviews — “if 6 of 8 say X, I will build feature Y; if 6 of 8 say Z, I will not”
Who you are interviewing — role, company size, how they currently solve the problem (with what tool, with what process)
The 2-3 prior assumptions you suspect are wrong — these become the questions designed to expose your own blind spots
The 1-2 specific behaviors you most want to verify (e.g., “do they actually look at the dashboard daily, or do they say they do?”)
The interview length — 30 vs 45 vs 60 minutes changes what fits in the script
Whether you can offer an incentive — and what kind (gift card, early access, custom report)
Hard interview no-go list — words you will not use in the room (“AI,” “platform,” “solution”), because they bias the answers
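
The decision rule in the list above ("if 6 of 8 say X, I will build feature Y") is easier to hold yourself to when it is written down before the interviews start. A minimal sketch follows, assuming you code each transcript with one label describing what the interviewee actually did. The label names, the `decide` function, and the "inconclusive" outcome are illustrative assumptions.

```python
from collections import Counter


def decide(observations, build_signal, stop_signal, threshold=6):
    """Apply a decision rule that was fixed before the interviews.

    observations: one label per interview, coded from the transcript.
    build_signal / stop_signal: the labels that trigger each decision.
    threshold: how many interviews must show the signal (6 of 8 in the article).
    """
    counts = Counter(observations)
    if counts[build_signal] >= threshold:
        return "build"
    if counts[stop_signal] >= threshold:
        return "do not build"
    # Assumption of this sketch: anything below threshold is not a decision yet.
    return "inconclusive"


# Hypothetical coding of 8 interviews.
observations = [
    "built_own_workaround", "built_own_workaround", "no_action",
    "built_own_workaround", "built_own_workaround", "no_action",
    "built_own_workaround", "built_own_workaround",
]

print(decide(observations, "built_own_workaround", "no_action"))
```

Fixing the threshold in advance is the point of the exercise: it keeps you from reinterpreting four lukewarm answers as a green light after the fact.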

Copy-ready prompt

Write a customer discovery interview script using Mom Test principles.

Hypothesis I am validating: {specific belief, not product description}
Decision I will change based on results: {if X then Y}
Interviewee: {role, company size, current solution + tools}
Suspected wrong assumptions of mine: {2-3}
Specific behaviors to verify: {1-2}
Interview length: {minutes}
Hard banned words (will bias the answer): {AI, platform, solution, etc.}

Return:
1) Opening — disarm and frame (under 50 words, no product description). Lead with "I am trying to learn about your work; this is not a sales call."
2) Past-behavior questions (5 questions). Each question anchors in a specific past event ("walk me through the last time you..."). For each, include 2 follow-up "why" probes I can use if they give a surface answer.
3) Constraint questions (3). Surface budget, time, authority, and approval-process limits. Past-tense only.
4) Avoidance / failure questions (2). "What have you tried that did not work, and what made you give up on it?" The answers reveal whether the problem is real enough to drive abandoned solutions.
5) Close (under 50 words). Ask for an intro to one peer who has this problem, and consent for a 15-minute follow-up call if I want to test a prototype later.

Hard rules:
- ZERO "would you," "could you," "do you think," "how important would it be," or "if you had" questions. If any slip in, flag them and rewrite.
- Do NOT describe my product. The first time my product comes up should be at the close, only if they ask.
- Every question should be answerable with a past event, not an opinion.

Shorter variant — single-question deep-probe

Below is the surface answer an interviewee gave to this question: {paste question + answer}.
Write 3 follow-up "why" probes that would surface what they actually did vs what they said.
Then write the 1 follow-up question that, if they answer it well, lets me cluster their answer with other interviews into a real pattern.

Sample output

A useful past-behavior question: “Walk me through the last time you needed to write a project status update for your manager. What did you actually do — what tool did you open first, who did you ask, what part took the longest?” Compare to the trap version: “Would you use an AI tool that drafts status updates for you?” — the second produces yeses that do not survive contact with money.

A useful constraint question: “The last time you bought a tool that touched your status-update workflow, what was the budget approval process? Who signed off, how long did it take, what almost killed it?” This reveals whether budget exists, who owns it, and what stopped the last attempt.

A useful avoidance question: “What is something you tried in the last 18 months to solve the status-update problem and then gave up on? What made you give up?” Abandoned solutions are gold; they show the problem is real (someone tried to solve it) and reveal why competing solutions failed.

A useful set of follow-up “why” probes: Surface answer: “I just write it in Slack.” Probe 1: “What did you write in Slack last Friday? Can you pull it up and read me the first sentence?” Probe 2: “What did you almost write and then change?” Probe 3: “Which part of the Friday version took you the longest to write?”

How to refine

If the model slips hypotheticals back in: “Audit every question for ‘would,’ ‘could,’ ‘do you think,’ ‘if you had.’ Rewrite each as a past-behavior anchor. If a question cannot be rewritten as past-behavior, delete it.”
If the script feels too short for 45 minutes: “Add 3 more past-behavior questions in the area I am most uncertain about: {area}. Pair each with 2 ‘why’ probes.”
If the past-behavior questions are too abstract: “Each past-behavior question must name (a) the specific event, (b) the time window (‘last week,’ ‘most recent time,’ ‘when you started this role’), and (c) the artifact the interviewee can pull up while answering.”
If the constraint questions feel too philosophical: “Constraint questions must be specific to past purchases or decisions, not philosophical. ‘The last tool you bought’ beats ‘how do you decide on tools.’”
If the close pitches your product: “The close asks for an intro and consent to follow up. It does NOT describe my product unless they ask ‘so what are you building.’ If they ask, give one sentence and stop.”

Common mistakes

Leading questions: interviewees politely confirm whatever you suggest, and you walk away with 8 false yeses that turn into 8 unused product features.
Talking about your product in the first 10 minutes: they will frame every subsequent answer around your product, and you learn nothing about how they actually behave when you are not in the room.
No constraint questions: you learn what they want but not what they would actually pay for; want is cheap, paid behavior is the signal.
Skipping the “why” probes: surface answers are worthless. The third “why” is usually where the real motivation surfaces.
Asking about the future: “would you,” “could you,” “do you think” all produce opinion data; the only valid future-tense question is “when can we follow up to test a prototype.”
Pitching at the close: the close is for the intro ask and the follow-up consent; pitching at the close trains the interviewee to evaluate, not to share.
No transcript or recording: you forget 60% of what was said within 24 hours; without a transcript you cannot pattern-match across the 8 interviews.
Too small a sample: N=1 is an anecdote and N=3 only hints at a pattern. For a tightly defined segment, plan on roughly 10-15 interviews; that is where most discovery research hits saturation (the last 3-4 interviews stop telling you anything new). Stop at 3 and you are guessing.
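
Saturation can be checked against your tagged transcripts instead of by feel. The sketch below assumes each interview has been reduced to a set of theme tags (by hand or in a tagging tool). The tag names, the function names, and the strict reading of saturation as "the last 3 interviews added zero new themes" are assumptions for illustration.

```python
def new_themes_per_interview(tagged_interviews):
    """For each interview, count themes not seen in any earlier interview."""
    seen = set()
    new_counts = []
    for themes in tagged_interviews:
        fresh = set(themes) - seen
        new_counts.append(len(fresh))
        seen |= fresh
    return new_counts


def is_saturated(tagged_interviews, window=3, max_new=0):
    """True when each of the last `window` interviews added at most `max_new` themes."""
    counts = new_themes_per_interview(tagged_interviews)
    if len(counts) < window:
        return False
    return all(count <= max_new for count in counts[-window:])


# Hypothetical theme tags for six interviews, in the order they were run.
interviews = [
    {"manual_copy_paste", "manager_nags"},
    {"manual_copy_paste", "forgot_deadline"},
    {"manager_nags", "tool_abandoned"},
    {"manual_copy_paste"},
    {"forgot_deadline", "manager_nags"},
    {"tool_abandoned"},
]

print(new_themes_per_interview(interviews))
print(is_saturated(interviews))
```

Loosening `max_new` to 1 matches the softer wording "mostly confirm what you already heard." If the count of new themes is still climbing late in the run, that is the signal to split the segment.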

FAQ

How long should an interview be? 30-45 minutes is the sweet spot. Under 30 you cannot get past surface answers; over 45 fatigue degrades the answers from both sides.
Should I record the interview? Yes, with explicit consent. Transcripts let you cluster patterns across interviews; see the linked user persona article for the pattern-matching workflow. Without transcripts, the 8th interview retroactively colors your memory of the 1st.
How do I find interviewees? Cold outreach with a 30-minute frame (“I am trying to learn how teams handle X; I would love 30 minutes, no sales pitch”) converts at 5-10% on LinkedIn for B2B. Warm intros convert at 50%+; spend the first week on intros before going cold.
What if the interviewee tries to design the product for me? Politely steer back: “That is helpful; let me come back to it. First, walk me through the last time you actually faced this problem.” Interviewees love designing; you need their behavior, not their roadmap.
When do I stop interviewing and start building? When you hit saturation: the next interview adds less than the previous one and the last 3-4 mostly confirm what you already heard. For a tightly defined segment that is usually around 10-15 interviews. If interview 15 is still surprising you, your segment is too broad; split it and run each half separately.
Which AI model writes the best discovery script? As of June 2026, Claude Opus 4.7 follows the “no hypotheticals” rule most reliably and self-audits when asked; GPT-5.5 and Gemini 3.1 Pro are equally capable. The free tier of any one of them handles a single script. The bigger payoff from AI comes after the interviews, when you cluster transcripts into patterns; for that step a tool like Dovetail or Insight7 beats raw chat.
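
The conversion figures in the FAQ translate directly into an outreach budget. Here is a rough sketch of that arithmetic, assuming the quoted rates hold for your segment; `messages_needed` is a hypothetical helper, and rates are passed as whole percentages to keep the rounding exact.

```python
def messages_needed(target_interviews, conversion_percent):
    """Messages to send so the expected number of accepted calls reaches the target.

    conversion_percent is a whole number (5 means 5%). Integer ceiling division
    avoids floating-point rounding surprises.
    """
    return -(-(target_interviews * 100) // conversion_percent)


target = 8  # the interview count used in this article's scenario

print("Cold, 5% conversion: ", messages_needed(target, 5))
print("Cold, 10% conversion:", messages_needed(target, 10))
print("Warm, 50% conversion:", messages_needed(target, 50))
```

For 8 interviews this works out to 80-160 cold messages versus about 16 warm intro requests, which is the arithmetic behind spending the first week on intros.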

Tags: #AI writing #Product #Workflow #Customer discovery #Interview
