AI User Interview Question Generator That Avoids Leading

Updated for 2026 — use AI to draft interview questions that surface real behavior instead of confirming what you already believe — with constraints that block the most common leading-question patterns.

Most interview question drafts fail in the same way. They ask “would you use this?” instead of “what did you do last Tuesday?” They smuggle in the answer. They invite users to be polite. AI is very good at producing this kind of bad question by default — and very good at producing the opposite if you give it the right rules and the right hypothesis to test.

The task

You are planning 30-minute interviews with 6-10 users. You have a hypothesis you are trying to either kill or refine. You need a question list that surfaces actual behavior in the recent past, not opinions about an imagined future, and that does not telegraph what you want to hear.

When this is the right job for AI

  • You can write your hypothesis in one sentence. If you cannot, the interview is not ready.
  • You know who you are talking to and what their recent context is (last 2 weeks of relevant behavior).
  • You will read the draft critically and strip every question that contains a value-loaded word.
  • You can run the interview with the discipline to follow up on stories, not on opinions.

What to feed the AI

  • The hypothesis in one sentence (“power users who hit the 50-habit limit churn within 2 weeks”)
  • The user segment and how they were recruited (“paying users who have logged 40+ habits, recruited from in-app prompt”)
  • The decision the interview is informing (raise the limit? sell a tier? auto-archive?)
  • A list of taboo words you do not want in any question (“would,” “could,” “feature,” your product name as a noun)
  • 2 examples of questions you already know are bad — so AI calibrates the bar

Copy-ready prompt

You are writing user interview questions for a 30-minute call.

Hypothesis: power users who hit the 50-habit limit churn within 2 weeks because the limit forces them to delete habits they were tracking, not because the limit itself bothers them.

User segment: paying users who have logged 40+ active habits in the last 30 days. Recruited from in-app prompt with $20 gift card. They know we want to talk about how they use the app.

Decision this informs: do we raise the limit, ship a paid tier with no limit, or auto-archive habits inactive 30+ days?

Taboo words (do not appear in any question):
- "would" / "could" / "if"
- "feature" / "feedback"
- the word "limit" itself (we want to see if they bring it up unprompted)
- product name as a noun ("how do you use [App]")
- "love" / "hate" / "like"

Bad question examples (so you know the bar):
- "Would you pay for unlimited habits?" (hypothetical + sells the answer)
- "How do you feel about the 50-habit limit?" (asks for opinion, names the thing)
- "What features would you love to see next?" (everything wrong in 9 words)

Write 12 questions. Structure:

1. 3 opening questions about last week — what they actually did, no opinions.
2. 4 questions that probe behavior around the suspected churn moment — without naming the limit or hinting we are interested in it.
3. 3 questions about what they did the LAST time they stopped using a habit-tracking tool (any tool, not just ours).
4. 2 closing questions that test whether the user brings up the limit on their own. If they do not, the hypothesis is weaker.

Each question must:
- Reference a specific past time window (last week, last month, the last time you did X).
- Avoid every taboo word above.
- Not embed an assumption about why they did something.

Mark any question where you had to bend a rule with [BEND: why] so I can review.

Sample output structure

Opening (last week):

  1. Walk me through last Tuesday — when did you open the app and what did you do?
  2. Tell me about the last time you added something new to track. What was happening that day?
  3. The last time you opened the app and then closed it without doing anything — what was going on?

Behavior near the suspected churn moment: 4. Show me your active list. Talk me through what each one is for, in 15 seconds each. 5. When was the last time you removed something from this list? Walk me through that decision. 6. The last time you wanted to add something and did not — what stopped you? (do not lead — wait through silence) 7. Tell me about a time in the last month when you felt the list got hard to manage.

Last time they stopped using ANY tracking tool: 8. Think about the last tracking tool, any tool, you stopped using. What was the last week before you stopped? 9. What were you trying to track in that final week that did not work out? 10. Did you go to another tool, or just stop?

Closing: 11. If you were redesigning your tracking setup from scratch tomorrow, walk me through the first 10 minutes. 12. What is the next thing you are going to start tracking, and where will you put it?

How to refine

  • AI smuggled a taboo word in → re-prompt with “rewrite Q7 without manage — that is a value-loaded word here. Use a behavior verb.”
  • Question is hypothetical (“if you were to…”) → strict rule: “every question must reference a past time window. No if, no would.”
  • Question embeds the hypothesis → reject and ask: “rewrite Q6 so it does not assume the user wanted to add something — make it open to wanted to remove or wanted to change.
  • Too many questions, interview will overrun → cap at 10 and tag the 2 that get dropped first if time is tight.
  • AI did not surface the “what tool did they use before” angle → require a section on past tool exits.

Common mistakes

  • Asking opinion questions when behavior questions exist. “How do you feel about X” is a polite-answer machine.
  • Naming the thing you are testing in the question. If the user does not raise it unprompted, that is the signal.
  • Stacking two questions in one (“when did you last add a habit and how did that feel?”). Users only answer the easier half.
  • Using your own product vocabulary. Users do not call them “habits,” they call them “the workout thing.”
  • Skipping the “last time you quit a tool like this” question. The graveyard of prior tools is where the real reasons live.

FAQ

  • How many questions for 30 minutes? 10-12 prepared, expect to use 6-8. Follow-ups eat time and that is the point.
  • Should the user see the questions in advance? No. You want first-take answers, not rehearsed ones.
  • What if a user keeps giving opinions instead of stories? Redirect once: “tell me about the last time that happened.” If they cannot, that is data — they may not actually do it often.
  • Can AI run the interview? No. AI cannot read silence, lean in, or know when to wait. Use AI for prep and synthesis, not for the conversation.
  • How do I synthesize 8 interviews? Tag every quote with behavior verbs, then cluster by verb. Avoid clustering by “theme” — themes are your bias.

Tags: #AI writing #user-interview #Research #app-product-ops #Indie dev