AI Gives Lists When Execution Is Needed

You asked the model to do the work; it returned an outline of how someone could do the work.

You asked the model to write a landing page, refactor a function, or draft a launch email. Instead you got a 10-bullet outline of “steps to write a landing page”, or “considerations when refactoring”, or “what to include in a launch email”. The deliverable did not arrive — only the meta-discussion about how to produce it. This is the planning-mode failure: the model treats your request as advice-seeking and returns a method rather than a finished artifact. The same model that hands you a list when you ask "how should I write the hero copy" will produce the hero copy verbatim when you ask "write the hero copy in 2 lines, max 14 words".

Common causes

Ordered by frequency.

1. “How” framing puts the model in advice mode

Classic offenders:

"How should I write the function?"
"What is the best way to structure this email?"
"How do I refactor this for readability?"

The model interprets these as method questions and returns a method. It is technically answering you — just not with the artifact.

How to spot it: your prompt’s main verb is "how", "what", "explain", or "approach" instead of "write", "produce", "output", "return".

2. Task is large; the model self-defends with planning

When the request is "build me a CRUD app" or "refactor this 800-line file", the model often emits a plan because the artifact would exceed token limits or feels too large to commit to in one shot. The plan is a safety valve.

How to spot it: output is a numbered list of phases, each labeled with a step name but no actual code/content under each.

3. Output schema implicitly requests narrative

Prompts like "explain how to do X" or "describe the steps to Y" ask for narrative, and narrative naturally takes list form. You did not ask for the artifact — you asked for the description of how to make it.

How to spot it: re-read your prompt; if the verb is descriptive (explain, describe, walk me through), the output is a list by design.

4. Earlier turns established a planning frame

If turn 1 was "let's plan the migration" and turn 5 is "now do the first step", the model often stays in planning mode and returns sub-steps instead of executing.

How to spot it: scroll up. Recent turns are about phases, milestones, or “how to approach”, not concrete artifacts.

5. The model thinks you want to learn, not to ship

Tutorial-style training data is heavily list-shaped. When the request looks like a learning question, the model defaults to teaching mode. "I want to understand X" triggers lists; "produce X" triggers artifacts.

How to spot it: the output reads like a tutorial section heading, with phrases like "first, you should consider...".

6. Ambiguous “give me X” where X could be a doc or an item

"Give me a marketing plan" is ambiguous — is X a plan document, or a list of marketing steps? Models often choose the list because lists feel safer than committing to an opinionated document.

How to spot it: output is bulleted; deliverable could have been prose, a table, or a code block.

Before you change anything

  • Confirm whether the issue happens locally, in chat UI, or via API; behavior can differ.
  • Write down the exact prompt, the model, the system prompt, and any prior turns.
  • Save the list output verbatim so you can diff it against the rewrite.
  • Note whether the task is genuinely large enough to need a plan first.
  • Check whether the prompt accidentally asked a “how” question instead of issuing an instruction.

Information to collect

  • The exact prompt text and the system prompt if any.
  • Model name and version, temperature, and conversation history.
  • The list output and the artifact you actually wanted.
  • The shape of the desired deliverable (code? prose? table? JSON?).
  • Whether you asked for a plan in an earlier turn.

Shortest fix path

Ordered by ROI.

Step 1: Rewrite “how” into an imperative verb

Planning promptExecution prompt
"How should I write the function?""Write the function. Signature: getUser(id: string): Promise<User>. Return runnable TypeScript."
"What's the best way to write this email?""Write the email. Subject line + 4-sentence body. Tone: warm, direct."
"How do I refactor this for readability?""Output the refactored code in full. Same behavior. Maximum function length: 20 lines."

The verb shift from how to write/output/produce does most of the work.

Step 2: Forbid planning vocabulary

Append:

Output rules:
- Produce the artifact directly, not a plan or list of steps.
- Do not write "first, you should..." or "here is how to approach this".
- No section headings unless the artifact itself requires them.
- Skip any preamble; start with the first line of the artifact.

Step 3: Pin the artifact type and shape

Tell the model exactly what type the output is:

Output type: TypeScript function, single file.
No comments unless explaining non-obvious logic.
Maximum 40 lines.

Or:

Output type: 3-paragraph blog intro.
Length: 80-120 words.
First sentence must contain a specific number or proper noun.

A typed return value blocks the model from substituting a list.

Step 4: For large tasks, split plan and execute into separate turns

If the task genuinely needs a plan, do it explicitly:

Turn 1: "List the 4 sub-tasks needed to refactor this file. No code yet."
Turn 2: "Execute sub-task 1 in full. Output the code, not a plan for the code."

Never combine in one prompt — the second half regresses to a list.

Step 5: Provide a result example, not a plan example

Few-shot the artifact:

Example output:
---
export async function getUser(id: string): Promise<User> {
  const row = await db.query.users.findFirst({ where: eq(users.id, id) });
  if (!row) throw new NotFoundError(`user ${id}`);
  return row;
}
---
Now write the same shape for getOrder(id: string).

The model copies the shape of the example, so an artifact example yields an artifact.

Step 6: Reset if the conversation is stuck in planning

If turns 1-5 were planning and turn 6 is execution, start a new conversation with only the relevant context. Long planning threads accumulate framing that single follow-ups cannot dislodge.

How to confirm the fix

  • The output is a finished artifact (code, prose, table, JSON), not a numbered list of steps.
  • A teammate could ship the output as-is without further generation.
  • The first 2 lines of the output are the artifact, not preamble.
  • No phrases like "first, you should consider..." or "here is how to approach...".

If still broken

  • Use a stronger model — capability-bound regressions to lists happen on smaller models.
  • Switch from chat UI to API where you can lock a JSON schema or output type.
  • Lower temperature to 0.3 for execution tasks; 0.7-1.0 invites verbose meta-commentary.
  • Use a system prompt that establishes the model as a “builder, not a tutor”.

Prevention

  • Maintain separate “plan” and “execute” prompt templates; never mix.
  • For execution prompts, banned words: approach, consider, you might, steps to.
  • Audit your last 10 prompts: count how vs write/output/produce. Aim for 80% imperative.
  • Use few-shot examples of artifacts, not of plans.
  • In tools that support structured output, define the output schema as { artifact: string } not { steps: string[] }.

Tags: #Troubleshooting #Prompt #Prompt quality #Prompt engineering