Prompt Asks for Too Many Tasks at Once

You stacked five tasks in one prompt; the model did one well, answered one generically, left one half-finished, and skipped two.

You wrote one prompt asking the model to: (1) summarize the customer email, (2) classify sentiment, (3) propose a reply, (4) flag for escalation if needed, and (5) draft an internal Slack message. The output handles task 1 well, gives a generic answer for task 2, writes a half-finished reply for task 3, skips task 4 entirely, and never touches task 5. You re-prompt with “do all 5” and get the same pattern with slightly different gaps. A multi-task prompt behaves like a time-shared CPU with no scheduler: the model spends most of its effort on the first task, tapers off on later ones, and stops before everything is finished.

This page walks through why batched prompts produce inconsistent results and how to split tasks into parallel or sequential prompts with explicit per-task success criteria.

Common causes

1. Tasks stacked to save typing

You batched because writing 5 prompts felt wasteful. The result is one prompt that does 5 things poorly instead of 5 prompts that each do one thing well.

How to spot it: prompt has 3+ numbered tasks.

2. Token budget runs out

The model’s output budget is finite. Five tasks in one response share that budget, so a task that needs half of it to be done well has to compete with the other four.

How to spot it: later tasks are truncated or skipped.
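
A quick back-of-the-envelope check makes the squeeze visible. The sketch below is a rough estimate only: the `TOKENS_PER_WORD` ratio and the per-task word counts are assumptions for illustration, and `fits_in_budget` is a hypothetical helper, not part of any SDK.

```python
# Rough assumption for English prose; measure with your model's tokenizer.
TOKENS_PER_WORD = 1.3


def fits_in_budget(word_needs: dict[str, int], max_output_tokens: int) -> tuple[bool, int]:
    """Estimate whether all tasks fit inside one response's output limit."""
    needed = round(sum(word_needs.values()) * TOKENS_PER_WORD)
    return needed <= max_output_tokens, needed


# Illustrative word counts each task needs to be done well (assumed values).
needs = {"summary": 30, "sentiment": 5, "reply": 100, "escalation": 25, "slack": 40}

if __name__ == "__main__":
    ok, needed = fits_in_budget(needs, max_output_tokens=256)
    print(f"needs about {needed} tokens, fits: {ok}")
```

If the estimate lands near or above the limit, the later tasks are the likely casualties.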

3. Earlier tasks change the model’s “state”

After the model answers task 1 in a formal voice, task 2 inherits that voice even if a different register would suit it better. Each token is conditioned on everything generated before it, so the model is path-dependent within a single response.

How to spot it: later tasks inappropriately share the register or framing of earlier ones.

4. No per-task success criteria

You said “do all 5” without saying what success looks like for each. The model satisfies whichever tasks are easiest and skims or skips the rest.

How to spot it: prompt has 1 success criterion shared across all tasks.

5. Tasks have hidden dependencies

Task 3 depends on task 2’s output. The model handles them in order, but if task 2’s output is weak, the error cascades into task 3.

How to spot it: failure of one task corrupts the next.

Before you change anything

  • List every task you stacked. Count them.
  • Identify which tasks are truly independent vs dependent.
  • For each, define what success looks like.
  • Decide whether to send parallel (independent) or sequential (dependent) prompts.
  • Decide whether each task gets its own standalone prompt or whether the tasks share one system prompt with a single task per turn.
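
The checklist above can be written down as data before any prompt is sent. This is a minimal sketch, assuming the five tasks from the customer-email example; the dependency edges are illustrative assumptions, and `Task` and `execution_waves` are hypothetical names rather than part of any library.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    success: str  # what "done well" means for this task
    depends_on: list[str] = field(default_factory=list)


def execution_waves(tasks: list[Task]) -> list[list[str]]:
    """Group tasks into waves; each wave depends only on earlier waves."""
    remaining = {t.name: set(t.depends_on) for t in tasks}
    waves: list[list[str]] = []
    done: set[str] = set()
    while remaining:
        ready = sorted(n for n, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError(f"circular or unknown dependency among: {sorted(remaining)}")
        waves.append(ready)
        done.update(ready)
        for n in ready:
            del remaining[n]
    return waves


# Illustrative inventory; the dependencies are assumptions for this example.
TASKS = [
    Task("summary", "max 30 words"),
    Task("sentiment", "one of four labels", ["summary"]),
    Task("reply", "50-100 words, no emoji", ["summary", "sentiment"]),
    Task("escalation", "yes/no plus one-sentence reason", ["summary", "sentiment"]),
    Task("slack", "under 40 words", ["summary", "sentiment", "reply", "escalation"]),
]

if __name__ == "__main__":
    for i, wave in enumerate(execution_waves(TASKS), start=1):
        print(f"wave {i}: {', '.join(wave)}")
```

Tasks that land in the same wave can be sent in parallel; the waves themselves run in sequence.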

Information to collect

  • The current multi-task prompt.
  • The partial / inconsistent output.
  • The 5 (or however many) tasks broken out.
  • Per-task success criteria.
  • Model and any system prompt.

Shortest path to fix

Step 1: List the tasks; default to one prompt per task

Task 1: Summarize the email.
Task 2: Classify sentiment.
Task 3: Draft reply.
Task 4: Flag escalation.
Task 5: Internal Slack message.

Default: 5 prompts. Batch only if dependencies require it or if token cost matters more than quality.

Step 2: For independent tasks, run in parallel

If your platform supports it, run the 5 API calls in parallel. Each task gets the model’s full attention, and total latency is the maximum of the 5 calls, not the sum.

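import asyncio

async def run_all(prompts):
    # call_model: your own async function that sends one prompt and returns the reply.
    # gather runs the calls concurrently and returns results in input order.
    return await asyncio.gather(*(call_model(p) for p in prompts))

results = asyncio.run(run_all([prompt_1, prompt_2, prompt_3, prompt_4, prompt_5]))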
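
The latency claim can be checked offline. This is a minimal sketch, assuming a stand-in `call_model` that only sleeps for a fixed delay (hypothetical; a real client would make a network call) and a hypothetical `run_parallel` helper that times the batch.

```python
import asyncio
import time


async def call_model(prompt: str, delay: float) -> str:
    """Stand-in for a real API call; it only sleeps, then echoes the prompt."""
    await asyncio.sleep(delay)
    return f"response to: {prompt}"


async def run_parallel(jobs: list[tuple[str, float]]) -> tuple[list[str], float]:
    start = time.perf_counter()
    # gather starts every call concurrently and returns results in input order.
    results = await asyncio.gather(*(call_model(p, d) for p, d in jobs))
    return list(results), time.perf_counter() - start


if __name__ == "__main__":
    jobs = [("summarize", 0.3), ("classify", 0.1), ("draft reply", 0.2)]
    results, elapsed = asyncio.run(run_parallel(jobs))
    print(results)
    print(f"elapsed {elapsed:.2f}s; the delays sum to 0.60s")
```

With real calls, passing `return_exceptions=True` to `asyncio.gather` returns a failed call's exception in place of its result, so one failure does not discard the other responses.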

Step 3: For dependent tasks, chain sequentially with explicit handoff

Pass 1: Summarize email. Output: <summary>
Pass 2: Given <summary>, classify sentiment. Output: <sentiment>
Pass 3: Given <summary> and <sentiment>, draft reply. Output: <reply>
Pass 4: Given <summary> and <sentiment>, decide escalation. Output: <bool>
Pass 5: Given all of the above, write Slack message. Output: <message>

Sequential passes preserve quality and make dependencies explicit.
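
The handoff can be sketched as a small loop. The version below assumes a synchronous `call_model(prompt) -> str` function that you supply; `PASSES` and `run_chain` are hypothetical names, and the prompt wording is illustrative only.

```python
from typing import Callable

# (output name, prompt template). Templates name the earlier outputs they use.
PASSES = [
    ("summary", "Summarize this email in at most 30 words.\n\nEmail:\n{email}"),
    ("sentiment", "Summary: {summary}\n\nClassify the sentiment as positive, "
                  "neutral, negative, or frustrated."),
    ("reply", "Summary: {summary}\nSentiment: {sentiment}\n\nDraft a reply to the customer."),
    ("escalation", "Summary: {summary}\nSentiment: {sentiment}\n\n"
                   "Should this be escalated? Answer yes or no with one reason."),
    ("slack", "Summary: {summary}\nSentiment: {sentiment}\nReply: {reply}\n"
              "Escalation: {escalation}\n\nWrite an internal Slack message."),
]


def run_chain(email: str, call_model: Callable[[str], str]) -> dict[str, str]:
    """Run the passes in order, feeding each one only the outputs it names."""
    context = {"email": email}
    for name, template in PASSES:
        prompt = template.format(**context)  # a KeyError here means a missing handoff
        context[name] = call_model(prompt).strip()
    return context


if __name__ == "__main__":
    def fake_model(prompt: str) -> str:  # stand-in so the sketch runs offline
        return f"<output for a {len(prompt)}-char prompt>"

    for name, value in run_chain("My order arrived broken.", fake_model).items():
        print(name, "->", value)
```

Because each template lists its inputs by name, a pass that silently relies on something it was never given fails loudly instead of producing a vague answer.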

Step 4: If you must batch, label tasks clearly and give per-task success criteria

Process the email below. For EACH numbered task, output a labeled section.

Task 1: SUMMARY (max 30 words)
Task 2: SENTIMENT (positive | neutral | negative | frustrated)
Task 3: REPLY_DRAFT (50-100 words, second person, no emoji)
Task 4: ESCALATION (yes/no + 1-sentence reason)
Task 5: SLACK_MSG (under 40 words, casual)

Output:
TASK 1: ...
TASK 2: ...
TASK 3: ...
TASK 4: ...
TASK 5: ...

Structured per-task output makes a skipped task obvious and reduces the “runs out of energy” pattern.
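
Labeled sections also make the response machine-readable. This is a minimal sketch of a parser for the `TASK n:` layout shown above; `parse_sections` and `missing_tasks` are hypothetical helpers, and the sketch assumes the model follows the labels exactly.

```python
import re

EXPECTED = [1, 2, 3, 4, 5]


def parse_sections(output: str) -> dict[int, str]:
    """Split 'TASK n: ...' labeled output into {n: text}."""
    parts = re.split(r"^TASK (\d+):", output, flags=re.MULTILINE)
    # With one capture group, re.split yields [preamble, "1", text1, "2", text2, ...].
    return {int(num): text.strip() for num, text in zip(parts[1::2], parts[2::2])}


def missing_tasks(sections: dict[int, str], expected: list[int] = EXPECTED) -> list[int]:
    """Return task numbers that are absent or have an empty body."""
    return [n for n in expected if not sections.get(n)]


if __name__ == "__main__":
    sample = (
        "TASK 1: Customer reports a broken item.\n"
        "TASK 2: frustrated\n"
        "TASK 3: We are sorry\nabout that.\n"
        "TASK 4:\n"
    )
    sections = parse_sections(sample)
    print(sections)
    print("missing:", missing_tasks(sections))
```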

Step 5: Use a planner / executor split for complex multi-task work

Planner prompt: Given input X, produce a 5-step plan with the schema
                for each step's output.

Then for each step in the plan, execute it as a separate prompt.

This decomposes complex work into well-scoped sub-prompts.
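
One way the split could look in code, as a sketch under stated assumptions: the planner is asked for a JSON list, `call_model(prompt) -> str` is a function you supply, and the key names (`step`, `instruction`, `output_schema`) are invented for this example.

```python
import json
from typing import Callable

PLANNER_PROMPT = (
    "Given the input below, produce a plan as a JSON list. Each item must have "
    '"step" (short name), "instruction" (what to do), and "output_schema" '
    "(what the step's output must look like).\n\nInput:\n{input}"
)
REQUIRED_KEYS = {"step", "instruction", "output_schema"}


def plan_and_execute(user_input: str, call_model: Callable[[str], str]) -> dict[str, str]:
    """Ask for a plan once, then run every step as its own prompt."""
    plan = json.loads(call_model(PLANNER_PROMPT.format(input=user_input)))
    if not isinstance(plan, list):
        raise ValueError("planner did not return a JSON list")
    results: dict[str, str] = {}
    for step in plan:
        missing = REQUIRED_KEYS - set(step)
        if missing:
            raise ValueError(f"plan step is missing keys: {sorted(missing)}")
        prompt = (
            f"{step['instruction']}\n"
            f"Output format: {step['output_schema']}\n\n"
            f"Input:\n{user_input}"
        )
        results[step["step"]] = call_model(prompt)
    return results
```

Validating the plan before executing it keeps a malformed planner response from turning into several wasted executor calls.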

Step 6: Audit for completion

After running, programmatically check that each task’s output is present and well-formed. If any task is missing or malformed, re-run just that one.
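
A sketch of such an audit, assuming the per-task criteria from the batched prompt in Step 4. The checks are deliberately crude word-count and label heuristics, and `rerun` is a hypothetical callback that re-sends the single prompt for one task.

```python
from typing import Callable

SENTIMENTS = {"positive", "neutral", "negative", "frustrated"}


def _starts_yes_no(text: str) -> bool:
    words = text.split()
    return bool(words) and words[0].strip(".,:;").lower() in {"yes", "no"}


# One check per task; an empty or missing output fails every check.
CHECKS: dict[str, Callable[[str], bool]] = {
    "summary": lambda s: 0 < len(s.split()) <= 30,
    "sentiment": lambda s: s.strip().lower() in SENTIMENTS,
    "reply": lambda s: 50 <= len(s.split()) <= 100,
    "escalation": _starts_yes_no,
    "slack": lambda s: 0 < len(s.split()) < 40,
}


def failed_tasks(outputs: dict[str, str]) -> list[str]:
    return [name for name, check in CHECKS.items() if not check(outputs.get(name, ""))]


def audit_and_retry(
    outputs: dict[str, str], rerun: Callable[[str], str], max_retries: int = 2
) -> tuple[dict[str, str], list[str]]:
    """Re-run only the failing tasks; return final outputs and any still failing."""
    outputs = dict(outputs)  # leave the caller's dict untouched
    for _ in range(max_retries):
        failed = failed_tasks(outputs)
        if not failed:
            break
        for name in failed:
            outputs[name] = rerun(name)
    return outputs, failed_tasks(outputs)
```

Capping the retries matters: a task that keeps failing its check usually needs a better prompt, not another attempt.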

How to confirm the fix

  • Every task has a complete output, not just task 1.
  • Each task’s output passes its success criterion.
  • The depth of task 5 matches the depth of task 1.
  • A teammate reviewing the output cannot tell which task ran first.
  • Overall quality is higher than with the batched prompt, at the cost of a few more calls.

If it still fails

  1. The tasks may have more dependencies than you thought — sequence them.
  2. The model may need fewer total tasks — drop the lowest-value one.
  3. Use a more capable model for the most important task and a cheaper one for the others.
  4. For very high-stakes work, never batch — always 1 prompt per task.

Prevention

  • Default rule: 1 prompt = 1 task.
  • For batched workflows, use planner + executor pattern with explicit sub-prompts.
  • Keep a personal anti-pattern list: “did you stack tasks?” — check before sending.
  • Audit production pipelines: any prompt with 3+ tasks is a risk.
  • Treat token-saving via batching as a smell unless quality is empirically equivalent.
  • Build the habit of asking “if this fails halfway, what is recoverable?” Batched prompts tend to fail partway, leaving some tasks done and others missing.
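
The “3+ tasks” rule can be turned into a rough lint for a prompt library. This is a heuristic sketch only: it counts distinct numbered markers such as `(1)`, `2.` at the start of a line, or `Task 3`, so it will miss unnumbered task lists; `count_numbered_tasks` and `flag_stacked_prompts` are hypothetical names.

```python
import re

# "(1)" anywhere, "2." or "2)" at the start of a line, or "Task 3" anywhere.
_MARKERS = re.compile(
    r"\((\d+)\)|^\s*(\d+)[.)]\s|\btask\s+(\d+)\b",
    re.IGNORECASE | re.MULTILINE,
)


def count_numbered_tasks(prompt: str) -> int:
    """Count distinct task numbers that appear as list-style markers."""
    numbers = {next(g for g in m.groups() if g) for m in _MARKERS.finditer(prompt)}
    return len(numbers)


def flag_stacked_prompts(prompts: dict[str, str], threshold: int = 3) -> list[str]:
    """Return the names of prompts with at least `threshold` numbered tasks."""
    return sorted(
        name for name, text in prompts.items()
        if count_numbered_tasks(text) >= threshold
    )


if __name__ == "__main__":
    library = {
        "triage": "Please (1) summarize the email, (2) classify sentiment, (3) propose a reply.",
        "single": "Summarize this email in 30 words.",
    }
    print(flag_stacked_prompts(library))
```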

Tags: #Troubleshooting #Prompt #Prompt quality #Prompt engineering