Prompt Asks for 10 Items, Model Returns 3 and Stops

You asked for 10 ideas; the model gave 3 and trailed off. Or it filled 10 slots but the last 4 are filler. Why list-length requests under-deliver and how to actually get N items.

You asked for “10 marketing taglines for a noise-cancelling headphone.” The model wrote 3 good ones, then “and here are some more ideas: catchy, focused, premium,” and stopped. Or it dutifully filled 10 lines, but items 7-10 are paraphrases of items 1-3. Or it returned 10 great items and then started an 11th because it didn’t know when to stop. This is one of the most predictable LLM failure modes: given a list-length request with no scaffolding, the model runs out of distinct ideas partway through and then either truncates honestly, pads, or overshoots.

The fix is not “add more examples.” It’s redesigning the prompt so that 10 items become 10 separate generation steps, each with its own constraint.

Common causes

1. The space of valid answers is smaller than N

You asked for “10 unique benefits of drinking water.” There are maybe 5 truly distinct benefits. Items 6-10 will be rephrases or fluff. Not the model’s fault.

How to spot it: Ask yourself, before reading the model’s output, “if I wrote this list myself, could I genuinely think of 10 distinct items?” If no, the prompt over-asks.
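The gut check above happens before you send the prompt. After the fact, padding usually shows up as items that reword earlier ones. A minimal sketch, assuming plain word overlap (Jaccard similarity) is a good-enough proxy: it will miss paraphrases that share few words, which embedding similarity would catch, and the 0.5 threshold is an arbitrary starting point. The function names are illustrative.

```python
import re
from itertools import combinations


def tokens(text: str) -> set[str]:
    """Lowercased set of words, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity in [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def near_duplicates(items: list[str], threshold: float = 0.5) -> list[tuple[int, int, float]]:
    """(i, j, score) for 1-based item pairs whose word overlap meets the threshold."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(items, start=1), 2):
        score = jaccard(a, b)
        if score >= threshold:
            pairs.append((i, j, round(score, 2)))
    return pairs


items = [
    "Improves skin hydration",
    "Helps regulate body temperature",
    "Improves hydration of your skin",
]
print(near_duplicates(items))  # [(1, 3, 0.6)]: item 3 rewords item 1
```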

2. No diversity constraint

“List 10 startup ideas.” The model gravitates to the highest-probability ideas (AI for X, marketplace for Y) and runs out fast. Without “diverse across industries” or “each from a different sector,” items converge.

How to spot it: Look at the model’s items. If 7 of the 10 are SaaS tools or marketplaces, you didn’t constrain diversity.
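Eyeballing works for one list; across many runs it helps to put a number on it. This sketch assumes each item already carries a category label (for example, because you asked the model to tag each idea with its sector, or you classified the ideas yourself). `top_share` is a hypothetical helper, not a library function.

```python
from collections import Counter


def top_share(labels: list[str], k: int = 2) -> float:
    """Fraction of items that fall into the k most common categories."""
    if not labels:
        raise ValueError("no labels to check")
    counts = Counter(label.strip().lower() for label in labels)
    return sum(count for _, count in counts.most_common(k)) / len(labels)


labels = ["SaaS", "SaaS", "marketplace", "SaaS", "marketplace",
          "SaaS", "marketplace", "hardware", "biotech", "media"]
print(top_share(labels))  # 0.7: 7 of 10 ideas are SaaS or marketplace
```

A high share for a small k is the signal to add the diversity axis described in Step 2.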

3. Model hit a soft length budget

Even with max_tokens=4000, many chat models have a learned sense of when an answer “feels complete,” often somewhere around 400-600 tokens. Past that point they wind down regardless of how much was asked for.

How to spot it: Count tokens in the response. If it always stops around the same length regardless of N, you are hitting a soft budget rather than running out of ideas.
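One way to make this check concrete is to run the same prompt at several values of N and compare output lengths. A sketch, assuming you already record an output token count per response (most chat APIs return one with each reply); the 0.15 coefficient-of-variation cutoff is an arbitrary assumption.

```python
from statistics import mean, pstdev


def looks_like_soft_budget(runs: list[tuple[int, int]], max_cv: float = 0.15) -> bool:
    """runs: (requested_n, output_tokens) pairs for the same prompt at different N.

    True when N varies but output length barely moves, measured as the
    coefficient of variation (population stdev / mean) of the output lengths.
    """
    if len({n for n, _ in runs}) < 2:
        raise ValueError("need runs at two or more different N values")
    lengths = [length for _, length in runs]
    return pstdev(lengths) / mean(lengths) <= max_cv


print(looks_like_soft_budget([(5, 480), (10, 510), (20, 530)]))   # True
print(looks_like_soft_budget([(5, 250), (10, 500), (20, 1000)]))  # False
```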

4. No item template — quality drift

“List 10 product ideas” with no structure: model writes 3 detailed ideas, then becomes terse for items 4-10. Without a per-item template, depth varies.

How to spot it: Items 1-3 are 3 paragraphs each; items 4-10 are one sentence.
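A rough numeric version of this check compares the average word count of the first few items with the rest. Word count is only a proxy for depth, and `depth_drift` and its default `head=3` are illustrative choices.

```python
def depth_drift(items: list[str], head: int = 3) -> float:
    """Average word count of items after the first `head`, divided by that of the first `head`.

    Values well below 1.0 mean the list got terser as it went on.
    """
    if len(items) <= head:
        raise ValueError("need more items than the head size")
    early = sum(len(t.split()) for t in items[:head]) / head
    late = sum(len(t.split()) for t in items[head:]) / (len(items) - head)
    if early == 0:
        raise ValueError("the first items are empty")
    return late / early


items = ["one two three four five six seven eight nine ten"] * 3 + ["short item"] * 7
print(depth_drift(items))  # 0.2: later items are a fifth as long
```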

5. Items are too coupled — model self-deduplicates

After writing “1. Bluetooth connectivity,” the model is reluctant to write another wireless-themed item because it pattern-matches to “I already covered wireless.” Without explicit permission to be similar, the list collapses.

How to spot it: The model’s items 7-10 take wider semantic jumps than items 1-3; it’s stretching to stay “unique.”

6. Stop sequence triggered early

Your API call includes a stop sequence like "\n\n" or "###". The model emits one of these between items and the API truncates.

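How to spot it: Output ends mid-list. Check the finish reason in the API response, and know your provider’s values: OpenAI’s finish_reason is "stop" both for natural completion and when a stop sequence fires ("length" means max_tokens was hit), while Anthropic’s stop_reason reports "stop_sequence" separately from "end_turn". If the reason is not the token cap and the text breaks off between items, suspect a stop sequence.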
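You can also catch this before making the call: write a short sample of the list you expect and check whether any configured stop sequence occurs inside it, since generation halts at the first match. A minimal sketch; `premature_stops` is a hypothetical helper.

```python
def premature_stops(sample_output: str, stop_sequences: list[str]) -> dict[str, int]:
    """Map each stop sequence that occurs in a sample of the expected output
    to the character offset where generation would be cut off."""
    body = sample_output.rstrip()  # trailing whitespace at the very end is harmless
    hits = {}
    for stop in stop_sequences:
        offset = body.find(stop)
        if offset != -1:
            hits[stop] = offset
    return hits


sample = "1. Quiet by design\n\n2. Hear only what matters\n\n3. Your commute, muted\n"
print(premature_stops(sample, ["\n\n", "###"]))  # {'\n\n': 18}: cut after item 1
```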

7. List request inside long context window

The model has 8000 tokens of context and your “give me 10 ideas” is the last 20. A brief instruction that makes up a tiny fraction of a long prompt tends to get less weight than the same instruction on its own.

How to spot it: Same prompt in isolation works; in your real pipeline it under-delivers.
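To run that comparison systematically, send the list request both alone and appended to your real context, then compare item counts. This sketch assumes you pass in your own `call_llm(prompt) -> str` and `parse_list(text) -> list[str]` (both hypothetical here, as elsewhere in this article), and it averages a few trials because sampled outputs vary.

```python
from typing import Callable


def isolation_gap(
    call_llm: Callable[[str], str],
    parse_list: Callable[[str], list[str]],
    instruction: str,
    context: str,
    trials: int = 3,
) -> tuple[float, float]:
    """Return (avg items for the instruction alone, avg items after the real context)."""

    def avg_count(prompt: str) -> float:
        counts = [len(parse_list(call_llm(prompt))) for _ in range(trials)]
        return sum(counts) / trials

    alone = avg_count(instruction)
    in_context = avg_count(f"{context}\n\n{instruction}")
    return alone, in_context
```

A large gap between the two numbers points at the long context, not at the list prompt itself.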

Shortest path to fix

Step 1: Validate the prompt asks for a plausible N

If you can’t easily list 10 yourself, ask for 5. Cap at the realistic count.

Step 2: Add a diversity dimension

Don’t just say “10 different.” Specify the axis:

List 10 startup ideas. Each idea must be in a DIFFERENT industry
(no two from the same sector). Span at least 8 of: healthcare,
education, fintech, climate, logistics, agriculture, retail,
entertainment, dev tools, B2B services.

The model now has 10 explicit slots to fill.
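If you build these prompts in code, the constraint can also be checked on the way back. A sketch under one added assumption that is not in the prompt above: the model is asked to prefix each idea with its sector in square brackets, so the output can be parsed. `diversity_prompt` and `sector_problems` are illustrative names.

```python
import re


def diversity_prompt(n: int, sectors: list[str], min_span: int) -> str:
    """Build a list request that names the diversity axis explicitly."""
    return (
        f"List {n} startup ideas. Each idea must be in a DIFFERENT industry\n"
        f"(no two from the same sector). Span at least {min_span} of: "
        + ", ".join(sectors) + ".\n"
        # Added assumption: a machine-readable sector tag on each line.
        "Start each line with the sector in square brackets, e.g. [fintech]."
    )


def sector_problems(output: str, sectors: list[str], min_span: int) -> list[str]:
    """Check the [sector] tags in the model's output against the constraint."""
    tags = [t.strip().lower() for t in re.findall(r"\[([^\]]+)\]", output)]
    allowed = {s.lower() for s in sectors}
    problems = []
    unknown = [t for t in tags if t not in allowed]
    if unknown:
        problems.append(f"unlisted sectors: {unknown}")
    repeats = sorted({t for t in tags if tags.count(t) > 1})
    if repeats:
        problems.append(f"repeated sectors: {repeats}")
    covered = len(set(tags) & allowed)
    if covered < min_span:
        problems.append(f"only {covered} listed sectors covered, need {min_span}")
    return problems
```

An empty list from `sector_problems` means the output met the constraint; otherwise the messages can be fed back in a retry.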

Step 3: Provide a per-item template

List 10 marketing taglines. For each tagline:

[N]. **<tagline>** (max 8 words)
     — Angle: <pain point | aspiration | wit | technical>
     — Target: <persona>

Templated items force consistent depth across the list.
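Because every item follows the same shape, the template doubles as a parse format. A minimal sketch, assuming the model reproduces the `**tagline**` header and the `Angle:` / `Target:` lines; it tolerates any leading dash or bullet before the field names. The function names are hypothetical.

```python
import re

ANGLES = {"pain point", "aspiration", "wit", "technical"}
HEADER = re.compile(r"^\s*(\d+)\.\s*\*\*(.+?)\*\*")
# \W* skips whatever dash or bullet precedes the field name.
FIELD = re.compile(r"^\s*\W*\s*(Angle|Target):\s*(.+?)\s*$")


def parse_taglines(output: str) -> list[dict]:
    """Group templated output into one dict per numbered tagline."""
    items = []
    for line in output.splitlines():
        header = HEADER.match(line)
        if header:
            items.append({"n": int(header.group(1)), "tagline": header.group(2).strip()})
            continue
        field = FIELD.match(line)
        if field and items:
            items[-1][field.group(1).lower()] = field.group(2)
    return items


def tagline_problems(item: dict) -> list[str]:
    """Template violations for one parsed item."""
    problems = []
    if len(item["tagline"].split()) > 8:
        problems.append("tagline over 8 words")
    if item.get("angle", "").lower() not in ANGLES:
        problems.append(f"unexpected angle: {item.get('angle')!r}")
    if not item.get("target"):
        problems.append("missing target persona")
    return problems
```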

Step 4: Number the slots in advance

Fill in items 1 through 10 below. Do not skip numbers. Do not stop early.

1.
2.
3.
4.
5.
6.
7.
8.
9.
10.

This is surprisingly effective: the model treats it as a fill-in-the-blanks task instead of open-ended generation.
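For arbitrary N the slot block is easy to generate, and the same numbering tells you which slots came back empty. A sketch; both helpers are illustrative.

```python
import re


def numbered_slots_prompt(task: str, n: int) -> str:
    """Append the fill-in instruction and empty slots 1..n to a list request."""
    slots = "\n".join(f"{i}." for i in range(1, n + 1))
    return (
        f"{task}\n"
        f"Fill in items 1 through {n} below. Do not skip numbers. Do not stop early.\n\n"
        f"{slots}"
    )


def missing_slots(output: str, n: int) -> list[int]:
    """Slot numbers that are absent from the answer or left empty."""
    # [ \t]* rather than \s* so an empty "2." line cannot borrow text from the next line.
    pattern = re.compile(r"^[ \t]*(\d+)\.[ \t]*\S", re.MULTILINE)
    filled = {int(m.group(1)) for m in pattern.finditer(output)}
    return [i for i in range(1, n + 1) if i not in filled]


print(missing_slots("1. Quiet by design\n2.\n3. Your commute, muted", 4))  # [2, 4]
```

The missing numbers feed straight into the gap retry in Step 6.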

Step 5: Generate in batches if N is large

For N >= 20, split the request into batches of 10 and pass the items generated so far into each call:

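# call_llm(prompt) -> str and parse_list(text) -> list[str] are your own helpers.
TOTAL, BATCH = 50, 10
all_items = []
for batch_start in range(0, TOTAL, BATCH):
    # Show covered items one per line rather than as a Python list repr.
    covered = "\n".join(f"- {item}" for item in all_items) or "(none yet)"
    items = parse_list(call_llm(f"""
Generate items {batch_start + 1} through {batch_start + BATCH} of a list of {TOTAL}.
Already covered:
{covered}
Generate {BATCH} NEW items not in the covered list.
"""))
    all_items.extend(items[:BATCH])  # trim if the model overshoots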

Step 6: Validate count and retry the gap

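# parse_list and call_llm are your own helpers; N is the count you asked for.
items = parse_list(output)
if len(items) < N:
    missing = N - len(items)
    covered = "\n".join(f"- {item}" for item in items)
    extra = call_llm(f"You previously gave {len(items)} items. Give {missing} MORE "
                     f"distinct items, none from this list:\n{covered}")
    items.extend(parse_list(extra)[:missing])  # trim any overshoot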
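The snippets above leave `parse_list` and `call_llm` undefined. Below is one possible `parse_list` for numbered or bulleted output, plus a bounded retry loop that asks only for the gap and drops case-insensitive repeats. `call_llm` stays a placeholder for your own client wrapper, and `max_retries=2` is an arbitrary cap so a topic with fewer than N good answers cannot loop forever.

```python
import re
from typing import Callable

# Matches "1. item", "2) item", "- item", "* item", and bullet-character items.
ITEM = re.compile(r"^[ \t]*(?:\d+[.)]|[-*\u2022])[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def parse_list(text: str) -> list[str]:
    """Pull the item text out of a numbered or bulleted list."""
    return [m.group(1) for m in ITEM.finditer(text)]


def fill_to_n(call_llm: Callable[[str], str], prompt: str, n: int, max_retries: int = 2) -> list[str]:
    """Ask once, then ask only for the missing items, skipping repeats."""
    items: list[str] = []
    seen: set[str] = set()

    def add(new_items: list[str]) -> None:
        for item in new_items:
            key = item.strip().lower()
            if key not in seen and len(items) < n:
                seen.add(key)
                items.append(item)

    add(parse_list(call_llm(prompt)))
    for _ in range(max_retries):
        if len(items) >= n:
            break
        missing = n - len(items)
        covered = "\n".join(f"- {item}" for item in items)
        add(parse_list(call_llm(
            f"You previously gave {len(items)} items. Give {missing} MORE distinct items, "
            f"none from this list:\n{covered}"
        )))
    return items  # may still be shorter than n if the topic ran dry
```

Returning a short list after the retries, instead of raising, fits the advice below about letting downstream code handle variable-length lists.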

Step 7: Raise max_tokens and remove aggressive stop sequences

Set max_tokens to roughly N * 80 for short items, N * 200 for detailed ones. Remove "\n\n" from stop sequences when generating lists.
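The rule of thumb translates directly into request settings. A sketch; the dict keys mirror the common `max_tokens` / `stop` parameter names, but check your provider's exact names (Anthropic's Messages API, for instance, calls the list `stop_sequences`). `list_call_settings` is a hypothetical helper.

```python
def list_call_settings(n: int, detailed: bool, stop_sequences: list[str]) -> dict:
    """Token budget and stop list for an N-item request."""
    per_item = 200 if detailed else 80  # rough tokens per item, from the rule of thumb
    # Drop stop sequences that legitimately occur between list items.
    between_items = {"\n\n"}
    return {
        "max_tokens": n * per_item,
        "stop": [s for s in stop_sequences if s not in between_items],
    }


print(list_call_settings(10, detailed=False, stop_sequences=["\n\n", "###"]))
# {'max_tokens': 800, 'stop': ['###']}
```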

When this is not on you

Some tasks legitimately don’t have N good answers. Forcing the model to fabricate items 7-10 produces worse output than honestly returning 6 and saying “more would be fluff.” Audit whether your downstream code can handle variable-length lists.

Easy to misdiagnose as

A max_tokens problem. Sometimes it is, but more often the model self-truncates well before the token cap because it ran out of distinct ideas. Check finish_reason and token count before bumping max_tokens.
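A small triage helper along those lines. The reason strings are documented values for two common APIs (OpenAI's finish_reason: "length", "stop"; Anthropic's stop_reason: "max_tokens", "stop_sequence", "end_turn"); the function itself is a hypothetical sketch, and other providers may use different values.

```python
def diagnose_short_list(finish_reason: str, items_found: int, items_requested: int) -> str:
    """Rough triage for a short list, based on the provider's finish/stop reason."""
    if items_found >= items_requested:
        return "ok"
    reason = finish_reason.lower()
    if reason in {"length", "max_tokens"}:
        return "hit the token cap: raise max_tokens"
    if reason == "stop_sequence":
        return "a stop sequence fired: remove ones that occur between items"
    if reason in {"stop", "end_turn"}:
        # OpenAI reports "stop" for both natural completion and a matched stop sequence.
        return "model ended on its own (or a stop sequence, on OpenAI): fix the prompt, not max_tokens"
    return f"unrecognized finish reason {finish_reason!r}: inspect the raw response"
```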

Prevention

  • Ask only for N that’s plausible given the topic.
  • Always include a diversity axis (industry, persona, format, angle).
  • Use a per-item template so depth stays consistent.
  • Pre-number the slots for fixed-length lists.
  • Validate the count and retry the gap; don’t trust the first response.
  • For large N, batch and pass already-covered items in.

FAQ

  • Should I just ask for “as many as you can think of”? That makes the count unpredictable. If downstream cares about N, ask for N and validate.
  • Does asking for “high quality” or “creative” help? Marginally. Diversity axis and per-item template help more.

Tags: #Prompt engineering #Troubleshooting #llm-output #list-output #structured-output