Troubleshooting

Prompt Asks for 10 Items, Model Returns 3 and Stops

You asked for 10 ideas and got 3, or 10 slots padded with filler. Why list-length prompts under-deliver, and the prompt + schema fixes that actually get you N distinct items.

Published: May 24, 2026 Updated: Jun 21, 2026 Author: AI Productivity Guide Team

You asked for “10 marketing taglines for a noise-cancelling headphone.” The model wrote 3 good ones, added “and here are some more ideas: catchy, focused, premium,” and stopped. Or it filled all 10 lines, but items 7-10 are paraphrases of items 1-3. Or it returned 10 great items and kept going into item 11 because it never decided where to stop. This is one of the most predictable LLM failure modes: a bare list-length request gives the model no structure to fill, so it runs out of distinct ideas partway through and either truncates honestly, pads, or overshoots.

Fastest fix: don’t ask for a count and hope. Give the model the slots to fill, then enforce the count in code. Concretely: (1) pre-number the list (1. through 10., “do not skip numbers”), (2) add one diversity axis (“each from a different industry”), and (3) after the call, parse the items and re-prompt for the gap if len(items) < N. If you are on the Gemini API you can also constrain the count at the schema level with minItems/maxItems (more on that below). The deeper fix is to stop treating “give me 10” as one generation and treat it as 10 separate generation steps, each with its own constraint.

Which bucket are you in?

Before changing the prompt, look at the actual output and match the symptom. The fix differs by cause.

Symptom in the output	Most likely cause	Go to
Output ends with “…and more” or trails off after a few items	Topic genuinely has fewer than N good answers	Cause 1
7 of 10 items are the same flavor (all SaaS, all marketplace)	No diversity axis given	Cause 2
Always stops near the same length no matter what N you ask for	Soft “this feels complete” length budget	Cause 3
Items 1-3 are detailed, 4-10 are one-liners	No per-item template	Cause 4
Later items take odd semantic leaps to “stay unique”	Model is self-deduplicating	Cause 5
Output cut off mid-item or mid-list; `finish_reason` is `length`, or `stop_reason` is `max_tokens` or `stop_sequence`	Stop sequence fired or `max_tokens` hit	Cause 6
Works alone, under-delivers inside your real pipeline	Instruction buried in long context	Cause 7

Common causes

1. The space of valid answers is smaller than N

You asked for “10 unique benefits of drinking water.” There are maybe 5 truly distinct benefits. Items 6-10 will be rephrases or fluff. Not the model’s fault.

How to spot it: Before reading the model’s output, ask yourself “if I wrote this list myself, could I genuinely think of N distinct items?” If no, the prompt over-asks.

2. No diversity constraint

“List 10 startup ideas.” The model gravitates to the highest-probability ideas (AI for X, marketplace for Y) and runs out fast. This is the documented long-tail problem: models oversample common answers and undersample the long tail. Without “diverse across industries” or “each from a different sector,” items converge.

How to spot it: Look at the items. If 7 of the 10 are SaaS or marketplace, you didn’t constrain diversity.

3. Model hit a soft length budget

Even with max_tokens=4000, models develop their own “this answer feels complete” heuristic, often winding down around 400-600 tokens regardless of how much was asked for. The model stops naturally, not because it ran out of token budget.

How to spot it: Count tokens in the response. If it always stops around the same length regardless of N, and finish_reason is stop (not length), it is a soft budget, not a hard cap.

4. No item template — quality drift

“List 10 product ideas” with no structure: the model writes 3 detailed ideas, then becomes terse for items 4-10. Without a per-item template, depth varies.

How to spot it: Items 1-3 are 3 paragraphs each; items 4-10 are one sentence.
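
If you would rather measure the drift than eyeball it, compare the average length of the first few items with the rest. This is a minimal sketch that assumes the items are already parsed into a list of strings. The name `depth_drift` and the idea of using word counts as a proxy for depth are illustrative assumptions, not part of any library.

```python
from statistics import mean


def depth_drift(items: list[str], head: int = 3) -> float:
    """Return mean word count of the tail divided by that of the head.

    A value well below 1.0 suggests the later items got terser.
    """
    if len(items) <= head:
        raise ValueError("need more items than the head size")
    head_words = mean(len(item.split()) for item in items[:head])
    tail_words = mean(len(item.split()) for item in items[head:])
    if head_words == 0:
        raise ValueError("head items are empty")
    return tail_words / head_words
```

A ratio near 1.0 means depth is holding steady. Where you draw the line (0.5 is a reasonable starting point) is a judgment call for your own content.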

5. Items are too coupled — model self-deduplicates

After writing “1. Bluetooth connectivity,” the model is reluctant to write another wireless-themed item because it pattern-matches “I already covered wireless.” Without explicit permission to be similar, the list collapses.

How to spot it: Items 7-10 take wider semantic jumps than items 1-3 — the model is stretching to stay “unique.”

6. Stop sequence or token cap fired early

Your API call includes a stop sequence like "\n\n" or "###". The model emits one of these between items and the API truncates. Or you set max_tokens too low and the answer was cut off mid-item.

How to spot it: Output ends mid-list. Check the finish field in the API response:

OpenAI Chat Completions returns finish_reason. stop = a stop sequence fired or the model ended naturally; length = the max_tokens cap was hit.
Anthropic Messages API returns stop_reason. stop_sequence = one of your stop_sequences fired; max_tokens = the cap was hit; end_turn = the model finished naturally.

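If you see length/max_tokens, raise the cap. If you see stop/stop_sequence but the list is incomplete, suspect your stop sequence first and remove "\n\n" and "###". On OpenAI, stop also covers a natural ending, so if the list is still short after you remove them, the model stopped on its own and neither the cap nor the stop sequence was the cause.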
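
The decision rule above can be sketched as a small helper. It uses only the finish values named in this section. The function name `diagnose_finish` and the provider labels "openai" and "anthropic" are hypothetical, and you would pass in the finish field you read from your own response object.

```python
def diagnose_finish(provider: str, reason: str, got: int, expected: int) -> str:
    """Map a finish field plus the parsed item count to a likely cause.

    provider is "openai" (finish_reason) or "anthropic" (stop_reason).
    """
    if got >= expected:
        return "ok"
    cap_hit = (provider == "openai" and reason == "length") or (
        provider == "anthropic" and reason == "max_tokens"
    )
    if cap_hit:
        return "token cap hit: raise max_tokens"
    if provider == "anthropic" and reason == "stop_sequence":
        return "stop sequence fired: remove it"
    if provider == "openai" and reason == "stop":
        # OpenAI reports "stop" for both a stop sequence and a natural ending.
        return "stop sequence or natural ending: retry without stop sequences"
    if provider == "anthropic" and reason == "end_turn":
        return "model stopped on its own: fix the prompt, not the cap"
    return "unrecognized finish value: inspect the raw response"
```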

7. List request inside a long context window

The model has 8000 tokens of context and your “give me 10 ideas” is just 20 of them, buried among the rest. Models systematically under-attend to brief instructions buried in long context.

How to spot it: The same prompt in isolation works; in your real pipeline it under-delivers. Move the instruction to the end of the prompt and repeat the count requirement at both the top and bottom.
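
One way to apply that fix consistently is to build the prompt in code so the count requirement always brackets the long context. This is a sketch only. The name `wrap_with_count` and the exact wording of the reminder are assumptions you should adapt to your own pipeline.

```python
def wrap_with_count(context: str, task: str, n: int) -> str:
    """Put the count requirement before the context and again after the task."""
    reminder = f"You must return exactly {n} numbered items."
    # Order: reminder, long context, the actual task, reminder again.
    return "\n\n".join([reminder, context.strip(), task.strip(), reminder])
```

The task itself sits after the context, so the instruction is the last thing the model reads, and the count appears at both the top and the bottom.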

Shortest path to fix

Step 1: Validate the prompt asks for a plausible N

If you can’t easily list N yourself, ask for fewer. Cap at the realistic count.

Step 2: Add a diversity dimension

Don’t just say “10 different.” Specify the axis:

List 10 startup ideas. Each idea must be in a DIFFERENT industry
(no two from the same sector). Span at least 8 of: healthcare,
education, fintech, climate, logistics, agriculture, retail,
entertainment, dev tools, B2B services.

The model now has ten named industries to draw from, so every idea has an explicit slot to fill.
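
If you generate these prompts programmatically, a stricter variant is to assign one axis value to each slot instead of asking the model to spread itself. The sketch below is that variant, not the exact prompt above. `diversity_prompt` and its parameters are hypothetical names.

```python
def diversity_prompt(thing: str, n: int, axis: str, values: list[str]) -> str:
    """Build a list request that pins one axis value to each numbered slot."""
    if len(values) < n:
        raise ValueError(f"need at least {n} {axis} values, got {len(values)}")
    lines = [f"List {n} {thing}. Each must use a DIFFERENT {axis}, as assigned:"]
    # One slot per value, so the model cannot drift back to its favorites.
    lines += [f"{i}. ({value})" for i, value in enumerate(values[:n], start=1)]
    return "\n".join(lines)
```

Pinning values removes the model's freedom to pick the industry, which is usually what you want when the failure is convergence.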

Step 3: Provide a per-item template

List 10 marketing taglines. For each tagline:

[N]. **<tagline>** (max 8 words)
     — Angle: <pain point | aspiration | wit | technical>
     — Target: <persona>

Templated items force consistent depth across the list.
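
A template also gives you something to check in code. The sketch below assumes the model follows the template closely enough that each item starts with a number and a bold tagline, and that each item carries one "Angle:" and one "Target:" line. `check_taglines` is an illustrative name, and the regex will need adjusting if your model formats items differently.

```python
import re

# Matches lines such as "1. **Silence, on demand**" and captures number and tagline.
ITEM_RE = re.compile(r"^\s*(\d+)\.\s+\*\*(.+?)\*\*", re.MULTILINE)


def check_taglines(output: str, n: int, max_words: int = 8) -> list[str]:
    """Return a list of problems; an empty list means the output passes."""
    found = ITEM_RE.findall(output)
    problems = []
    if len(found) != n:
        problems.append(f"expected {n} items, found {len(found)}")
    for number, tagline in found:
        if len(tagline.split()) > max_words:
            problems.append(f"item {number} exceeds {max_words} words")
    for label in ("Angle:", "Target:"):
        count = output.count(label)
        if count != len(found):
            problems.append(f"{label} appears {count} times, expected {len(found)}")
    return problems
```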

Step 4: Number the slots in advance

Fill in items 1 through 10 below. Do not skip numbers. Do not stop early.

1.
2.
3.
4.
5.
6.
7.
8.
9.
10.

This is surprisingly effective: the model treats it as a fill-in-the-blanks task instead of free generation.
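
When N varies, build the numbered scaffold in code instead of typing it by hand. This is a minimal sketch, with `numbered_slots_prompt` as a hypothetical helper name.

```python
def numbered_slots_prompt(instruction: str, n: int) -> str:
    """Append pre-numbered empty slots 1..n to an instruction."""
    header = (
        f"{instruction} Fill in items 1 through {n} below. "
        "Do not skip numbers. Do not stop early."
    )
    slots = "\n".join(f"{i}." for i in range(1, n + 1))
    return f"{header}\n\n{slots}"
```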

Step 5: Enforce the count at the schema level (Gemini) — or validate in code (OpenAI/Claude)

If you control the API call and want a hard guarantee, the cleanest lever is the response schema. As of June 2026 the support is uneven, so pick the path for your provider:

Gemini API (responseSchema) supports minItems and maxItems on arrays. Set both to N to force exactly N entries:

{
  "type": "array",
  "minItems": 10,
  "maxItems": 10,
  "items": { "type": "string" }
}

(Very large maxItems values have been reported to trigger 500 errors, so keep N reasonable and batch above ~50.)

OpenAI Structured Outputs (response_format: json_schema) does not support minItems/maxItems as of June 2026 — those keywords are silently ignored, so the schema cannot guarantee count. You still get a clean array shape, but you must validate the length yourself (Step 7).
Anthropic Claude has no native json_schema response format; use tool-calling with an input schema to shape the array, then validate the count in code.
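
Whichever provider you use, you can generate the schema from N and verify the returned JSON locally. The sketch below stays SDK-agnostic on purpose: how the schema is passed to a client is not shown, and `exact_count_schema` and `count_is_valid` are hypothetical helper names.

```python
import json


def exact_count_schema(n: int, item_type: str = "string") -> dict:
    """Build the array schema shown above for any N."""
    return {
        "type": "array",
        "minItems": n,
        "maxItems": n,
        "items": {"type": item_type},
    }


def count_is_valid(raw_json: str, n: int) -> bool:
    """Check that the response text is a JSON array with exactly n entries."""
    try:
        data = json.loads(raw_json)
    except json.JSONDecodeError:
        return False
    return isinstance(data, list) and len(data) == n
```

Running the local check even when the schema is enforced costs almost nothing and protects you if the provider's behavior changes.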

Step 6: Generate in batches if N is large

For N >= 20, split:

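# Sketch: call_llm is a placeholder for your own client wrapper; it is
# assumed here to return a list of strings.
TOTAL, BATCH = 50, 10
all_items = []
for batch_start in range(0, TOTAL, BATCH):
    prompt = (
        f"Generate items {batch_start + 1} through {batch_start + BATCH} "
        f"of a list of {TOTAL}.\n"
        f"Already covered: {all_items}\n"
        f"Generate {BATCH} NEW items not in the covered list."
    )
    for item in call_llm(prompt):
        if item not in all_items:  # skip repeats the model returned anyway
            all_items.append(item)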

Step 7: Validate count and retry the gap

This is the universal fallback that works on every provider regardless of schema support:

items = parse_list(output)
if len(items) < N:
    extra = call_llm(
        f"You previously gave {len(items)} items. Give {N - len(items)} MORE "
        f"distinct items, none from this list: {items}"
    )
    items.extend(parse_list(extra))
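
The snippet above leaves `parse_list` undefined and retries only once. A fuller sketch might look like the following. It assumes items arrive one per line with a number or bullet prefix, and that `call_llm` is your own function taking a prompt string and returning the raw text. `fill_to_n` and `max_retries` are illustrative names.

```python
import re
from typing import Callable

# Accepts "1. text", "2) text", "- text", and "* text"; empty slots do not match.
_ITEM = re.compile(r"^\s*(?:\d+[.)]|[-*])\s+(.*\S)\s*$")


def parse_list(output: str) -> list[str]:
    """Extract the text of each numbered or bulleted line."""
    items = []
    for line in output.splitlines():
        match = _ITEM.match(line)
        if match:
            items.append(match.group(1))
    return items


def fill_to_n(
    items: list[str],
    n: int,
    call_llm: Callable[[str], str],
    max_retries: int = 3,
) -> list[str]:
    """Re-prompt for the missing items until there are n or retries run out."""
    items = list(dict.fromkeys(items))  # drop exact duplicates, keep order
    for _ in range(max_retries):
        if len(items) >= n:
            break
        prompt = (
            f"You previously gave {len(items)} items. Give {n - len(items)} MORE "
            f"distinct items, none from this list: {items}"
        )
        for candidate in parse_list(call_llm(prompt)):
            if candidate not in items:
                items.append(candidate)
    return items[:n]  # trim any overshoot
```

The retry cap matters: if the topic has fewer than N good answers, the loop ends with a short list instead of running forever, and the caller can decide whether that is acceptable.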

Step 8: Raise `max_tokens` and remove aggressive stop sequences

Set max_tokens to roughly N * 80 for short items, N * 200 for detailed ones. Remove "\n\n" from your stop sequences when generating lists.
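
Those two rules are easy to encode so that every list call gets them. In this sketch the per-item figures come from the guidance above, and the function names are hypothetical. The `overhead` allowance for any preamble is an added assumption that defaults to zero.

```python
RISKY_STOPS = {"\n\n", "###"}  # separators models commonly emit between items


def estimate_max_tokens(n: int, detailed: bool = False, overhead: int = 0) -> int:
    """Rough token budget: about 80 tokens per short item, 200 per detailed one."""
    per_item = 200 if detailed else 80
    return n * per_item + overhead


def safe_stop_sequences(stops: list[str]) -> list[str]:
    """Drop stop sequences that tend to fire between list items."""
    return [s for s in stops if s not in RISKY_STOPS]
```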

How to confirm it’s fixed

Run the prompt 3-5 times (lists are stochastic; one good run isn’t proof).
After each run, assert len(parse_list(output)) == N in code, not by eye.
Check the finish field: finish_reason should be stop / stop_reason should be end_turn, not length / max_tokens. A length/max_tokens finish means the model wanted to keep going — raise the cap.
De-dupe and eyeball the tail: if items N-2 through N are near-duplicates, your diversity axis is too narrow or N is genuinely too high.
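
The checklist above can be turned into a small regression harness. This sketch assumes `generate` is your own zero-argument function that calls the model and returns the parsed items. The near-duplicate check uses `difflib.SequenceMatcher` from the standard library, which only catches lexical similarity, so it supplements reading the tail yourself and does not replace it. The 0.8 threshold is an arbitrary starting point.

```python
from difflib import SequenceMatcher
from typing import Callable


def near_duplicates(items: list[str], threshold: float = 0.8) -> list[tuple[int, int]]:
    """Return index pairs whose lowercase text similarity meets the threshold."""
    pairs = []
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            ratio = SequenceMatcher(None, items[i].lower(), items[j].lower()).ratio()
            if ratio >= threshold:
                pairs.append((i, j))
    return pairs


def confirm_fixed(generate: Callable[[], list[str]], n: int, runs: int = 5) -> bool:
    """Pass only if every run returns exactly n items with no near-duplicates."""
    for _ in range(runs):
        items = generate()
        if len(items) != n or near_duplicates(items):
            return False
    return True
```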

When this is not on you

Some tasks legitimately don’t have N good answers. Forcing the model to fabricate items 7-10 produces worse output than honestly returning 6 and noting “more would be filler.” Audit whether your downstream code can accept a variable-length list before you hard-require N.

Easy to misdiagnose as a `max_tokens` problem

Sometimes it is. More often the model self-truncates well before the token cap because it ran out of distinct ideas — finish_reason comes back stop (not length). Check the finish field and token count before bumping max_tokens; otherwise you raise the cap and the list still stops at 3.

Prevention

Ask only for an N that’s plausible given the topic.
Always include a diversity axis (industry, persona, format, angle).
Use a per-item template so depth stays consistent.
Pre-number the slots for fixed-length lists.
Validate count in code and retry the gap; don’t trust the first response.
For large N, batch and pass already-covered items in.
Where the provider supports it (Gemini), pin minItems/maxItems.

FAQ

Should I just ask for “as many as you can think of”?

That makes the count unpredictable and tends to return fewer, safer items. If anything downstream depends on N, ask for N and validate the count.

Does asking for “high quality” or “creative” help?

Marginally. A concrete diversity axis and a per-item template move the needle far more than adjectives.

Can I force exactly N items at the API level?

Only on Gemini today (`minItems` = `maxItems` = N in `responseSchema`). OpenAI Structured Outputs ignores those keywords as of June 2026, and Claude has no `json_schema` format, so on those you validate-and-retry in code.

The output stops mid-sentence. Is that the same bug?

No. Mid-sentence truncation with `finish_reason: length` (or `stop_reason: max_tokens`) is a hard token cap, so raise `max_tokens`. This article is about the model stopping cleanly before N, which the cap won’t fix.

Raising temperature seems to give more items. Should I?

It can increase diversity and reduce early collapse, but it also raises the odds of off-topic or low-quality items. Prefer an explicit diversity axis and batching over cranking temperature.

Tags: #Prompt engineering #Troubleshooting #llm-output #list-output #structured-output