You spent 45 minutes writing the perfect prompt. It is 1,400 words. It covers tone, audience, constraints, edge cases, what to avoid, three reference snippets, and a checklist. The output is 200 words of sludge. Earlier the same day, a 60-word version of the same request produced a sharp answer. Adding more does not help: somewhere past about 500 words, every extra sentence dilutes priority instead of clarifying it. Long prompts make the model worse not because models cannot read long inputs, but because long inputs hide which sentences matter most. Without a clear hierarchy, the model averages, and averages are mediocre.
This page walks through the structural problems that turn long prompts into bad outputs and the shape changes that fix them without losing necessary detail.
Common causes
1. Goal buried in the middle
The first imperative sentence is what frames the response. If yours is in paragraph 4, the model has already locked into the wrong frame by the time it reads it.
How to spot it: search your prompt for the deliverable verb (write, produce, return, decide). If it first appears after line 5, it is buried.
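The check above can be automated with a short script. This is a minimal sketch, assuming the four verbs listed above are the ones your prompts use; the function names `first_deliverable_line` and `goal_is_buried` are hypothetical, and the 5-line threshold is the rule of thumb from the text rather than a measured cutoff.

```python
import re
from typing import Optional

# Deliverable verbs named in the text; extend this tuple for your own prompts.
DELIVERABLE_VERBS = ("write", "produce", "return", "decide")
_VERB_PATTERN = re.compile(
    r"\b(?:" + "|".join(DELIVERABLE_VERBS) + r")\b", re.IGNORECASE
)


def first_deliverable_line(prompt: str) -> Optional[int]:
    """Return the 1-based number of the first line containing a deliverable verb."""
    for number, line in enumerate(prompt.splitlines(), start=1):
        if _VERB_PATTERN.search(line):
            return number
    return None  # no deliverable verb anywhere in the prompt


def goal_is_buried(prompt: str, max_line: int = 5) -> bool:
    """True if the deliverable verb is missing or first appears after max_line."""
    line = first_deliverable_line(prompt)
    return line is None or line > max_line
```

A whole-word match is deliberately naive: it will also fire on a verb used inside background text, so treat a "not buried" result as a hint and not as proof.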
2. Hidden constraint conflicts
“Be comprehensive AND concise.” “Cover all cases AND be under 200 words.” Long prompts accumulate these without you noticing. The model averages, which satisfies neither.
How to spot it: list every constraint on one sheet, then look for adjective pairs that pull against each other.
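Once the constraints are on one sheet, a rough scan can flag the obvious clashes. The sketch below assumes a hand-maintained table of opposing terms; `OPPOSING_TERMS` and `find_conflicts` are hypothetical names, and the pairs are illustrative, not an exhaustive list.

```python
from typing import List, Tuple

# Hypothetical pairs of terms that tend to pull against each other.
# Matching is plain substring search, so "under" also matches "understand".
OPPOSING_TERMS = [
    ("comprehensive", "concise"),
    ("detailed", "brief"),
    ("all cases", "under"),
    ("formal", "casual"),
]


def find_conflicts(constraints: List[str]) -> List[Tuple[str, str]]:
    """Return every opposing pair whose two terms both occur in the constraints."""
    text = " ".join(constraints).lower()
    return [(a, b) for a, b in OPPOSING_TERMS if a in text and b in text]
```

A hit only means the two terms coexist. Whether they really conflict, and which one should win, is still a judgment you make by reading the two constraints side by side.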
3. Background is 80% of the prompt
If 1,100 of your 1,400 words are background and only 300 are task / constraints / output spec, the model interprets the prompt as “engage with this background” rather than “produce X”.
How to spot it: count words per section. If background outweighs task by 3:1 or worse, you have buried the ask.
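In the example above, 1,100 background words against 300 words of everything else is about 3.7:1, which is past the 3:1 line. A minimal sketch of the count, assuming you have first marked the sections by hand with `## ` labels (the function names and the `background_keys` default are assumptions for illustration):

```python
from typing import Dict, Tuple


def words_per_section(prompt: str) -> Dict[str, int]:
    """Count words under each '## ' label; text before any label is 'unlabeled'."""
    counts: Dict[str, int] = {}
    current = "unlabeled"
    for line in prompt.splitlines():
        stripped = line.strip()
        if stripped.startswith("## "):
            current = stripped[3:].strip().lower()
            counts.setdefault(current, 0)
        else:
            words = len(stripped.split())
            if words:
                counts[current] = counts.get(current, 0) + words
    return counts


def background_ratio(
    counts: Dict[str, int],
    background_keys: Tuple[str, ...] = ("background", "context"),
) -> float:
    """Ratio of background words to all other words (inf if there are no others)."""
    background = sum(n for key, n in counts.items() if key in background_keys)
    rest = sum(n for key, n in counts.items() if key not in background_keys)
    return float("inf") if rest == 0 else background / rest
```

A ratio of 3.0 or more corresponds to the "3:1 or worse" warning sign.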
4. No output format
Long prompt, no schema. The model defaults to a 5-paragraph essay because that is what a “thoughtful long answer” most often looks like in its training data. A format mentioned in passing does not land; without a schema block, the essay default wins.
How to spot it: your output keeps coming back as essay prose when you wanted JSON / table / bullets.
5. Repetitive emphasis collapses
If you wrote “really important” 5 times, none of them reads as important. The model parses repetition as “this is the genre”, not as “pay extra attention here”.
How to spot it: count how many times you wrote “important”, “critical”, “must”, “really”. If the total is over 5, the emphasis has flattened.
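The count is easy to script. A minimal sketch, assuming the four words above are your usual emphasis markers; `emphasis_counts` and `emphasis_flattened` are hypothetical helper names, and the limit of 5 is the rule of thumb from the text.

```python
import re
from collections import Counter

# Emphasis words named in the text; add your own habits ("crucial", "never", ...).
EMPHASIS_WORDS = ("important", "critical", "must", "really")


def emphasis_counts(prompt: str) -> Counter:
    """Count whole-word, case-insensitive occurrences of each emphasis word."""
    words = re.findall(r"[a-z']+", prompt.lower())
    return Counter(word for word in words if word in EMPHASIS_WORDS)


def emphasis_flattened(prompt: str, limit: int = 5) -> bool:
    """True if the emphasis words appear more than `limit` times in total."""
    return sum(emphasis_counts(prompt).values()) > limit
```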
6. Prompt is one wall of prose
No headers, no labels, no whitespace. The model has to infer structure. Inference is unreliable on long inputs.
How to spot it: the prompt contains no labels such as ## Background, ## Constraints, or ## Output.
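A quick way to test for this is to look for `## ` lines and compare against prompt length. The sketch below assumes labels use the `## ` prefix shown above; the function names are hypothetical, and the 200-word threshold is a parameter to tune, not a measured value.

```python
from typing import List


def section_labels(prompt: str) -> List[str]:
    """Return the text of every line that starts with '## '."""
    labels = []
    for line in prompt.splitlines():
        stripped = line.strip()
        if stripped.startswith("## "):
            labels.append(stripped[3:].strip())
    return labels


def is_unlabeled_wall(prompt: str, word_threshold: int = 200) -> bool:
    """True if the prompt is longer than the threshold and has no '## ' labels."""
    return len(prompt.split()) > word_threshold and not section_labels(prompt)
```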
Before you change anything
- Save your current prompt and the current bad output side by side.
- Try the 60-word version that worked earlier — does it still work? (Isolates prompt-shape vs model issue.)
- Count words per section: task vs background vs constraints vs output spec.
- Identify the actual deliverable in one sentence (without rereading the prompt).
- Decide which paragraphs of background would not change the answer if removed.
Information to collect
- Full prompt with word count per section.
- The output you got vs the output you wanted.
- Model and any system prompt or project instruction in effect.
- A history of when this prompt last worked — was the model changed?
- For repeated prompts, the variance: does it ever produce a good answer or never?
Shortest path to fix
Step 1: Lift the goal to line 1
TASK: Decide whether to migrate from Postgres to DynamoDB
for the workload below. Pick one. Defend in 3 sentences.
[context follows]
The first imperative wins. Make sure yours is correct.
Step 2: Section the body
## Task
<one sentence>
## Context
<bulleted, only the load-bearing facts>
## Constraints
- <each one a single line>
- <if any conflict, say which wins>
## Output format
{
"decision": "postgres | dynamodb",
"reason": "<max 60 words>"
}
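Labels tell the model which block is the task, which is data, and which is the format spec, so it does not have to infer the structure of a long input.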
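For repeated workflows, the sectioned layout can be generated instead of retyped. This is a minimal sketch of one way to do it; `build_prompt` and its parameters are hypothetical, and the section names simply mirror the template above.

```python
from typing import List


def build_prompt(
    task: str,
    context: List[str],
    constraints: List[str],
    output_format: str,
) -> str:
    """Assemble a prompt with the task first and the output format last."""
    sections = [
        ("Task", task.strip()),
        ("Context", "\n".join(f"- {fact}" for fact in context)),
        ("Constraints", "\n".join(f"- {rule}" for rule in constraints)),
        ("Output format", output_format.strip()),
    ]
    return "\n\n".join(f"## {title}\n{body}" for title, body in sections)
```

Passing context and constraints as lists nudges you toward one fact or rule per line, which is the same discipline the template asks for.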
Step 3: Cut redundant constraints
Read each constraint and ask: “would a reviewer actually check this?” If no, cut it. Soft preferences fight hard rules; cutting them strengthens the hard ones.
Step 4: Add a positive example, remove three sentences of rules
One sample of “correct output” is worth ~300 words of rules. If you have rules describing how output should look, replace them with one example showing exactly that.
Step 5: Put the output schema last
The last block in the prompt tends to carry the most recency weight, so use it for the structural spec:
[everything else]
OUTPUT (return only this):
\`\`\`json
{ "decision": "...", "reason": "..." }
\`\`\`
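If you run the prompt through code, you can check each reply against the schema instead of eyeballing it. The sketch below assumes the two-field schema shown above, with the 60-word cap on `reason` used earlier on this page; `strip_fence` and `check_output` are hypothetical helpers, not part of any library.

```python
import json
from typing import List

ALLOWED_DECISIONS = {"postgres", "dynamodb"}
FENCE = "`" * 3  # a Markdown code fence, built here to avoid nesting fences


def strip_fence(text: str) -> str:
    """Remove a surrounding code fence (and its language tag) if present."""
    text = text.strip()
    if text.startswith(FENCE) and text.endswith(FENCE):
        inner = text[len(FENCE):-len(FENCE)]
        newline = inner.find("\n")
        if newline != -1:
            inner = inner[newline + 1:]  # drop the "json" tag line
        return inner.strip()
    return text


def check_output(raw: str, max_reason_words: int = 60) -> List[str]:
    """Return a list of schema problems; an empty list means the reply conforms."""
    try:
        data = json.loads(strip_fence(raw))
    except json.JSONDecodeError:
        return ["not valid JSON"]
    if not isinstance(data, dict):
        return ["top level is not an object"]
    problems = []
    if set(data) != {"decision", "reason"}:
        problems.append("unexpected or missing keys")
    decision = data.get("decision")
    if not isinstance(decision, str) or decision not in ALLOWED_DECISIONS:
        problems.append("decision is not one of the allowed values")
    reason = data.get("reason")
    if not isinstance(reason, str) or len(reason.split()) > max_reason_words:
        problems.append("reason missing or too long")
    return problems
```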
Step 6: Move large reference text into a sidecar
If you have 800 words of reference material, fence it in <reference> tags so the model treats it as input data, not as instructions to engage with:
<reference>
... 800 words of policy ...
</reference>
TASK: <one sentence; placing it above the reference block is even better>
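A small helper can apply the sidecar pattern consistently, with the task placed above the reference block. This is a sketch under assumptions: `with_reference` is a hypothetical name, and the tag is just a delimiter you choose, not something a particular model requires.

```python
def with_reference(task: str, reference: str, tag: str = "reference") -> str:
    """Put the task first, then the reference text fenced in <tag>...</tag>."""
    closing = f"</{tag}>"
    if closing in reference:
        # The reference would end the block early; pick a different tag name.
        raise ValueError(f"reference text already contains {closing}")
    return f"TASK: {task.strip()}\n\n<{tag}>\n{reference.strip()}\n{closing}"
```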
How to confirm the fix
- A stranger reads only the first 3 lines and correctly identifies the deliverable.
- The words “important”, “critical”, and “must” appear fewer than 3 times in total.
- Background section is no more than twice the combined length of the task and constraints sections.
- Output matches the schema you specified, not generic essay shape.
- Running the same prompt 3 times produces 3 outputs of the same shape.
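The last check in the list can be scripted if your outputs are JSON. A minimal sketch, assuming "same shape" means the same top-level keys; `output_shape` and `same_shape` are hypothetical helpers, and any non-JSON reply is lumped together as prose.

```python
import json
from typing import List, Tuple


def output_shape(raw: str) -> Tuple[str, ...]:
    """Describe a reply by its structure: sorted keys for objects, type otherwise."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ("prose",)
    if isinstance(data, dict):
        return ("object",) + tuple(sorted(data))
    return (type(data).__name__,)


def same_shape(outputs: List[str]) -> bool:
    """True if every output in a non-empty list has the same shape."""
    return len({output_shape(raw) for raw in outputs}) == 1
```

This compares structure only. Three replies with identical keys but different decisions still pass, which is what the shape check is meant to allow.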
If it still fails
- Compress further — the goal is “smallest prompt that still produces the right answer”, not “biggest prompt with all detail”.
- Split into multiple turns: turn 1 plans, turn 2 executes, turn 3 verifies.
- Switch from chat UI to API with structured output (JSON schema enforcement, tool use).
- Try a model with stronger long-context attention if the prompt genuinely cannot be shorter.
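The plan, execute, verify split can be sketched as three calls. The code below assumes nothing about a specific provider: `ask` is a hypothetical callable you supply that sends one prompt and returns the reply text, and the instruction wording is illustrative only.

```python
from typing import Callable, Dict


def plan_execute_verify(task: str, ask: Callable[[str], str]) -> Dict[str, str]:
    """Run one task as three short turns instead of one long prompt."""
    # Turn 1: ask only for a plan.
    plan = ask(f"TASK: {task}\nList the steps you would take. Do not carry them out yet.")
    # Turn 2: execute against the plan from turn 1.
    result = ask(f"TASK: {task}\nFollow this plan exactly:\n{plan}")
    # Turn 3: check the result against the original task.
    verdict = ask(
        f"TASK: {task}\nCheck this result against the task. "
        f"Reply PASS or FAIL with one reason.\n{result}"
    )
    return {"plan": plan, "result": result, "verdict": verdict}
```

Each turn repeats the one-sentence task on its first line, so the goal stays at the top of every prompt.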
Prevention
- Keep prompts under 600 words unless reference text is necessary.
- Default template: TASK first, CONTEXT second, OUTPUT FORMAT last.
- Use section headers when prompt exceeds 200 words.
- For repeated workflows, save a template; do not improvise structure each time.
- Audit prompts quarterly — long-lived prompts accumulate constraints that no longer matter.
- Re-read the first 3 lines before sending; a stranger should know what to produce.