Missing Examples Cause Output Drift (Few-Shot Fix)


Describing a tone or shape in words makes the model approximate; pasting one concrete example makes it match. Here is how to pick, place, and structure 1-5 examples to lock in the output you want.

Published: May 20, 2026 | Updated: Jun 21, 2026 | Author: AI Productivity Guide Team

You described the output exactly: tone, length, structure, vocabulary. The model returned something that technically matched your description and structurally looked nothing like what you had in mind. You re-prompted “more like a quick technical note, less like a marketing post” — better, but still not it. After three rounds you finally pasted one real “quick technical note” and the next output was nearly perfect.

That is the description-versus-example asymmetry: language models copy shape from examples far more reliably than they construct shape from adjectives. One good example is worth several paragraphs of rules.

Fastest fix: stop adding adjectives. Paste one labeled example of an acceptable output (the label “Like this:” followed by the real text), and if the shape still drifts, add one labeled negative example with a one-line reason. If you use two or three examples, wrap each in its own delimiter so the model can tell them apart from your instructions. This page covers when descriptions alone fail, how many examples to use, and where to put them.

When examples are the wrong tool

One caveat first, because the guidance changed in 2026. Examples are the most reliable lever for format, tone, and structure, which is the exact problem on this page. They are not always the right lever for reasoning accuracy. On pure reasoning tasks (math, multi-step logic, hard debugging), stacking few-shot examples in front of a thinking model such as GPT-5.5 Thinking/Pro or Claude Opus 4.7 with extended thinking can add noise and lower accuracy. Those models reason internally; they do not need a worked example to imitate. As of June 2026, guidance from both OpenAI and Anthropic says: for reasoning, describe the goal and the output format cleanly and skip the exemplars.

Rule of thumb: use examples to control what the output looks like, use a clean instruction to control how the model thinks.

Common causes

1. Description relies on taste adjectives

“Quick”, “punchy”, “warm”, “technical” — each resolves to the model’s average reading of that label, which is rarely your reading.

How to spot it: your description has 3 or more adjectives and 0 examples.

2. Shape constraints are described, not shown

“Use 3 bullets, each with a header and a one-sentence explanation.” More specific than adjectives, but shape is still underspecified — what kind of header, where the colon goes, how long the sentence runs.

How to spot it: your structural rules are prose, not a literal template the model can copy.

3. Examples came from the wrong domain

You anchored a casual internal Slack message with a polished press-release example. The model copied the press-release register and produced something far too formal.

How to spot it: the example you pasted does not match the target domain in tone or genre.

4. Examples contradict each other

Example 1 is short. Example 2 is long. Example 3 has bullets. The model averages them and gets confused — worse, it may copy the wrong one for your input.

How to spot it: your 2-3 examples differ in length, tone, or structure in ways that are not driven by the input.
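
A quick way to catch the length part of this problem is to compare word counts across your examples before you paste them. The sketch below is only a rough heuristic under stated assumptions: the function name `length_outliers` and the 2x threshold are illustrative choices, not something from vendor guidance, and it says nothing about tone or structure.

```python
from statistics import median


def length_outliers(examples: list[str], factor: float = 2.0) -> list[int]:
    """Return indexes of examples whose word count is far from the median.

    The factor of 2.0 is an arbitrary starting point (assumption): an example
    is flagged when it is more than twice as long, or less than half as long,
    as the median example.
    """
    if not examples:
        return []
    counts = [len(text.split()) for text in examples]
    mid = median(counts)
    return [
        index
        for index, count in enumerate(counts)
        if count > mid * factor or count * factor < mid
    ]
```

A flagged example is not automatically wrong. If the extra length is driven by a harder input, keep it; if it is not, trim it or replace it so the set points at one shape.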

5. No negative example

A negative example (“not like this”) is often as anchoring as a positive one. Without it the model can drift in the wrong direction with no signal that it is off.

How to spot it: you have only positive examples and no rejected example.

6. Examples are not delimited

You pasted the example as plain prose inline with your instructions, so the model could not tell where the example stopped and the task resumed. It treated half your example as a rule, or half your rule as part of the example.

How to spot it: there is no fence, tag, or marker separating the example text from the surrounding instruction.

Which bucket are you in

| Symptom | Likely cause | Go to |
| --- | --- | --- |
| Output is generic / “average” of the label | Adjective-only description | Step 1 |
| Drifts toward marketing or fluff | No negative example | Step 2 |
| Tone is wrong (too formal / too casual) | Wrong-domain example | Step 3 |
| Inconsistent run to run | Contradicting examples | Step 4 + Step 6 |
| Model treats your example as an instruction | Examples not delimited | Step 6 |
| Right shape but wrong logic / facts | Reasoning task, not a shape task | See “When examples are the wrong tool” above |

Before you change anything

Find one real example of an acceptable output — pull one from your archive, or hand-write one.
Find one example of an unacceptable output and the specific reason it failed.
Note the tone, length, structure, and vocabulary of the acceptable example.
Confirm the example matches the target domain.
Decide where in the prompt the examples go (just before the deliverable usually wins).

Information to collect

The current prompt with all of its description.
The output that drifted.
One sample of what you wanted, byte-exact.
One sample of what to avoid, with the reason.
The model and any system prompt in use.

Shortest path to fix

Step 1: Add one positive example

Like this:
```
Hey — the env var didn't load because Vercel scopes secrets per environment.
Move `STRIPE_KEY` from "Development" to "Production" in Project Settings > Environment Variables.
Redeploy. That should fix it.
```

One labeled example shapes the output more than five sentences of description.

Step 2: Add one negative example with a reason

Not like this:
```
In modern software development, environment variables play a crucial role in deployments.
Let me walk you through the process step by step...
```
Reason: too marketing-y, opens with filler, takes too long to reach the fix.

The “reason” line is what makes the contrast actionable — without it the model only learns “avoid these exact words”, not the underlying flaw.
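
If you assemble prompts in code, Steps 1 and 2 can be sketched as one small helper. This is a minimal illustration, not a prescribed implementation: the name `with_examples` and its parameters are hypothetical, and the only rule it enforces is the one described above, that a negative example must come with a reason.

```python
FENCE = "`" * 3  # triple backtick, built this way so the snippet nests cleanly


def with_examples(task: str, good: str, bad: str = "", bad_reason: str = "") -> str:
    """Append a labeled positive example and, optionally, a negative one.

    Hypothetical helper: the labels mirror the ones used in this article.
    """
    parts = [task.strip(), "", "Like this:", FENCE, good.strip(), FENCE]
    if bad:
        if not bad_reason.strip():
            # Without a reason the model only learns "avoid these exact words".
            raise ValueError("a negative example needs a one-line reason")
        parts += [
            "",
            "Not like this:",
            FENCE,
            bad.strip(),
            FENCE,
            f"Reason: {bad_reason.strip()}",
        ]
    return "\n".join(parts)
```

Calling it with only `task` and `good` gives you the Step 1 prompt; passing `bad` and `bad_reason` as well gives you the Step 2 version.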

Step 3: Keep examples in the target domain

If the output is an internal Slack message, use a Slack-style example. If it is a PR description, use a PR description. Genre transfer is fragile: a press-release example will pull a Slack message toward press-release register every time.

Step 4: Use 1-5 examples, matched to input variety

For a single fixed task, one strong example is usually enough. If the task runs against many different inputs, show a few that span typical cases. Vendor guidance as of June 2026 converges on a small number: OpenAI suggests 1-5 input/output pairs, added only if zero-shot is not already working; Anthropic recommends 3-5 examples for best results, kept diverse so the model does not latch onto an unintended pattern. More than five rarely helps and starts to cost latency and tokens.

Examples (vary by input):

Input: "Vercel deploy failed"
Output: "Check the Build Command in vercel.json. The most common cause is..."

Input: "Firebase auth not working"
Output: "Open Firebase Console > Authentication > Settings. Authorized domains must include..."

Now produce output for:
Input: "<the actual user input>"
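
The template above can be generated from a list of input/output pairs. The following is a minimal sketch; `few_shot_prompt` is a hypothetical name, and the 1-5 check simply encodes the range discussed in this step.

```python
def few_shot_prompt(pairs: list[tuple[str, str]], user_input: str) -> str:
    """Render labeled input/output pairs followed by the real input."""
    if not 1 <= len(pairs) <= 5:
        raise ValueError("use between 1 and 5 examples")
    lines = ["Examples (vary by input):", ""]
    for example_input, example_output in pairs:
        lines.append(f'Input: "{example_input}"')
        lines.append(f'Output: "{example_output}"')
        lines.append("")  # blank line between pairs
    lines.append("Now produce output for:")
    lines.append(f'Input: "{user_input}"')
    return "\n".join(lines)
```

Keeping the pairs in a list makes it easy to swap examples in and out while you test which ones actually move the output.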

Step 5: Place examples after the constraints, before the deliverable

[Top]
TASK + CONSTRAINTS

[Middle]
EXAMPLES (1-5)

[Bottom]
NOW PRODUCE OUTPUT FOR <input>

Examples just before the deliverable are the most salient. One exception: if you are also pasting a large reference document (20k+ tokens), put that document at the very top and keep your task and question at the end — Anthropic reports queries-at-the-end can improve answer quality by up to 30% on long, multi-document inputs.
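
The ordering in this step, including the long-document exception, can be sketched as a single assembly function. The function name `assemble_prompt` and the section labels are assumptions made for illustration; use whatever labels your prompts already use.

```python
def assemble_prompt(
    task: str,
    constraints: list[str],
    examples: list[str],
    request: str,
    reference_doc: str = "",
) -> str:
    """Join prompt sections in the order: document, task, constraints, examples, request."""
    sections = []
    if reference_doc:
        # Long reference material goes first so the task and question stay near the end.
        sections.append("REFERENCE DOCUMENT\n" + reference_doc.strip())
    sections.append("TASK\n" + task.strip())
    if constraints:
        sections.append("CONSTRAINTS\n" + "\n".join(f"- {c}" for c in constraints))
    if examples:
        sections.append("EXAMPLES\n" + "\n\n".join(e.strip() for e in examples))
    # The request for the deliverable always comes last, right after the examples.
    sections.append(request.strip())
    return "\n\n".join(sections)
```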

Step 6: Delimit every example explicitly

The model has to know where each example begins and ends, or it will blur examples into instructions. Use a consistent marker:

Claude: wrap each example in <example> tags and group them inside <examples>. Anthropic’s docs say Claude distinguishes examples from instructions more reliably this way than with plain line breaks.
GPT-5.5 / general: mark boundaries with delimiters such as ### or triple quotes """, and label the input and output of each pair.

<examples>
  <example>
    Input: "Vercel deploy failed"
    Output: "Check the Build Command in vercel.json..."
  </example>
  <example>
    Input: "Firebase auth not working"
    Output: "Open Firebase Console > Authentication > Settings..."
  </example>
</examples>
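
Both delimiter styles can be produced from the same list of pairs. This is a sketch under stated assumptions: `xml_examples` and `hash_examples` are hypothetical names, and the check for a stray closing tag is a precaution added here, not something the vendor docs require.

```python
def xml_examples(pairs: list[tuple[str, str]]) -> str:
    """Wrap each pair in <example> tags and group them inside <examples>."""
    blocks = []
    for example_input, example_output in pairs:
        if "</example>" in example_input or "</example>" in example_output:
            # A closing tag inside the text would end the example early.
            raise ValueError("example text must not contain </example>")
        blocks.append(
            "  <example>\n"
            f'    Input: "{example_input}"\n'
            f'    Output: "{example_output}"\n'
            "  </example>"
        )
    return "<examples>\n" + "\n".join(blocks) + "\n</examples>"


def hash_examples(pairs: list[tuple[str, str]]) -> str:
    """Separate labeled pairs with ### lines for models that do not rely on tags."""
    blocks = [
        f'###\nInput: "{example_input}"\nOutput: "{example_output}"'
        for example_input, example_output in pairs
    ]
    return "\n".join(blocks) + "\n###"
```

Pick one style per prompt and use it for every example; mixing tags and `###` in the same prompt reintroduces the ambiguity the delimiters are meant to remove.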

Step 7: Lock examples to a versioned file

For high-volume production prompts, store the examples in a versioned file and rebuild the prompt from that file. Update them as your standard evolves. This stops “example drift” — a stray edit to an inline example silently changing every output in a long-running workflow.
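
One way to do this is to keep the examples in a JSON file under version control and rebuild the examples section on every run. The sketch below assumes a hypothetical file layout, `{"version": "...", "examples": [{"input": "...", "output": "..."}]}`, and a hypothetical function name `examples_section`; neither comes from a specific tool.

```python
import json
from pathlib import Path


def examples_section(path: str) -> tuple[str, str]:
    """Load a versioned example file and return (version, rendered examples).

    Assumed layout: a "version" string and an "examples" list of objects,
    each with "input" and "output" keys.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    pairs = [(item["input"], item["output"]) for item in data["examples"]]
    if not 1 <= len(pairs) <= 5:
        raise ValueError("keep between 1 and 5 examples in the file")
    lines = []
    for example_input, example_output in pairs:
        lines += [f'Input: "{example_input}"', f'Output: "{example_output}"', ""]
    return data["version"], "\n".join(lines).rstrip()
```

Logging the returned version next to each model call makes it possible to trace a change in output shape back to a specific edit of the example file.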

How to confirm the fix

A new output shape-matches your positive example in length, structure, and register.
A new output does not resemble your negative example.
Running the same prompt three times produces three outputs in the same shape (run them in separate chats or with no shared memory so one run does not contaminate the next).
A teammate can pick the “good” one without you explaining the rules.
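
The first and third checks can be roughly automated by comparing surface features of each output against the positive example. This is only a heuristic sketch: the names `shape` and `same_shape`, the features compared (non-empty lines, words, bullets), and the 50% tolerance are all assumptions, and register still needs a human read.

```python
def shape(text: str) -> dict[str, int]:
    """Count non-empty lines, words, and bullet-style lines in a text."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    return {
        "lines": len(lines),
        "words": len(text.split()),
        "bullets": sum(line.lstrip().startswith(("-", "*", "•")) for line in lines),
    }


def same_shape(reference: str, candidate: str, tolerance: float = 0.5) -> bool:
    """Return True when the candidate is close to the reference in length and layout.

    The tolerance of 0.5 (assumption) allows counts to differ by up to 50%.
    """
    ref, cand = shape(reference), shape(candidate)
    if (ref["bullets"] > 0) != (cand["bullets"] > 0):
        return False  # one uses bullets and the other does not
    for key in ("lines", "words"):
        if abs(cand[key] - ref[key]) > tolerance * ref[key]:
            return False
    return True
```

Running `same_shape` on three separate outputs against the same positive example gives a quick pass/fail signal before you ask a teammate to review.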

If it still fails

Too few examples — add a third or fourth (stop at five).
Examples contradict each other — audit them for consistency in length, tone, and structure.
The model needs explicit pattern extraction: add “Note the structure of the examples: opening line, fix line, expected-result line. Match this structure exactly.”
For mechanically strict shape, switch to schema-enforced output. Both OpenAI (Structured Outputs / JSON schema) and Anthropic (Structured Outputs) can constrain responses to a schema, which beats examples when the shape must be exact every time.
If logic, not shape, is wrong, you are in the reasoning bucket — remove the examples and instead describe the goal and output format cleanly (see the caveat near the top).

Prevention

Default: any prompt with a stylistic requirement carries at least one example.
Build reusable example libraries for recurring task types.
Watch for cross-domain bleed: do not anchor a casual email with a legal-contract example.
Audit example libraries quarterly and remove examples that no longer match the current style.
For team workflows, agree on canonical examples — one team using template A and another using template B is a guaranteed source of drift.
When in doubt, write one example. The five minutes it takes saves an hour of re-prompting.

FAQ

How many examples should I use? For a fixed task, one good example usually does it. For a task that runs against varied inputs, use a few that span the range. As of June 2026, OpenAI suggests 1-5 input/output pairs and Anthropic suggests 3-5 for best results. Past five you mostly add cost and latency, and risk the model overfitting to a surface pattern.

Why did my output get worse after I added examples? Two common reasons. Either the examples contradict each other and the model is averaging them, or you are on a reasoning task with a thinking model (GPT-5.5 Thinking/Pro, Opus 4.7) where exemplars add noise. For reasoning, drop the examples and describe the goal plainly.

Where exactly should the examples go in the prompt? After the task and constraints, immediately before the line that asks for the deliverable. The exception is a large reference document, which goes at the very top with your question at the end.

Positive or negative example — which matters more? Start with one positive example; it sets the target shape. Add a negative example with a one-line reason when the model keeps drifting toward a specific failure mode (marketing fluff, over-formality). The reason is what makes the negative example teach a rule instead of a blocklist of words.

Do I need XML tags, or are code fences enough? For Claude, <example>/<examples> tags are the documented best practice and parse more reliably than fences. For GPT-5.5 and most other models, ### or """ delimiters with labeled input/output work fine. The non-negotiable part is that every example is clearly fenced off from your instructions.

My example keeps showing up verbatim in the output. Why? The model is treating the example as part of the answer because it is not delimited. Wrap it in <example> tags or """ and add a line like “The examples above are for format reference only; do not repeat them.”
