You described the output exactly: tone, length, structure, vocabulary. The model produced something technically matching your description and structurally nothing like what you had in mind. You re-prompted “more like a quick technical note, less like a marketing post” — better, but still not it. After three rounds of refinement you finally pasted one example of a real “quick technical note” and the next output was nearly perfect. This is a description-vs-example asymmetry: language models follow shape from examples far more reliably than they follow shape from adjectives. One good example is worth several paragraphs of rules.
This page walks through when descriptions alone fail and how to use 1-3 examples to lock the shape you actually want.
Common causes
1. Description relies on taste adjectives
“Quick”, “punchy”, “warm”, “technical” — each resolves to the model’s average reading of that label, which is rarely your reading.
How to spot it: your description has 3+ adjectives and 0 examples.
2. Shape constraints are described, not shown
“Use 3 bullets, each with a header and a 1-sentence explanation.” This is more specific than adjectives but still leaves the shape underspecified: what kind of header, where the colon goes, how long the sentence should be.
How to spot it: structural rules are prose, not a template.
3. Examples were given but from the wrong domain
You anchored a casual internal Slack message with a polished press release example. The model copied the press release register and produced something far too formal.
How to spot it: the example you used does not match the target domain in tone or genre.
4. Examples contradict each other
Example 1 is short. Example 2 is long. Example 3 has bullets. The model either averages them into something that matches none of them or latches onto the wrong example for a given input.
How to spot it: your 2-3 examples differ in length, tone, or structure.
5. No negative examples
A negative example (“not like this”) often anchors as strongly as a positive one. Without it, nothing tells the model which direction to stay away from, so it can drift there unchecked.
How to spot it: only positive examples, no rejected examples.
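Two of the "How to spot it" checks above (adjectives without examples, and examples that contradict each other) can be roughed out as a small lint. This is a minimal sketch, not a definitive tool: the function name `lint_prompt`, the adjective list, and the 2x length threshold are all assumptions chosen for illustration.

```python
import re

# Illustrative list only (taken from cause 1); extend it with your own labels.
TASTE_ADJECTIVES = {"quick", "punchy", "warm", "technical"}


def lint_prompt(description: str, examples: list[str]) -> list[str]:
    """Return warnings for adjective-only descriptions and clashing examples."""
    warnings = []
    words = re.findall(r"[a-z]+", description.lower())
    adjective_count = sum(1 for word in words if word in TASTE_ADJECTIVES)
    if adjective_count >= 3 and not examples:
        warnings.append("3+ taste adjectives and 0 examples")

    if len(examples) >= 2:
        lengths = [len(example.split()) for example in examples]
        # Assumed threshold: flag when the longest example is over twice the shortest.
        if max(lengths) > 2 * min(lengths):
            warnings.append("examples differ widely in length")
        # Heuristic: only looks at whether each example starts with a bullet.
        bulleted = [example.lstrip().startswith("- ") for example in examples]
        if any(bulleted) and not all(bulleted):
            warnings.append("some examples use bullets and some do not")
    return warnings


print(lint_prompt("Make it quick, punchy, and warm.", []))
```

A check like this only catches the mechanical symptoms. Whether an example comes from the wrong domain still needs a human read.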
Before you change anything
- Identify one real example of an acceptable output (find one in your archive, or hand-write one).
- Identify one example of an unacceptable output, and the specific reason it failed.
- Note tone, length, structure, and vocabulary of the acceptable example.
- Confirm your examples match the target domain.
- Decide where in the prompt the examples will go (after the task and constraints, just before the final request, usually works best).
Information to collect
- Current prompt with all description.
- The output that drifted.
- One sample of what you wanted, byte-exact.
- One sample of what to avoid, with the reason.
- Model and any system prompt.
Shortest path to fix
Step 1: Add one positive example
Like this:
```
Hey — the env var didn't load because Vercel scopes secrets per-env.
Move `STRIPE_KEY` from "Development" to "Production" in dashboard settings.
Redeploy. That should fix it.
```
One labeled example usually shapes the output more than five sentences of description.
Step 2: Add one negative example with a reason
Not like this:
```
In today's fast-paced development environment, environment variables
play a crucial role in deployments. Let me walk you through...
```
Reason: too marketing-y, opens with filler, does not get to the fix.
The “reason” makes the contrast actionable.
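If prompts are built in code, Steps 1 and 2 can be wrapped in one helper so that every prompt carries a labeled positive example, a labeled negative example, and the reason. The sketch below is one possible way to do it; the function name `with_examples` and the task wording are hypothetical, and the example strings are shortened from the ones above.

```python
def with_examples(task: str, good: str, bad: str, reason: str) -> str:
    """Attach one labeled positive and one labeled negative example to a task."""
    return "\n".join(
        [
            task.strip(),
            "",
            "Like this:",
            good.strip(),
            "",
            "Not like this:",
            bad.strip(),
            f"Reason: {reason.strip()}",
        ]
    )


prompt = with_examples(
    task="Explain the fix as a quick technical note.",  # hypothetical task wording
    good=(
        'Move `STRIPE_KEY` from "Development" to "Production" in dashboard settings.\n'
        "Redeploy. That should fix it."
    ),
    bad=(
        "In today's fast-paced development environment, environment variables\n"
        "play a crucial role in deployments. Let me walk you through..."
    ),
    reason="too marketing-y, opens with filler, does not get to the fix.",
)
print(prompt)
```

Making `reason` a required argument is deliberate: a negative example without a stated reason cannot be passed in by accident.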
Step 3: Keep examples in the target domain
If the output is an internal Slack message, use a Slack-style example. If it is a PR description, use a PR description. Genre transfer is fragile.
Step 4: Use 2-3 examples for variable inputs
If the task will be applied to many different inputs, show 2-3 examples spanning typical cases:
Examples (vary by input):
Input: "Vercel deploy failed"
Output: "Check your build command in vercel.json. Most common cause is..."
Input: "Firebase auth not working"
Output: "Open Firebase Console > Auth > Settings. Authorized domains needs to include..."
Now produce output for:
Input: "<the actual user input>"
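The layout above can be generated from a list of pairs, which keeps every example in exactly the same format as the final request. This is a minimal sketch: `Example`, `render_few_shot`, and the final input string are hypothetical names and values, and the truncated outputs are copied as they appear above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    input: str
    output: str


def render_few_shot(examples: list[Example], new_input: str) -> str:
    """Render input/output pairs, then the real input in the same format."""
    lines = ["Examples (vary by input):"]
    for example in examples:
        lines.append(f'Input: "{example.input}"')
        lines.append(f'Output: "{example.output}"')
    lines.append("Now produce output for:")
    lines.append(f'Input: "{new_input}"')
    return "\n".join(lines)


EXAMPLES = [
    Example(
        "Vercel deploy failed",
        "Check your build command in vercel.json. Most common cause is...",
    ),
    Example(
        "Firebase auth not working",
        "Open Firebase Console > Auth > Settings. Authorized domains needs to include...",
    ),
]

# The final input is a made-up placeholder for the actual user input.
print(render_few_shot(EXAMPLES, "Stripe webhook returns 400"))
```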
Step 5: Place examples after constraints, before deliverable
[Top]
TASK + CONSTRAINTS
[Middle]
EXAMPLES (1-3)
[Bottom]
NOW PRODUCE OUTPUT FOR <input>
Examples just before the deliverable are most salient.
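The ordering can be enforced by an assembly function so that examples never end up above the constraints or below the request. The following is a sketch under assumptions: `assemble_prompt`, the section labels, and the sample task and constraints are invented for illustration.

```python
def assemble_prompt(
    task: str, constraints: list[str], examples: list[str], user_input: str
) -> str:
    """Order the prompt as: task + constraints, then examples, then the request."""
    if not 1 <= len(examples) <= 3:
        raise ValueError("expected 1-3 examples")
    top = [task] + [f"- {constraint}" for constraint in constraints]
    middle = []
    for number, example in enumerate(examples, start=1):
        middle += [f"Example {number}:", example]
    bottom = [f"Now produce output for: {user_input}"]
    # A blank line separates the three sections.
    return "\n\n".join("\n".join(section) for section in (top, middle, bottom))


prompt = assemble_prompt(
    task="Write a short fix note for the reported problem.",  # hypothetical
    constraints=["At most three lines.", "Name the exact setting to change."],  # hypothetical
    examples=[
        'Move `STRIPE_KEY` from "Development" to "Production" in dashboard settings.\n'
        "Redeploy. That should fix it."
    ],
    user_input="Firebase auth not working",
)
print(prompt)
```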
Step 6: Lock examples to canonical files
For high-volume production prompts, store examples in a versioned file. Update them as your standard evolves; rebuild the prompt from the file. This prevents example drift from breaking long-running workflows.
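One possible shape for this, assuming the examples live in a JSON file under version control: the prompt is rebuilt from the file on every run, so editing the file is the only way to change the examples. The file name, the `version` field, and both function names are hypothetical. The demo writes a throwaway file to a temporary directory so the sketch runs on its own.

```python
import json
import tempfile
from pathlib import Path


def load_examples(path: Path) -> dict:
    """Load an example library and refuse to continue if it is empty."""
    library = json.loads(path.read_text(encoding="utf-8"))
    if not library.get("examples"):
        raise ValueError(f"{path} contains no examples")
    return library


def build_prompt(task: str, library: dict) -> str:
    """Rebuild the prompt from the library; the file is the single source of truth."""
    parts = [task]
    for example in library["examples"]:
        parts.append("Like this:\n" + example)
    return "\n\n".join(parts)


# Demo only: in practice the file would be checked into the repository.
with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "fix_note_examples.json"
    path.write_text(
        json.dumps({"version": 3, "examples": ["Redeploy. That should fix it."]}),
        encoding="utf-8",
    )
    library = load_examples(path)
    prompt = build_prompt("Write a short fix note.", library)

print(library["version"])
print(prompt)
```

Logging the library version next to each generated output makes it easier to trace a style change back to a specific edit of the examples.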
How to confirm the fix
- A new output shape-matches your positive example (length, structure, register).
- A new output does not resemble your negative example.
- Running the same prompt 3 times produces 3 outputs in the same shape.
- A teammate shown the new output next to the negative example can pick the “good” one without being told which is which.
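The first and third checks can be partly automated with a rough shape comparison. The sketch below is a heuristic under stated assumptions: it compares only line count, word count (within an assumed 50% tolerance), and bullet use, and the three sample outputs are invented. It says nothing about register, so marketing-style text of similar length would still pass and needs a human read.

```python
def shape(text: str) -> dict:
    """Summarize the layout of a text: non-empty lines, words, bullet use."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    return {
        "lines": len(lines),
        "words": len(text.split()),
        "bullets": any(line.lstrip().startswith(("- ", "* ")) for line in lines),
    }


def same_shape(reference: str, candidate: str, word_tolerance: float = 0.5) -> bool:
    """True if the candidate matches the reference in lines, bullets, and rough length."""
    ref, cand = shape(reference), shape(candidate)
    words_close = abs(cand["words"] - ref["words"]) <= word_tolerance * ref["words"]
    return ref["lines"] == cand["lines"] and ref["bullets"] == cand["bullets"] and words_close


def all_same_shape(reference: str, outputs: list[str]) -> bool:
    return all(same_shape(reference, output) for output in outputs)


reference = (
    'Move `STRIPE_KEY` from "Development" to "Production" in dashboard settings.\n'
    "Redeploy. That should fix it."
)
# Hypothetical outputs from three runs of the same prompt.
runs = [
    "Set `API_URL` in the Production environment.\nRedeploy and check the logs.",
    "Add the domain under Authorized domains.\nReload the page. Login should work.",
    "Pin the Node version in package.json.\nPush again. The build should pass.",
]
print(all_same_shape(reference, runs))
```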
If it still fails
- Examples might be too few — add a third or fourth.
- Examples might contradict each other — audit them for consistency.
- The model may need explicit pattern extraction: “Note the structure of the examples: opening line, fix line, expected result line. Match this structure.”
- For a complex shape, switch to JSON-schema-constrained output, where the shape is enforced mechanically.
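As a sketch of the last option: ask the model for a JSON object, validate it, and render the final text yourself. The field names below are assumptions taken from the three-part structure in the previous bullet, and the validation uses only the standard library rather than any provider's structured-output feature, so it rejects bad output after the fact instead of constraining generation.

```python
import json

# Hypothetical field names: opening line, fix line, expected result line.
REQUIRED_FIELDS = ("opening", "fix", "expected_result")


def parse_note(raw: str) -> dict:
    """Parse model output and reject anything that is not exactly the expected shape."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on non-JSON text
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if set(data) != set(REQUIRED_FIELDS):
        raise ValueError(f"expected exactly the keys {REQUIRED_FIELDS}, got {sorted(data)}")
    for key in REQUIRED_FIELDS:
        if not isinstance(data[key], str) or not data[key].strip():
            raise ValueError(f"{key!r} must be a non-empty string")
    return data


def render_note(note: dict) -> str:
    """Turn the validated fields back into a three-line note."""
    return "\n".join(note[key] for key in REQUIRED_FIELDS)


# Made-up model output for illustration.
raw = json.dumps(
    {
        "opening": "The env var is scoped to the wrong environment.",
        "fix": "Move it to Production and redeploy.",
        "expected_result": "The key should load on the next deploy.",
    }
)
print(render_note(parse_note(raw)))
```

Because the code renders the final text, line order and line count no longer depend on the model. Only the wording inside each field does.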
Prevention
- Default: any prompt with a stylistic requirement includes ≥1 example.
- Save reusable example libraries for recurring task types.
- Watch for cross-domain bleed: do not anchor a casual email with a legal contract example.
- Audit example libraries quarterly; remove examples that no longer match current style.
- For team workflows, agree on canonical examples — one team using PR template A and another using template B causes drift.
- When in doubt, write one example. The 5 minutes spent saves an hour of re-prompting.
Related reading
- Too many examples overwhelm the task
- Role instruction alone is not enough
- Ambiguous evaluation criteria
- No output format specified
- AI output style drift