Few-Shot Examples Have Uneven Quality and Drag Output Down
You provided 5 few-shot examples. Two are great, three are mediocre. The model averages toward the mediocre ones. Why example quality variance hurts and how to curate.
Articles tagged with #llm-output
You asked for JSON matching a schema. Most calls return valid JSON, some return prose with JSON inside, some omit fields. Description vs enforcement, and how to fix at the API layer.
The model produced a citation like "Smith et al. 2019, Journal of XYZ", but the paper does not exist. Or it linked to a URL that returns a 404. Why citation hallucination happens and how to stop it.
You prompted in English and the model answered in Chinese, or vice versa. Or it switched mid-output. Why language drift happens and how to lock the output language.
The model's response ends abruptly in the middle of a sentence, a JSON object, or a code block. The cause is almost always max_tokens. How to size it, detect truncation, and recover.
You asked for 10 ideas, the model gave 3 and trailed off. Or it filled 10 slots but the last 4 are filler. Why list-length requests under-deliver and how to actually get N items.
Your prompt template still says 2023 in 2026. The model anchors to the 2023 context: old API versions, old pricing, old facts. Why date staleness compounds and how to keep prompts evergreen.