AI Hallucinated Facts — How to Detect and Reduce

The model confidently produced wrong facts, citations, or API calls.

You asked for a specific paper, a function signature, or last quarter’s revenue number, and the model returned a confident answer that turned out to be wrong. The citation does not resolve, the function name does not exist in the SDK, or the number is off by an order of magnitude. Worse, when you pushed back, the model doubled down with another fake citation. Hallucination is not random noise — it concentrates in the gap between “needs a fact” and “has no source”. This page walks through the prompt shapes that invite hallucination and how to close the gap with retrieval, source pinning, and explicit uncertainty.

Common causes

Ordered by frequency.

1. Asking for a fact without supplying or enabling retrieval

If the prompt is "What is the time complexity of Postgres GIN index lookups?" and the model has no web tool, no doc paste, and no grounding, its only option is to guess from training memory. The guess is often close but not exactly right. The model will rarely say “I do not have the doc”; it will produce an answer that sounds plausible.

How to spot it: your prompt names a specific entity (paper, function, number, date) and you have not given the model any way to look it up.

2. Asking for a “complete” or “comprehensive” list

Prompts like "list all the React hooks" or "give me every Cloudflare Workers binding type" push the model to keep generating until the list looks exhaustive. A typical result is 10 real items followed by 2 invented ones. The model has no reliable stopping point for the true length of the list, so it pads.

How to spot it: the output has suspiciously tidy counts (exactly 10, exactly 7), or the items at the end of the list look generic.

3. Training cutoff vs. recent change

You ask about a library version, API change, or pricing tier that shipped after the model’s cutoff. The model answers using the pre-cutoff state and presents it as current.

How to spot it: the topic is something that changed in the last 6-12 months — versions, prices, model names, deprecations.

4. No instruction to flag uncertainty

Default behavior in chat models is to produce a fluent answer, not to admit ignorance. Without an instruction such as "say I do not know if you are not sure", the model fills gaps with plausible-sounding strings.

How to spot it: the answer never contains “I am not sure”, “I cannot verify”, or “based on training data which may be outdated”.
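To run this check over a batch of saved answers, a simple phrase scan is usually enough. The sketch below is a rough heuristic, not a detector: the phrase list and the name contains_hedge are illustrative assumptions, and a negative result only tells you the answer never signalled doubt.

```python
# Illustrative phrase list; extend it with the wording your prompts ask for.
HEDGE_PHRASES = (
    "i am not sure",
    "i'm not sure",
    "i cannot verify",
    "i can't verify",
    "i do not know",
    "i don't know",
    "may be outdated",
)


def contains_hedge(answer: str) -> bool:
    """Return True if the answer includes at least one hedging phrase."""
    # Lowercase and normalize curly apostrophes so "I’m" matches "i'm".
    normalized = answer.lower().replace("\u2019", "'")
    return any(phrase in normalized for phrase in HEDGE_PHRASES)
```

An answer that returns False here on a fact-heavy prompt is a candidate for the uncertainty instruction described later on this page.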

5. Long context dilutes grounding

Even when you paste the source, if your prompt is 8k tokens long and the source sits near the top, the model may drift and state facts that contradict the source later in a long answer. Attention over a long context is not uniform.

How to spot it: the source is in the prompt, but the answer contains claims not in the source.
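Numbers are the easiest ungrounded claims to catch mechanically. The following is a crude sketch, assuming plain-text source and answer; numbers_missing_from_source is a hypothetical helper name, and it only compares numeric tokens, so it will miss invented names or reworded claims.

```python
import re

# Matches integers and simple decimals such as 30 or 3.5.
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def numbers_missing_from_source(answer: str, source: str) -> list[str]:
    """List numeric tokens that appear in the answer but nowhere in the source."""
    source_numbers = set(NUMBER.findall(source))
    missing = []
    for token in NUMBER.findall(answer):
        # Keep first-seen order and skip duplicates.
        if token not in source_numbers and token not in missing:
            missing.append(token)
    return missing
```

Any token this returns is a claim to check by hand; an empty list does not prove the answer is grounded.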

6. Pushback triggers confabulation

When you say "are you sure?", the model often invents a stronger-sounding citation instead of backing down. One likely reason is that preference tuning such as RLHF tends to reward confident-sounding answers.

How to spot it: the second answer has a more specific (but still fake) citation than the first.

Before you change anything

  • Confirm whether the hallucination is reproducible or one-off; run the same prompt twice.
  • Write down the exact prompt, model, and any system prompt or grounding source.
  • Save the wrong output verbatim so you can compare against a corrected pass.
  • Note whether tool use (web search, code execution, file retrieval) was enabled.
  • Check whether the topic is something time-sensitive (versions, pricing, news).

Information to collect

  • The exact prompt text and any system prompt.
  • Model name, version, temperature, and tool-use settings.
  • The wrong fact verbatim, and the correct answer from a primary source.
  • Whether retrieval, web search, or file context was available.
  • The training cutoff of the model in use vs. the topic’s recency.
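If you collect these details often, a small record type keeps them consistent. This is one possible shape, not a standard format: the class name HallucinationReport and all of its field names are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class HallucinationReport:
    """One reproducible hallucination, with the context needed to debug it."""

    prompt: str
    model: str
    wrong_fact: str          # the wrong output, verbatim
    correct_fact: str        # the answer from a primary source
    primary_source: str      # where the correct answer came from
    system_prompt: str = ""
    temperature: Optional[float] = None
    tools_enabled: List[str] = field(default_factory=list)
    model_cutoff: Optional[date] = None
    topic_changed: Optional[date] = None

    def postdates_cutoff(self) -> Optional[bool]:
        """True if the topic changed after the cutoff; None if a date is unknown."""
        if self.model_cutoff is None or self.topic_changed is None:
            return None
        return self.topic_changed > self.model_cutoff

    def to_json(self) -> str:
        # default=str renders date objects as ISO strings.
        return json.dumps(asdict(self), default=str, indent=2)
```

Saving one JSON record per incident makes it easier to see which of the causes above keeps recurring.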

Shortest fix path

Ordered by ROI.

Step 1: Paste the source instead of relying on memory

Replace "What does the Stripe API return on a failed charge?" with the doc text plus a constrained instruction:

Source (Stripe docs, copied below):
<paste relevant section>

Question: What does Stripe return on a failed charge?
Rules:
- Use only the text above.
- If the answer is not in the text, write "NOT IN SOURCE".
- Quote the exact line you used.

The model now has far less room to invent, and the quoted line gives you something to check against the source.
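If you build this prompt programmatically, the template and the quote check fit in a few lines. This is a minimal sketch, assuming the source is plain text; build_grounded_prompt and quote_is_in_source are hypothetical helper names, and the whitespace-insensitive match is a design choice, not a requirement.

```python
def build_grounded_prompt(source_text: str, question: str, source_label: str = "source") -> str:
    """Assemble the source-pinned prompt from Step 1."""
    return (
        f"Source ({source_label}, copied below):\n"
        f"{source_text.strip()}\n\n"
        f"Question: {question}\n"
        "Rules:\n"
        "- Use only the text above.\n"
        '- If the answer is not in the text, write "NOT IN SOURCE".\n'
        "- Quote the exact line you used.\n"
    )


def _collapse_whitespace(text: str) -> str:
    return " ".join(text.split())


def quote_is_in_source(quote: str, source_text: str) -> bool:
    """True if the quoted line occurs in the source, ignoring whitespace differences."""
    needle = _collapse_whitespace(quote)
    if not needle:
        return False  # an empty quote proves nothing
    return needle in _collapse_whitespace(source_text)
```

When the model returns a quote, pass it to quote_is_in_source before trusting the answer; a quote that does not appear in the pasted text is itself a hallucination.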

Step 2: Enable retrieval/tool use when available

For ChatGPT, turn on web search. For Claude, attach the file or use a connector. For API calls, enable tool use with a search tool. Without retrieval, answers to factual questions on niche or recent topics are unreliable.

Step 3: Add an explicit uncertainty tag

Append to every fact-heavy prompt:

For each non-trivial claim, append a confidence tag:
- [VERIFIED] if you can quote a source in this conversation.
- [LIKELY] if it matches your training but you cannot cite.
- [UNCERTAIN] if you are guessing.
Refuse to produce [VERIFIED] without a quoted source.

This converts silent hallucination into visible uncertainty.
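The tags are only useful if you check them. The sketch below counts the tags and lists [VERIFIED] claims that carry no quote. It assumes one claim per line and that a quoted source appears in straight double quotes on the same line; both are simplifications, and audit_tags is a hypothetical name.

```python
import re

TAG = re.compile(r"\[(VERIFIED|LIKELY|UNCERTAIN)\]")


def audit_tags(answer: str) -> dict:
    """Count confidence tags and collect [VERIFIED] lines that contain no quote."""
    counts = {"VERIFIED": 0, "LIKELY": 0, "UNCERTAIN": 0}
    unsupported = []
    for line in answer.splitlines():
        for tag in TAG.findall(line):
            counts[tag] += 1
            # Assumption: a supporting quote sits on the same line in double quotes.
            if tag == "VERIFIED" and '"' not in line:
                unsupported.append(line.strip())
    return {"counts": counts, "unsupported_verified": unsupported}
```

Treat every entry in unsupported_verified as UNCERTAIN until you have found the source yourself.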

Step 4: For code, demand runnable artifacts

Replace "how do I use the OpenAI Python SDK to stream a response" with:

Write a runnable Python script using openai==1.40.0.
Include the exact import line.
At the end, list every method, class, and parameter you used, and mark each as VERIFIED-FROM-DOCS or UNCERTAIN.

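Then run it. If it fails with AttributeError or ImportError, the model most likely invented a method, class, or module. A TypeError about an unexpected keyword argument points to an invented parameter.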
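You can also check the names the model listed before running the whole script. The sketch below resolves dotted names against an installed module using importlib and getattr; missing_attributes is a hypothetical helper, and the check is partial, since it cannot see attributes created at runtime on instances or validate parameter names.

```python
import importlib


def missing_attributes(module_name: str, dotted_names: list[str]) -> list[str]:
    """Return the dotted names that cannot be resolved on the imported module."""
    # Raises ImportError if the module itself does not exist.
    module = importlib.import_module(module_name)
    missing = []
    for dotted in dotted_names:
        obj = module
        try:
            for part in dotted.split("."):
                obj = getattr(obj, part)
        except AttributeError:
            missing.append(dotted)
    return missing
```

For example, checking the standard library json module with ["dumps", "JSONDecoder.decode", "parse_string"] flags only "parse_string", which does not exist. Run the same check against the pinned library version in a clean virtual environment.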

Step 5: Cross-check every named entity

For every paper title, function name, URL, or number in the output, verify against a primary source: official docs, the paper page, the GitHub repo, the company changelog. Treat the model’s confidence as zero signal.
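To make sure nothing is skipped, you can pull the checkable items out of the output first. This is a rough sketch using regular expressions: verification_checklist is a hypothetical name, the patterns are deliberately simple, and paper titles still need to be picked out by hand.

```python
import re

# URLs up to the next whitespace or closing bracket/quote.
URL = re.compile(r"https?://[^\s)\"'>\]]+")
# Anything that looks like a call, e.g. foo( or pkg.module.func(
CALL = re.compile(r"\b[A-Za-z_][\w.]*\(")


def verification_checklist(output: str) -> dict[str, list[str]]:
    """Extract URLs and call-like names that need checking against a primary source."""
    urls = sorted({u.rstrip(".,;") for u in URL.findall(output)})
    calls = sorted({c[:-1] for c in CALL.findall(output)})
    return {"urls": urls, "calls": calls}
```

Open each URL and look up each call in the official docs; an item is verified only when the primary source agrees.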

Step 6: Do not “argue” — restart with grounding

If the model produces a fake citation and you say "are you sure?", you will likely get a more elaborate fake. Instead, start a new turn with "I cannot find X. Paste the URL or section number where you read this. If you cannot, mark it UNCERTAIN and stop."

How to confirm the fix

  • Every non-trivial claim in the output has a quotable source or an UNCERTAIN tag.
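  • Running the code does not raise AttributeError or ImportError, and every cited URL resolves instead of returning 404.
  • A primary source found through search corroborates the facts; agreement from a second model alone is weak evidence.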
  • Running the same prompt twice produces the same factual claims.
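The repeat-run check can be approximated by comparing the numbers and URLs in two outputs. The sketch below is a coarse stability test under that assumption; fact_fingerprint and unstable_facts are hypothetical names, and two runs can share every number while still disagreeing in prose.

```python
import re

# A URL, or an integer / simple decimal. URLs are matched first so that
# digits inside a URL are not counted as separate numbers.
FACT_TOKEN = re.compile(r"https?://\S+|\d+(?:\.\d+)?")


def fact_fingerprint(answer: str) -> set[str]:
    """Collect the numbers and URLs in an answer, trimmed of trailing punctuation."""
    return {token.rstrip(".,;)") for token in FACT_TOKEN.findall(answer)}


def unstable_facts(run_a: str, run_b: str) -> set[str]:
    """Return tokens that appear in one run but not the other."""
    return fact_fingerprint(run_a) ^ fact_fingerprint(run_b)
```

An empty result means the two runs agree on these tokens, not that the tokens are correct; stability and accuracy are separate checks.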

If still broken

  • Switch to a model with retrieval/grounding (Perplexity, ChatGPT with web on, Claude with web search).
  • Reduce the question scope — break “tell me everything about X” into 5 small factual questions.
  • For domain-specific facts (legal, medical, financial), do not use a general chat model; use a domain-tuned tool with citations.
  • If the topic post-dates the cutoff, accept that any unverified answer is a guess and require external lookup.

Prevention

  • Keep a default suffix: "Tag each non-trivial claim [VERIFIED] / [LIKELY] / [UNCERTAIN]."
  • For code prompts, always pin the library and version in the prompt.
  • Use grounded models for fact-heavy work, chat models for reasoning/writing.
  • Never accept a model’s pushback citation without independent verification.
  • Build a habit of running the code, resolving the URL, and opening the cited page before trusting any output.
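The first two habits are easy to enforce in a prompt-building script. This is a small sketch, assuming prompts are plain strings and that libraries are pinned in pip style (name==version); with_default_suffix and has_pinned_version are hypothetical helper names.

```python
import re

DEFAULT_SUFFIX = "Tag each non-trivial claim [VERIFIED] / [LIKELY] / [UNCERTAIN]."

# pip-style pin such as openai==1.40.0
PINNED = re.compile(r"\b[A-Za-z0-9_.\-]+==\d+(?:\.\d+)*\b")


def with_default_suffix(prompt: str) -> str:
    """Append the tagging instruction unless the prompt already contains it."""
    if DEFAULT_SUFFIX in prompt:
        return prompt
    return prompt.rstrip() + "\n\n" + DEFAULT_SUFFIX


def has_pinned_version(prompt: str) -> bool:
    """True if the prompt names at least one library with an exact version."""
    return bool(PINNED.search(prompt))
```

A code prompt for which has_pinned_version returns False is worth rewriting before you send it.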
