Troubleshooting

Model Invented Fake Citations and URLs

The model produced a citation like Smith et al. 2019 and the paper does not exist, or a URL that 404s. Why citation hallucination happens and how to stop it for good.

Published: May 24, 2026 Updated: Jun 21, 2026 Author: AI Productivity Guide Team

You asked for a research summary with citations. The model gave a fluent answer ending with Smith et al. (2019). Effects of microbreaks on cognitive performance. Journal of Occupational Psychology, 42(3), 287-301. Looks real. You search the journal and there is no such paper. The DOI is fake. Or worse, the URL points to a real domain but a path that returns 404. This is one of the most damaging hallucination modes, because a wrong-but-plausible answer reads as correct until someone actually tries to verify it.

Fastest fix: if the model has no document store and no live search tool, do not ask it to cite. A bare model that you ask to “cite sources” will invent citation-shaped strings, full stop. Ground the answer in retrieved text (RAG) or a real search/fetch tool, constrain the output to those sources by ID, and reject any response whose URLs or DOIs do not resolve. The rest of this page is how to do each of those.

Citation hallucination is structural, not random. Models trained on text that constantly pairs claims with citations learn the pattern “a claim should be followed by a citation-shaped string.” When the model has no real citation, it produces a citation-shaped string anyway. You cannot prompt that instinct away. You have to remove the situation where the model is forced to invent.

This is not a fringe problem. A Lancet-reported study found fabricated citations in roughly 1 in 2,828 academic papers in 2023, rising to about 1 in 277 in early 2026 as AI-assisted writing spread. Across a 2026 benchmark of commercial models, citation hallucination rates ranged from about 14% to over 90% depending on model and domain, and adversarial prompts pushed some models past 90%.

Which bucket are you in

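Run two checks before you touch the prompt. First, did retrieval or a tool return anything (log the chunks)? Second, do the cited URLs and DOIs resolve? The answers tell you which fix applies.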

Symptom	Likely cause	Go to
No retrieval / no tools, citations are mostly fake	Bare model inventing from training patterns	Step 1, Step 5
RAG layer returned `[]` but model still cited	Retrieval failed, model answered anyway	Step 2
Retrieved chunks have no citation, model added one	Model padding for rigor	Step 2, Step 4
URL is real domain, path 404s	Format-matched URL hallucination	Step 3
DOI looks valid but does not resolve, or resolves to the wrong paper	DOI fabrication or chimera reference	Step 3
Real author, wrong/nonexistent paper	Author-paper conflation	Step 4, Step 7

A chimera reference is worth calling out: the model pairs a real DOI or a real author with the wrong title, year, or venue. The pieces exist; the combination does not. These pass a naive “does the DOI resolve” check and only fail when you compare the resolved metadata against what the model claimed.

Common causes

1. Citations requested from a bare model with no retrieval

“Summarize the research on X and cite sources.” The model has no database. It generates plausible citations from training-distribution patterns. Most will be partially or fully fake.

How to spot it: Run any 5 of the model’s citations through Google Scholar or CrossRef. If 0-1 are real, this is the bug.

2. RAG retrieved nothing but the model answered anyway

Your retrieval step returned no documents (empty result, or every chunk below the similarity threshold). The model received no context but still answered confidently, and invented citations to back the confidence.

How to spot it: Log the retrieved chunks. If the RAG layer returned [] and the model still cited sources, the model fabricated them.

3. Retrieved chunks do not contain the citation but the model emits one

RAG fetched 3 paragraphs about microbreaks. None contains a citation. The model still ends with (Smith, 2019) to seem rigorous.

How to spot it: Search the retrieved chunks for the cited string. Not found means the model fabricated it.
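This check can be automated. The sketch below assumes author-year citations such as (Smith, 2019) or Smith et al. (2019) and plain-text chunks. It pulls each surname-year pair out of the answer and reports the pairs that appear in no retrieved chunk. The two regexes and the function name `unsupported_author_year_citations` are illustrative assumptions, not part of any library, and real citation styles vary widely.

```python
import re

# Illustrative patterns; adjust them to the citation style your prompts produce.
PATTERNS = [
    re.compile(r"\(([A-Z][A-Za-z'-]+)(?: et al\.)?,\s*((?:19|20)\d{2})\)"),   # (Smith, 2019)
    re.compile(r"\b([A-Z][A-Za-z'-]+)(?: et al\.)?\s+\(((?:19|20)\d{2})\)"),  # Smith et al. (2019)
]

def unsupported_author_year_citations(answer, chunks):
    """Return (surname, year) pairs cited in `answer` that no retrieved chunk mentions."""
    unsupported = []
    for pattern in PATTERNS:
        for surname, year in pattern.findall(answer):
            # A chunk supports the citation only if it contains both the surname and the year.
            if not any(surname in chunk and year in chunk for chunk in chunks):
                unsupported.append((surname, year))
    return unsupported
```

Any pair this returns is a candidate fabrication. Treat a match as weak evidence only, because a chunk can mention a surname and a year without being the cited paper.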

4. Model conflates a real author with the wrong paper

Author John Smith is real and wrote on a different topic in 2017. The model attributes a 2019 paper on a similar-sounding topic to him. The author is real, the paper is not.

How to spot it: The author name returns Google Scholar hits, but not for the cited paper title.

5. URL hallucination from confident pattern-matching

Source: https://stackoverflow.com/questions/1234567/how-to-foo — the domain is real and the slug pattern is right, but the question does not exist. The model generated a URL that matches the format.

How to spot it: Open the URL. A 404 or “no results” means it is fabricated.

6. DOI fabrication

The model emits https://doi.org/10.1234/jop.2019.42.3.287. The format is valid, the registrant prefix may even be real, but the DOI does not resolve.

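How to spot it: Paste the DOI into doi.org. “DOI not found” means it is fake. CrossRef indexes roughly 180M scholarly works as of 2026, so most real journal-article DOIs also have a CrossRef record. DOIs registered with another agency, such as DataCite for datasets, resolve at doi.org but have no CrossRef record.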

7. Cited author names follow generic patterns

The model defaults to common English surnames: Smith, Jones, Brown, Williams. Real research in a specialized or non-English field usually has a very different surname distribution.

How to spot it: A bibliography heavy on generic English surnames in a non-English-language field is suspicious.
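You can turn this heuristic into a number. The sketch below computes the share of cited surnames that appear on a generic-surname list. The list extends the four names above with a few other common English surnames, and it is an assumption. So is any cutoff you choose for "suspicious". A high share is only a prompt to check more carefully, not proof of fabrication.

```python
# Assumed list of generic English surnames; extend or replace it for your field.
GENERIC_SURNAMES = {"smith", "jones", "brown", "williams", "johnson", "taylor", "davis", "miller"}

def generic_surname_share(surnames):
    """Fraction of cited surnames that are on the generic list (0.0 for an empty list)."""
    if not surnames:
        return 0.0
    hits = sum(1 for name in surnames if name.strip().lower() in GENERIC_SURNAMES)
    return hits / len(surnames)
```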

Shortest path to fix

Step 1: Do not ask a bare model for citations

This is the single biggest fix. If the model has no document store and you ask it to cite, expect a very high fabrication rate.

BAD:  "Summarize research on X. Include 5 academic citations."
GOOD: "Summarize the topic of X based on your general knowledge.
       Do NOT include citations, DOIs, or URLs. Mark anything
       specific as 'based on general knowledge, please verify.'"
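The GOOD prompt only asks the model not to cite. A code-side check confirms that it complied. The sketch below flags URLs, DOI-like strings, and author-year patterns in a bare model's answer, so you can regenerate or strip them. The patterns are rough assumptions and will miss unusual citation styles.

```python
import re

# Rough patterns for citation-shaped strings; tune them for your domain.
CITATION_SHAPES = {
    "url": re.compile(r"https?://\S+"),
    "doi": re.compile(r"\b10\.\d{4,9}/\S+"),
    "author_year": re.compile(
        r"\([A-Z][A-Za-z'-]+(?: et al\.)?,\s*(?:19|20)\d{2}\)"       # (Smith, 2019)
        r"|\b[A-Z][A-Za-z'-]+(?: et al\.)?\s+\((?:19|20)\d{2}\)"     # Smith et al. (2019)
    ),
}

def citation_shaped_strings(answer):
    """Return every citation-shaped substring found in a bare model's answer."""
    found = []
    for pattern in CITATION_SHAPES.values():
        found.extend(match.group(0) for match in pattern.finditer(answer))
    return found
```

A non-empty result means the model ignored the instruction, and the answer should not ship as-is.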

Step 2: Use RAG and constrain output to the retrieved sources

You will receive 3 document excerpts below.
Answer the user's question using ONLY these excerpts.
For every claim, cite which excerpt it came from: [1], [2], [3].
If the excerpts do not cover something, say "Not in provided sources."
Never invent a citation. Never reference a source not in the excerpts.
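It helps to build this prompt in code so the excerpt numbers always match what was actually retrieved. The sketch below assumes each chunk is a dict with "id" and "text" keys, and `build_grounded_prompt` is a hypothetical helper name. It numbers the excerpts [1]..[n] and returns a marker-to-chunk-ID map alongside the prompt.

```python
def build_grounded_prompt(question, chunks):
    """Number excerpts [1]..[n]; return the prompt and a marker -> chunk-ID map.

    `chunks` is assumed to be a list of dicts with "id" and "text" keys.
    """
    marker_to_id = {}
    excerpt_lines = []
    for n, chunk in enumerate(chunks, start=1):
        marker_to_id[str(n)] = chunk["id"]
        excerpt_lines.append(f"[{n}] {chunk['text']}")
    markers = ", ".join(f"[{n}]" for n in marker_to_id)
    prompt = "\n".join([
        f"You will receive {len(chunks)} document excerpts below.",
        "Answer the user's question using ONLY these excerpts.",
        f"For every claim, cite which excerpt it came from: {markers}.",
        'If the excerpts do not cover something, say "Not in provided sources."',
        "Never invent a citation. Never reference a source not in the excerpts.",
        "",
        *excerpt_lines,
        "",
        f"Question: {question}",
    ])
    return prompt, marker_to_id
```

Keeping the map lets you translate the model's [n] markers back to real chunk IDs before running the corpus ID check in Step 4.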

The negative constraint is critical. In published 2026 studies, grounding the answer in retrieved text cut hallucination by 30-70%, and grounded summarization can drop below 2%. RAG still fabricates in roughly 5-15% of cases, though, mostly when retrieval silently returns nothing or returns off-topic chunks. That is why you also need the empty-result guard below and the ID check in Step 4.

Add an explicit empty-retrieval branch in code, not in the prompt:

if not retrieved:
    return "No sources found for this query."  # never call the model
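Cause 2 also covers the case where every chunk falls below the similarity threshold, so the guard should check relevance as well as emptiness. In the sketch below, `retrieve` is assumed to return (chunk, score) pairs and `call_model` to return the answer. Both are placeholders for your own pipeline. The 0.75 threshold is a hypothetical value that you should calibrate on your own retriever.

```python
MIN_SCORE = 0.75  # hypothetical threshold; calibrate on your own retriever

def answer_with_guard(query, retrieve, call_model, min_score=MIN_SCORE):
    """Call the model only when at least one chunk clears the similarity threshold.

    `retrieve(query)` is assumed to return (chunk, score) pairs and
    `call_model(query, chunks)` to return the model's answer.
    """
    relevant = [chunk for chunk, score in retrieve(query) if score >= min_score]
    if not relevant:
        return "No sources found for this query."  # never call the model
    return call_model(query, relevant)
```

Only the chunks above the threshold reach the model, so off-topic chunks cannot be turned into citations.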

Step 3: Validate every URL and DOI programmatically

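import re
import requests

URL_RE = re.compile(r'https?://[^\s)\]>"]+')
DOI_RE = re.compile(r'\b10\.\d{4,9}/[^\s)\]>"]+')
TRAILING = '.,;:'  # sentence punctuation that is usually not part of a URL or DOI

def url_ok(url):
    try:
        r = requests.head(url, timeout=5, allow_redirects=True)
        if r.status_code == 405:  # server rejects HEAD; retry with a streamed GET
            r = requests.get(url, timeout=5, allow_redirects=True, stream=True)
        return r.status_code < 400
    except requests.RequestException:
        return False

def doi_ok(doi):
    # doi.org answers a registered DOI with a redirect. Do not follow it:
    # publisher sites often block scripts, which would flag real DOIs as bad.
    try:
        r = requests.head(f"https://doi.org/{doi}", timeout=5, allow_redirects=False)
        return 300 <= r.status_code < 400
    except requests.RequestException:
        return False

def validate_citations(text):
    urls = {u.rstrip(TRAILING) for u in URL_RE.findall(text)}
    dois = {d.rstrip(TRAILING) for d in DOI_RE.findall(text)}
    urls = {u for u in urls if "doi.org/" not in u}  # DOIs are checked on their own
    bad = [u for u in sorted(urls) if not url_ok(u)]
    bad += [d for d in sorted(dois) if not doi_ok(d)]
    return bad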

Reject any response that contains an unreachable citation. To also catch chimera references (a DOI that resolves but to the wrong paper), query the CrossRef REST API at https://api.crossref.org/works/{doi} and compare the returned title and author against what the model claimed. A resolving DOI is necessary but not sufficient.
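A minimal sketch of that chimera check follows. It uses only the standard library and calls the CrossRef `/works/{doi}` endpoint, which returns the work record under a "message" key with "title" as a list and "author" as a list of objects with "family" names. The 0.85 title-similarity cutoff and both function names are assumptions for illustration. A DOI from a non-CrossRef agency returns no record here, so handle `None` separately rather than treating it as a chimera.

```python
import difflib
import json
import urllib.error
import urllib.parse
import urllib.request

def fetch_crossref_metadata(doi, timeout=10):
    """Return the CrossRef 'message' record for a DOI, or None if CrossRef has no record."""
    url = "https://api.crossref.org/works/" + urllib.parse.quote(doi)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp)["message"]
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise

def metadata_matches(record, claimed_title, claimed_surname, min_ratio=0.85):
    """True if the CrossRef title is close to the claimed title and the surname is an author."""
    titles = record.get("title") or [""]
    ratio = difflib.SequenceMatcher(None, titles[0].lower(), claimed_title.lower()).ratio()
    surnames = {a.get("family", "").lower() for a in record.get("author", [])}
    return ratio >= min_ratio and claimed_surname.lower() in surnames
```

Usage: `record = fetch_crossref_metadata(doi)`, then reject the citation if `record is not None and not metadata_matches(record, title, surname)`.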

Step 4: For RAG, validate citation tokens against the retrieved corpus

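import re

def extract_citation_ids(text):  # assumes [1] / [doc-42] style markers
    return set(re.findall(r'\[([\w-]+)\]', text))

allowed_sources = {str(chunk['id']) for chunk in retrieved}
fake = extract_citation_ids(model_output) - allowed_sources
if fake:
    raise ValueError(f"Model cited sources not in corpus: {fake}")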

Step 5: Use a grounded-citations feature instead of free-text citations

As of June 2026, the major vendors ship features that bind citations to real source text so the model cannot point at something that was never provided.

Anthropic Citations API: set citations: {enabled: true} on each document content block. The response interleaves text blocks with citation objects that contain cited_text, document_index, and start_char_index/end_char_index (or page numbers for PDFs). These are guaranteed valid pointers into the documents you supplied, and cited_text does not count toward output tokens. Note: Citations is incompatible with Structured Outputs and returns a 400 if you enable both.
OpenAI Responses API with web search: enable the web_search tool and pass include: ["web_search_call.action.sources"]. The final answer carries inline url_citation annotations with the title and URL of each page actually retrieved.
Google Gemini grounding: enable Google Search grounding so Gemini 3.1 Pro returns grounding metadata (the supporting snippets and source URIs) for each grounded claim.
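For the Anthropic option, a minimal sketch of both halves follows: building a plain-text document block with citations enabled, and flattening the response into (claim text, document index, cited text) tuples. The field names follow Anthropic's documented shapes for plain-text documents (`char_location` citations). The helper names are hypothetical. The parser assumes you have converted the SDK response content to plain dicts, for example with the Python SDK's `model_dump()`.

```python
def text_document_block(text, title):
    """A plain-text document content block with citations enabled."""
    return {
        "type": "document",
        "source": {"type": "text", "media_type": "text/plain", "data": text},
        "title": title,
        "citations": {"enabled": True},
    }

def cited_spans(response_content):
    """Flatten response content blocks (as dicts) into (claim_text, document_index, cited_text)."""
    spans = []
    for block in response_content:
        if block.get("type") != "text":
            continue
        for cite in block.get("citations") or []:  # text blocks without citations carry None
            spans.append((block["text"], cite["document_index"], cite["cited_text"]))
    return spans
```

Because every `document_index` points at a document you supplied, you can render `cited_text` next to each claim without a separate lookup.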

The principle is the same across all three: the model can only cite what a tool actually returned. No tool result, no citation.

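# Tool names are illustrative; each needs a full input schema in your vendor's format.
tools = [{"name": "search_papers", "description": "Search a paper index"},
         {"name": "fetch_url", "description": "Fetch a web page by URL"}]
# The model can ONLY cite what a tool returned.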
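To enforce that rule in code, record every URL that the tools actually returned during the turn, then reject any cited URL outside that set. The sketch below is vendor-neutral and assumes tool results arrive as text or JSON strings. `CitationGate` is a hypothetical name, and `search_papers` and `fetch_url` are the illustrative tools above.

```python
import re

URL_PATTERN = re.compile(r"https?://[^\s)\]>\"']+")
TRAILING = ".,;:"  # sentence punctuation that is usually not part of a URL

class CitationGate:
    """Record URLs that tools actually returned, then flag any cited URL outside that set."""

    def __init__(self):
        self.seen_urls = set()

    def record_tool_result(self, result_text):
        # Call this on the raw output of every tool call (e.g. search_papers, fetch_url).
        self.seen_urls.update(u.rstrip(TRAILING) for u in URL_PATTERN.findall(result_text))

    def unbacked_urls(self, answer):
        cited = {u.rstrip(TRAILING) for u in URL_PATTERN.findall(answer)}
        return cited - self.seen_urls
```

A non-empty result from `unbacked_urls` means the model cited something no tool returned, so reject the answer.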

Step 6: Make a fabricated citation cheaper to avoid than to invent

IMPORTANT: If you cannot find a real source, write
"No source found" instead of inventing one. A fabricated
citation is treated as a failure; "No source found" is not.

Stating the consequence reduces fabrication on aligned models, but never rely on it alone. Combine it with Step 2 and Step 4. The instruction nudges behavior; the code guarantees it.

Step 7: Surface citations as separate verifiable claims in the UI

Do not render citations as plain text. Each citation should be a clickable link that opens the source, and if the link 404s, show an error badge. For RAG, render the cited snippet next to the claim so a reader can confirm the source actually says it. Forcing verifiability in the UI is the last line of defense and the one that catches chimera references humans would otherwise trust.
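A minimal server-side rendering sketch follows, assuming you already know whether each link resolves, for example from `validate_citations()`. It escapes all model-produced text, links the source, adds an error badge when the link is broken, and shows the cited snippet under the claim. The CSS class names and the `render_citation` helper are hypothetical.

```python
import html

def render_citation(claim, url, snippet, resolves):
    """Render a claim with a clickable source link, an error badge, and the cited snippet."""
    badge = "" if resolves else ' <span class="citation-error">link broken</span>'
    return (
        f"<p>{html.escape(claim)} "
        f'<a href="{html.escape(url, quote=True)}" target="_blank" rel="noopener">[source]</a>'
        f"{badge}</p>\n"
        f'<blockquote class="cited-snippet">{html.escape(snippet)}</blockquote>'
    )
```

Escaping matters here because the claim, URL, and snippet all come from model or document text.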

How to confirm it is fixed

Take 20 of your most citation-heavy outputs and run validate_citations() over them. It should return an empty list for every one.
For RAG, confirm the ID check in Step 4 never raises on a held-out test set, meaning every cited ID is in the corpus.
Sample 10 outputs and resolve each DOI through CrossRef, comparing title and author. You should find zero chimera references.
Track fabrication rate per model and per prompt template over time. If it creeps up after a model or prompt change, you will see it before a user does.
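Tracking the rate can be as simple as aggregating your validation logs. The sketch below assumes each log record is a dict with "model", "template", "citations" (the number checked), and "bad" (the number that failed validation). That record shape is hypothetical. The function returns the failure share per (model, template) pair and skips pairs with nothing checked.

```python
from collections import defaultdict

def fabrication_rates(records):
    """Per (model, template) share of checked citations that failed validation."""
    totals = defaultdict(lambda: [0, 0])  # (model, template) -> [bad, checked]
    for rec in records:
        key = (rec["model"], rec["template"])
        totals[key][0] += rec["bad"]
        totals[key][1] += rec["citations"]
    return {key: bad / checked for key, (bad, checked) in totals.items() if checked}
```

Compare these numbers before and after each model or prompt change to catch regressions.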

When this is not on you

Bare LLMs are trained on corpora containing millions of citations and internalize “claims need citations” as a pattern. Even with strong instructions, top models still hallucinate a meaningful fraction of citations without retrieval, and smaller open-weight models fabricate far more. Retrieval is non-optional for any product that requires accurate sourcing. This is a property of how the models are trained, not a sign you wrote a bad prompt.

Easy to misdiagnose as

A “model knowledge gap,” as in “this model just does not know recent papers.” It does know there are papers; it does not have a way to ground a specific one. Knowledge is fine; verifiability is not. The fix is never a bigger or newer model on its own.

Prevention

Never ask a bare model for citations without retrieval.
Always constrain output to the retrieved corpus with explicit IDs.
Validate every URL and DOI in post-processing; reject bad responses; cross-check resolved DOIs against CrossRef metadata.
For paper-grade output, use a grounded-citations feature (Anthropic Citations API, OpenAI web search sources, or Gemini grounding).
Surface citations as clickable links so fakes get caught at click time.
Log fabrication rate per model and per prompt template and track it over time.

FAQ

Will adding “only cite real sources” to the prompt fix this? No. The model believes its fabricated citations are real, so an instruction to “only cite real ones” changes little. Instruction-only fixes have low effectiveness here; you need retrieval plus a code-side check.
Are some models worse than others? Yes. Smaller open-weight models fabricate citations at much higher rates, and a 2026 cross-vendor benchmark spanned roughly 14% to over 90% depending on model and domain. But even top-tier models fabricate without retrieval, so picking a better model is not a substitute for grounding.
The DOI resolves, so the citation is real, right? Not necessarily. A “chimera reference” pairs a real DOI with the wrong title, year, or author. Compare the resolved CrossRef metadata against what the model claimed before trusting it.
My RAG pipeline still hallucinates occasionally. Why? RAG still fabricates in about 5-15% of cases, almost always when retrieval silently returns nothing or returns off-topic chunks. Add an explicit empty-result branch (Step 2) and the corpus ID check (Step 4); do not let the model run when retrieval is empty.
Can I let the model browse the web to get real citations? Yes, but use a grounded feature that returns source metadata (OpenAI web_search sources, Gemini grounding, or a fetch_url tool) rather than free-text URLs, and still validate that each returned URL resolves.

Tags: #Prompt engineering #Troubleshooting #llm-output #Hallucination #Citations #rag