What this tutorial solves
Drop a file in, ask “summarize”, and you get the same generic, play-it-safe output any LLM produces. Claude does much better than that when fed correctly, but you have to know the moves. This guide is for people uploading legal contracts, research papers, financial reports, long transcripts, or multi-file comparisons, where the difference between a useful answer and a dangerous one is whether Claude actually grounded its answer in the source.
Who this is for
Anyone uploading PDFs, contracts, research papers, transcripts, code files, or large datasets into Claude and expecting answers they can act on. Especially: legal, finance, research, M&A diligence, journalism.
When to reach for it
Long files (50+ pages), multi-file comparisons, files where you need to cite specific passages, files you will reference repeatedly, or any case where a hallucinated answer has real cost. Also when ChatGPT’s file handling has frustrated you — Claude is genuinely stronger here, but only if you upload well.
When this is NOT the right tool
Files with sensitive PII or non-disclosable info; files larger than upload limits; quick one-shot questions a Google search would answer; binary formats with no text layer (raw images, encrypted PDFs).
Before you start
- Rename files for clarity: “contract-vendor-a-2026.pdf” not “Final_v3 (1).pdf”. Claude quotes filenames in answers; readability matters.
- Strip non-essential pages if you can. A 600-page report with a 580-page appendix grounds worse than the 20 pages you actually need.
- Decide on a citation format up front: page numbers, section headers, or both. State it in your first prompt so Claude follows consistently.
- Have an independent copy open in another window to spot-check claims. Trust nothing on the first pass.
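The renaming advice above can be scripted when you prepare many files. This is a minimal sketch, not a required step: the function name `clean_filename` and the label-plus-year naming pattern are assumptions for illustration, modeled on the “contract-vendor-a-2026.pdf” example.

```python
import re
from pathlib import Path

def clean_filename(original: str, label: str, year: int) -> str:
    """Build a readable name such as 'contract-vendor-a-2026.pdf'.

    `label` is a description you choose; the extension is kept from the original file.
    """
    suffix = Path(original).suffix.lower()
    # Lowercase the label, then collapse every run of non-alphanumerics into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")
    return f"{slug}-{year}{suffix}"

print(clean_filename("Final_v3 (1).pdf", "Contract Vendor A", 2026))
# contract-vendor-a-2026.pdf
```

The function only computes the new name. Renaming the file on disk is left to you, so nothing is changed by accident.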
Step by step
- Before uploading: rename the file something Claude can reference (“contract-vendor-a-2026.pdf”, not “Final_v3 (1).pdf”).
- For multi-file work, upload all files in one message so Claude knows they are a set and you can ask comparative questions immediately.
- First message after upload: “Describe what each file is, its sections, and rough size. Do not summarize content yet.” This forces Claude to inventory before interpreting.
- For long PDFs, ask Claude to give the table of contents first. Drill into one section at a time. Broad questions across the full document silently degrade in the middle.
- Ask for direct quotes with page numbers for every claim. Phrase: “Quote the relevant passage and the page or section header.” Refuse answers that paraphrase without citing.
- For tables and spreadsheets, ask Claude to output structured CSV / Markdown, which is easier to verify than prose. Recompute one row by hand on important numbers.
- When you need to keep working with the file, put it in a Project so you do not re-upload each session. See Claude Projects.
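The “recompute one row” check from the table step can also be run across every row once Claude has returned CSV. The sketch below assumes a hypothetical table whose `total_revenue` column should equal the sum of two component columns. The column names and figures are invented for illustration.

```python
import csv
import io

# Hypothetical CSV pasted from a Claude answer; column names and values are illustrative.
claude_csv = """quarter,product_revenue,service_revenue,total_revenue
Q1,120.5,30.0,150.5
Q2,130.0,35.5,165.0
"""

def rows_with_bad_totals(csv_text: str, tolerance: float = 0.01) -> list[str]:
    """Return the quarters whose stated total differs from the sum of its parts."""
    bad = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        recomputed = float(row["product_revenue"]) + float(row["service_revenue"])
        if abs(recomputed - float(row["total_revenue"])) > tolerance:
            bad.append(row["quarter"])
    return bad

print(rows_with_bad_totals(claude_csv))  # ['Q2']
```

This only tests whether the table is internally consistent. The component figures still need a spot-check against the original file.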
First-run exercise
- Pick a 30-50 page document you have read before. The known content lets you spot hallucinations.
- Run steps 3-5 (file inventory, table of contents, quoted citations) in order in a fresh chat. Save the answers.
- Count: of the claims Claude made, how many cited a page that actually contained that claim?
- If fewer than 90% of the citations check out, refine the prompt with stricter citation requirements and rerun.
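The count in the exercise above is simple arithmetic: verified citations divided by total claims, compared against the 90% bar. A minimal sketch follows, assuming you record each hand-checked claim as a small dictionary. The sample claims and the field names are made up for illustration.

```python
# Each entry is one claim from Claude's answer plus the result of your manual check.
# The claims below are invented sample data.
checked_claims = [
    {"claim": "Termination requires 30 days notice", "cited_page": 12, "found_on_page": True},
    {"claim": "Liability is capped at annual fees", "cited_page": 18, "found_on_page": True},
    {"claim": "Auto-renewal term is 24 months", "cited_page": 7, "found_on_page": False},
    {"claim": "Governing law is New York", "cited_page": 31, "found_on_page": True},
]

def citation_hit_rate(claims: list) -> float:
    """Fraction of claims whose cited page actually contained the claim."""
    if not claims:
        raise ValueError("no claims were checked")
    hits = sum(1 for c in claims if c["found_on_page"])
    return hits / len(claims)

rate = citation_hit_rate(checked_claims)
print(f"{rate:.0%}")          # 75%
print("rerun with stricter citations" if rate < 0.90 else "good enough")
```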
Quality check
- Every numeric claim or quote has a page / section reference. No exceptions on documents that matter.
- You have manually verified at least 3 references against the original. Pick the most surprising claims.
- You know what the file does NOT contain (no hallucinated sections). Ask “What sections are NOT in this file?” as a sanity check.
- For multi-file work, every answer names which file it came from. If Claude blends sources, push back with explicit filename anchors.
How to reuse this workflow
- Save the prompt set as a template (“Standard contract review questions”, “Standard research-paper extraction”).
- For recurring file types, build a Project with custom instructions like “Always cite page numbers; refuse to answer without source” and a saved structure.
- Re-test quarterly: every 3 months, rerun the workflow on a fresh file to check that Claude’s behavior hasn’t drifted.
- Keep a list of past hallucinations. Failure modes recur, and naming them helps you spot them faster.
Recommended workflow
A 200-page report: upload -> ask for TOC and section sizes -> drill into Section 4 -> ask for every numeric claim with page reference -> export as a structured Markdown table -> verify 3 numbers by hand -> save the prompt set for next time.
FAQ
- What is the max file size?: Varies by plan and file type. Past the limit, split the file or extract relevant sections first.
- Do uploaded files train Claude?: It depends on your plan and your privacy settings. Check the current terms and your account’s data controls before uploading anything sensitive.
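- Can Claude handle Excel with formulas?: It typically reads values, not formulas. A plain CSV export drops formulas too, so for formula auditing provide the formula text itself (for example, copied out of the cells) and ask explicitly.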
- What about scanned PDFs with bad OCR?: Quality drops on dense scans. Ask Claude to output the suspect page as raw text and spot-check.
- Do images inside PDFs get understood?: Mostly yes for diagrams; figure captions are usually read; complex multi-axis charts can be misread.
- How do I compare two contracts cleanly?: Upload both at once, give each a role in the prompt (“File 1 = current contract, File 2 = proposed redline; list every changed clause with quotes from each”).
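For the contract comparison in the last question, an independent diff gives you something to check Claude’s list of changed clauses against. This sketch uses Python’s standard `difflib` and assumes you already have both contracts as plain text with one clause per line. The clause text is invented sample data.

```python
import difflib

# Invented sample clauses; in practice, load the text of each contract version.
current = [
    "1. Term: 12 months.",
    "2. Payment: net 30 days.",
    "3. Liability cap: fees paid in the prior 12 months.",
]
proposed = [
    "1. Term: 12 months.",
    "2. Payment: net 60 days.",
    "3. Liability cap: fees paid in the prior 12 months.",
]

def changed_lines(old: list, new: list) -> tuple:
    """Return (removed, added) lines between two versions of a document."""
    removed, added = [], []
    for line in difflib.ndiff(old, new):
        if line.startswith("- "):
            removed.append(line[2:])
        elif line.startswith("+ "):
            added.append(line[2:])
    return removed, added

removed, added = changed_lines(current, proposed)
print(removed)  # ['2. Payment: net 30 days.']
print(added)    # ['2. Payment: net 60 days.']
```

If Claude reports a changed clause that the diff does not show, or misses one that it does, that answer deserves a second look.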
Common mistakes
- Asking for “summary” before knowing the file structure. You get safe-sounding nothing.
- Mixing 10 unrelated files in one upload. Claude blends sources together in answers.
- Trusting unsourced claims. Always re-ask for the exact passage and page.
- Treating Claude’s OCR as perfect. Heavily scanned PDFs may have garbled passages — sanity-check.
- Loading the full document once and asking one broad question — middle-of-document recall degrades silently.
- Forgetting to clean up files in Projects — old versions blend with current ones and confuse answers.
Advanced tips
- For multi-file comparison, give the files clear roles in your prompt: “File 1 is the contract; File 2 is the proposal; find clauses that contradict.”
- For short code files, paste them in code blocks rather than uploading, so the exact text sits in the conversation and is easy to quote back.
- When pages get garbled, ask Claude to output the suspected page as raw text so you can spot OCR errors.
- For audit-grade work, request a JSON structure: “Output findings as JSON with fields: claim, file, page, quote, confidence (low|med|high).” Findings in this shape are easier to verify with a script.
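One way to script that verification is sketched below. It assumes you have already extracted the text of each page yourself and stored it in a dictionary keyed by (filename, page number). That dictionary, the function names, and the sample findings are all assumptions for illustration. The check is a plain substring match after whitespace normalization, so it catches fabricated quotes but not subtle misreadings.

```python
import json
import re

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line breaks do not cause false misses."""
    return re.sub(r"\s+", " ", text).strip().lower()

def unverified_findings(findings_json: str, pages: dict) -> list:
    """Return findings whose quote is empty or not found on the cited page."""
    failures = []
    for finding in json.loads(findings_json):
        page_text = pages.get((finding["file"], finding["page"]), "")
        quote = normalize(finding["quote"])
        if not quote or quote not in normalize(page_text):
            failures.append(finding)
    return failures

# Hypothetical page text that you extracted yourself, keyed by (filename, page).
pages = {
    ("contract-vendor-a-2026.pdf", 4): "Either party may terminate\nwith 30 days written notice.",
}
# Hypothetical findings in the JSON shape requested from Claude.
findings_json = json.dumps([
    {"claim": "Termination needs 30 days notice", "file": "contract-vendor-a-2026.pdf",
     "page": 4, "quote": "terminate with 30 days written notice", "confidence": "high"},
    {"claim": "Liability is capped", "file": "contract-vendor-a-2026.pdf",
     "page": 4, "quote": "liability is capped at fees paid", "confidence": "med"},
])

for finding in unverified_findings(findings_json, pages):
    print("CHECK BY HAND:", finding["claim"])
# CHECK BY HAND: Liability is capped
```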
Output checklist
- Every numeric claim or quote has a page / section reference.
- You have manually verified at least 3 references against the original.
- You know what the file does NOT contain (no hallucinated sections).
- The prompt set is saved for next time.