Long Document Workflow with Claude

Claude’s long context shines here: process 100+ page documents without splitting.

What this covers

A repeatable workflow for processing long documents (100-500 pages) in Claude without splitting the file, mis-paraphrasing it, or hallucinating sections that don’t exist. The pain: most people upload a 200-page PDF, ask “summarize in 5 bullets”, get a competent-looking output that misses the most important section, and ship the summary into a decision. Claude’s long context is well suited to documents like these, but only with structured prompting; this guide gives you the structure.

Key tools and concepts:

  • Claude: Anthropic’s conversational AI, with a 200k-token context window on most plans.
  • Structural outline pass: Asking for the document’s structure before any content interpretation; the single most important step for grounded answers.
  • Per-section drill-down: Asking about one section at a time with page-cited claims, instead of one broad question across the whole file.

Who this is for

Researchers, analysts, lawyers, students, and anyone who reads long PDFs for a living. Especially valuable if you have ever been burned by an LLM “summary” that left out the part you needed most.

When to reach for it

Reading reports, contracts, papers, depositions, technical specs, regulatory filings, or anything past 50 pages. Also useful for multi-document comparisons when each document is itself long.

Before you start

  • Confirm the document has a real text layer (try selecting text in a PDF reader). Heavy scans rely on OCR, which adds noise.
  • Have an external copy open in another window to spot-check claims. Trust nothing on the first pass.
  • Decide your output target — an executive summary, a section-by-section table, an extracted quote list, a compliance gap analysis. Different targets need different prompting.
  • Estimate how long the workflow will take. Long-doc work runs 30-60 minutes; carve out the time rather than squeezing it in.

Step by step

  1. Rename the file to something memorable (“acme-vendor-msa-2026.pdf”), then upload it, so you and Claude can refer to it by name in later messages.
  2. Ask for a structural outline first: “List every top-level section and subsection of this document with the page range each covers. Do not summarize content yet.” This forces inventory before interpretation.
  3. Dive into one section at a time, requesting page references: “In Section 4 (pages 23-41), list every numbered obligation with the exact phrasing and page.” Repeat for the sections that matter to your goal.
  4. For risk-sensitive content, ask Claude what is NOT in the document: “What sections does this document NOT contain that you might expect for an MSA? Be specific.” This catches absent terms you might assume were present.
  5. Verify by hand at least 3 quoted passages and 2 numbers against the source. If any are wrong, push back on Claude before continuing; an uncorrected error tends to carry into later answers in the same conversation.
  6. Ask for an executive summary at the end, only after section-level details are confirmed: “Now write a 1-page executive summary citing the sections and pages of every claim.”
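
If you want the prompts from steps 2, 3, 4, and 6 typed the same way every time, they can be generated from a short script. This is a minimal sketch, not an official Claude interface: the `Section` type and the `build_prompt_sequence` function are hypothetical names, and the script only builds strings to paste into the chat. Step 5 stays manual.

```python
from typing import NamedTuple


class Section(NamedTuple):
    """One section copied from Claude's outline (hypothetical helper type)."""
    label: str        # e.g. "Section 4"
    first_page: int
    last_page: int


def build_prompt_sequence(doc_type: str, sections: list[Section]) -> list[str]:
    """Return the prompts for steps 2, 3, 4, and 6 in sending order."""
    prompts = [
        # Step 2: inventory before interpretation.
        "List every top-level section and subsection of this document "
        "with the page range each covers. Do not summarize content yet.",
    ]
    # Step 3: one drill-down per section that matters to the goal.
    for s in sections:
        prompts.append(
            f"In {s.label} (pages {s.first_page}-{s.last_page}), list every "
            "numbered obligation with the exact phrasing and page."
        )
    # Step 4: negative-space check.
    prompts.append(
        "What sections does this document NOT contain that you might "
        f"expect for {doc_type}? Be specific."
    )
    # Step 6: summary last, after the hand verification in step 5.
    prompts.append(
        "Now write a 1-page executive summary citing the sections and "
        "pages of every claim."
    )
    return prompts


if __name__ == "__main__":
    demo = build_prompt_sequence("an MSA", [Section("Section 4", 23, 41)])
    for number, prompt in enumerate(demo, start=1):
        print(f"{number}. {prompt}\n")
```

Fill in the section list only after the outline pass, since the page ranges come from Claude’s answer to the first prompt.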

First-run exercise

  1. Pick a 50-100 page document you have read before. Familiarity is your hallucination detector.
  2. Run steps 2-6 in order. Time the run.
  3. For each claim Claude makes, mark: cited correctly / cited wrong section / invented.
  4. Tighten the prompt against whichever error type was most common and rerun. Save the tighter prompt as your default for this document type.
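
The marking in step 3 is easier to act on once it is counted. The sketch below tallies the three labels and reports which error type dominates; the function name `tally_claims` is hypothetical and the sample marks are made up for illustration, not measured results.

```python
from collections import Counter

LABELS = ("cited correctly", "cited wrong section", "invented")


def tally_claims(marks: list[str]) -> dict[str, float]:
    """Return the share of claims per label; reject labels outside LABELS."""
    unknown = set(marks) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    if not marks:
        return {label: 0.0 for label in LABELS}
    counts = Counter(marks)
    return {label: counts[label] / len(marks) for label in LABELS}


# Illustrative marks only, one entry per claim you checked.
marks = ["cited correctly"] * 7 + ["cited wrong section"] * 2 + ["invented"]
shares = tally_claims(marks)
# Compare only the two error types when deciding what to tighten.
worst = max(("cited wrong section", "invented"), key=lambda label: shares[label])
print(shares)
print("tighten the prompt against:", worst)
```

Keeping the shares from each run lets you see whether the tighter prompt actually reduced the error type you targeted.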

Quality check

  • Every claim has a page or section reference. No exceptions on documents that matter.
  • You have verified at least 3 references by hand. Pick the most consequential claims.
  • Claude can name what is NOT in the document (sanity check for hallucinated coverage).
  • The executive summary’s claims trace back to specific sections, not generic phrases like “the document discusses”.
  • For numeric claims, recompute one totals row by hand. LLMs are most confidently wrong on math.
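
A mechanical pre-check can narrow down which references to verify by hand. The sketch below assumes you already have the document’s text per page in a dictionary (how you extract it is up to you and is not shown); `check_quote` and `normalize` are hypothetical helper names. It tests whether a quoted passage appears verbatim on the page Claude cited, after straightening curly quotes and collapsing whitespace.

```python
import re

# Map curly quotes to straight ones so PDF typography does not block a match.
_QUOTE_MAP = str.maketrans({"“": '"', "”": '"', "‘": "'", "’": "'"})


def normalize(text: str) -> str:
    """Straighten quotes, collapse runs of whitespace, and lowercase."""
    return re.sub(r"\s+", " ", text.translate(_QUOTE_MAP)).strip().lower()


def check_quote(pages: dict[int, str], quote: str, cited_page: int) -> str:
    """Classify a quoted passage against the page it was cited to."""
    needle = normalize(quote)
    if not needle:
        raise ValueError("quote is empty")
    if needle in normalize(pages.get(cited_page, "")):
        return "cited correctly"
    # Not on the cited page: look for it elsewhere in the document.
    for number, text in sorted(pages.items()):
        if needle in normalize(text):
            return f"cited wrong page (found on page {number})"
    return "not found"


# Illustrative page text, not taken from a real contract.
pages = {
    23: "The Vendor shall  deliver\nmonthly reports.",
    24: "Fees are due within 30 days.",
}
print(check_quote(pages, "shall deliver monthly reports", 23))
print(check_quote(pages, "Fees are due within 30 days", 23))
print(check_quote(pages, "unlimited liability", 23))
```

A “not found” result is a prompt to look, not proof of invention: a passage that spans a page break or was mangled by OCR will also fail an exact match, so the hand check on the most consequential claims still applies.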

How to reuse this workflow

  • Save the structured prompt sequence as a template (“Long-PDF intake v3”).
  • For recurring document types (contracts, papers, filings), customize the template with type-specific questions.
  • Build a “common omissions” list — things you wish you had asked. Add to it every session.
  • Re-test the workflow with a known-good document each quarter; Claude’s behavior evolves.
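
One way to keep the template and the “common omissions” list together is a small JSON record per document type. This is a sketch under assumptions: the field names and the helper functions `new_template` and `add_omission` are hypothetical, and the sample omission question is only an illustration.

```python
import json


def new_template(name: str, document_type: str, prompts: list[str]) -> dict:
    """Create a template record; the field names are illustrative."""
    return {
        "name": name,
        "document_type": document_type,
        "prompts": list(prompts),
        "common_omissions": [],
    }


def add_omission(template: dict, question: str) -> bool:
    """Add a question you wish you had asked; return False if already listed."""
    cleaned = question.strip()
    known = {q.lower() for q in template["common_omissions"]}
    if not cleaned or cleaned.lower() in known:
        return False
    template["common_omissions"].append(cleaned)
    return True


template = new_template(
    "Long-PDF intake v3",
    "contract",
    ["List every top-level section and subsection of this document "
     "with the page range each covers. Do not summarize content yet."],
)
add_omission(template, "Is there a limitation-of-liability clause?")
# Paste the JSON into a notes file so the list survives between sessions.
print(json.dumps(template, indent=2))
```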

In short: upload -> outline pass -> section dives with page-cited quotes -> negative-space check (“what’s missing”) -> verify at least 3 quotes and 2 numbers by hand -> executive summary with citations. The whole loop takes 30-60 minutes on a 200-page document, and you finish with a summary you can defend.

FAQ

  • How long is “long” for Claude?: Most plans support 200k tokens (roughly 500 pages of typical prose; dense pages use more tokens each). Past that, split by section, not by page count.
  • Does long context degrade in the middle?: It can. Recall for material in the middle of a very long input may be weaker than for the beginning and end. Per-section drill-down mitigates this.
  • Should I ask for one section at a time, or all at once?: One at a time for accuracy. All-at-once for speed only when you already know the document and just need a refresh.
  • What about scanned PDFs?: Quality drops on heavy scans. Spot-check OCR; ask Claude to output suspect pages as raw text.
  • Can Claude compare two long documents at once?: Yes, but expect more drift. Give each a clear role in the prompt (“File 1 = old contract, File 2 = new redline”).
  • How do I keep a long document for future sessions?: Put it in a Project (see Claude Files).
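
For the first FAQ answer, a rough size check and a section-aware split can be sketched as follows. The 4-characters-per-token ratio is only a rule of thumb for English prose and the reserved headroom is an arbitrary assumption; `estimate_tokens` and `pack_sections` are hypothetical helper names, not part of any Claude tooling.

```python
import math

CHARS_PER_TOKEN = 4        # assumption: rough rule of thumb for English prose
CONTEXT_TOKENS = 200_000   # context size quoted in the FAQ above
RESERVED_TOKENS = 20_000   # assumption: headroom for prompts and replies


def estimate_tokens(text: str) -> int:
    """Estimate token count from character count (approximate)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def pack_sections(
    sections: list[tuple[str, str]],
    budget: int = CONTEXT_TOKENS - RESERVED_TOKENS,
) -> list[list[str]]:
    """Group consecutive (title, text) sections into chunks within the budget.

    Sections are never cut in the middle. A single section larger than the
    budget still gets a chunk of its own and needs manual handling.
    """
    chunks: list[list[str]] = []
    current: list[str] = []
    used = 0
    for title, text in sections:
        cost = estimate_tokens(text)
        if current and used + cost > budget:
            chunks.append(current)
            current, used = [], 0
        current.append(title)
        used += cost
    if current:
        chunks.append(current)
    return chunks


# Placeholder text sized to show the split; real input is your section text.
demo = [("Part A", "x" * 400_000), ("Part B", "x" * 400_000), ("Part C", "x" * 100_000)]
print(pack_sections(demo))  # [['Part A'], ['Part B', 'Part C']]
```

Splitting on section boundaries keeps each upload self-contained, which is what makes the per-section drill-down prompts work on the pieces.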

Common mistakes

  • “Summarize this 200-page doc in 5 bullets” — you get a confident, plausible, partially wrong summary.
  • Not requesting citations — every claim becomes equally trustworthy and equally suspect.
  • Trusting numbers without spot-checking — LLMs are most confident exactly where they are most likely to be wrong.
  • Asking one question of a 200k-token paste and assuming the answer covers the middle of the document. Recall can drop in the middle of long inputs unless you pin the question to a section.
  • Treating Claude’s “I cannot find this” as definitive — sometimes it just missed; rephrase with section anchors.
  • Stopping at the executive summary without verifying its citations against the section drill-downs.
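
To catch the citation-related mistakes above before a summary leaves your hands, you can scan it for lines with no page or section reference. This is a minimal sketch: the `uncited_claims` name is hypothetical, and the pattern assumes citations look like “p. 52”, “pp. 60-61”, “page 12”, or “Section 7.2”, so adjust it to whatever format you asked Claude to use.

```python
import re

# Assumed citation shapes: "p. 52", "pp. 60", "page 12", "pages 60", "Section 7.2".
CITATION = re.compile(
    r"\b(?:pp?\.\s*\d+|pages?\s+\d+|section\s+\d+(?:\.\d+)*)",
    re.IGNORECASE,
)


def uncited_claims(summary: str) -> list[str]:
    """Return the non-empty lines of a summary that carry no citation."""
    lines = [line.strip() for line in summary.splitlines() if line.strip()]
    return [line for line in lines if not CITATION.search(line)]


# Illustrative summary text, not output from a real session.
summary = """Fees are due within 30 days (Section 7.2, p. 52).
The document discusses liability.
Termination requires notice (pages 60-61)."""
for line in uncited_claims(summary):
    print("needs a citation:", line)
```

The check only shows that a reference is present, not that it is right, so it complements the hand verification rather than replacing it.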
