What this covers
Claude’s 200K-token context window is well suited to deep analysis of long PDFs and reports. This guide lays out a workflow that uses it well.
Key tools and concepts:
- Claude: Anthropic’s conversational AI assistant (similar in role to ChatGPT) with file, long-context, and tool support.
Who this is for
Analysts, researchers, and legal or policy professionals who work with one or two long documents at a time.
When to reach for it
When the source is one big document (50-500 pages) and you need a close reading, not a landscape scan.
Step by step
1. Create the Project. In claude.ai, open the sidebar and click + New project. Name it after the deliverable, e.g. Q1 policy whitepaper review. Click into the project → Custom instructions, and paste:

    Your role: senior <domain> document analyst (legal / policy / industry report / academic).
    Reading goal: <e.g. "Surface the core claims, weakest arguments, and conflicts with existing policy from this whitepaper, into a 1500-word brief for decision-makers">
    Audience: <e.g. "me + my boss, decision-makers, 10 minutes of reading time each">
    Hard rules:
    - Every factual claim must cite page or section, format [p.42] or [§3.2]
    - Do NOT use information outside the document (if external lookup is needed, stop and tell me)
    - Do NOT hallucinate numbers, quotes, or page references
    - Reply in English
2. Upload the long document. In the Project header, click + Add content and upload the PDF / DOCX / MD. Claude’s current context: ~100K tokens free, 200K tokens Pro / Team (≈ 400-500 pages of dense PDF). If over budget:
- Trim the appendix, references, and front matter that don’t serve the goal
- Or split with pdftk / qpdf and upload the halves to two parallel projects

    # Split a 500-page PDF into pages 1-250 and 251-500 (z = last page)
    qpdf --pages document.pdf 1-250 -- document.pdf doc_part1.pdf
    qpdf --pages document.pdf 251-z -- document.pdf doc_part2.pdf
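Whether a document is over budget can be estimated before uploading. The sketch below is a rough planning aid, not Claude's actual tokenizer. It assumes about 500 tokens per dense page (the conservative end of the "200K tokens ≈ 400-500 pages" figure) and reserves 25% of the window for instructions and replies. The names `plan_split` and `qpdf_commands`, the `headroom` parameter, and both defaults are illustrative assumptions, not part of any Claude or qpdf API.

```python
import math


def plan_split(total_pages, budget_tokens=200_000, tokens_per_page=500, headroom=0.25):
    """Return 1-based inclusive page ranges that should each fit the budget.

    tokens_per_page and headroom are assumptions; real PDFs vary widely.
    """
    if total_pages < 1:
        raise ValueError("total_pages must be at least 1")
    usable_tokens = budget_tokens * (1 - headroom)
    max_pages = max(1, int(usable_tokens // tokens_per_page))
    parts = math.ceil(total_pages / max_pages)
    size = math.ceil(total_pages / parts)  # keep the parts roughly even
    return [(start, min(start + size - 1, total_pages))
            for start in range(1, total_pages + 1, size)]


def qpdf_commands(pdf_name, ranges):
    """Build qpdf commands in the same form as the split example in step 2."""
    stem = pdf_name.rsplit(".", 1)[0]
    return [f"qpdf --pages {pdf_name} {a}-{b} -- {pdf_name} {stem}_part{i}.pdf"
            for i, (a, b) in enumerate(ranges, start=1)]


if __name__ == "__main__":
    for command in qpdf_commands("document.pdf", plan_split(500)):
        print(command)
```

With these defaults a 500-page file is planned as two halves and a 200-page file as a single upload. Scanned or table-heavy PDFs can cost far more tokens per page, so treat the result as a starting point.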
3. Reading map: TOC + section lengths. Open a new chat:

    Based on the uploaded document, produce a reading map:
    1. TOC: all level-1 and level-2 headings with page ranges
    2. Length distribution: pages and approximate % of total for each level-1 chapter
    3. Priority candidates: given the reading goal, which 3-5 sections deserve deep reading? Why?
    4. Skip candidates: which sections are clearly off-goal and safe to skim?
    Do NOT summarize content. This step only draws the map.
4. Per-section deep extraction. For each priority section from step 3:

    Deep-process "<section title>" [§<section number>, p.<start>-<end>]:
    1. Summarize the core claims in 5 sentences, each ending with [p.<page>]
    2. List key data / quotes / dates with page citations
    3. List claims that conflict with, or restate differently, claims in other sections (name which section)
    4. List the implicit assumptions (premises the author relies on but never states)
    5. List the 1-2 weakest arguments in this section (thin evidence / circular / inferential leap)
    Citations must be real and findable in the PDF; if a quote isn't actually in the doc, say "not found in document". Don't invent.
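Because every claim in an extract should carry a page citation, the cited pages can be collected mechanically before the manual check. The following is a minimal sketch, assuming Claude's reply uses the [p.42] and [§3.2, p.44-45] formats requested in the prompts. `cited_pages` and `out_of_range` are hypothetical helper names. The sketch only finds pages to look at; it cannot confirm that a quote really appears on the page.

```python
import re

# Matches [p.42], [p.44-45], and [§3.2, p.44-45]; a bare [§3.2] has no page and is skipped.
PAGE_CITE = re.compile(r"\[(?:§[\d.]+,\s*)?p\.(\d+)(?:-(\d+))?\]")


def cited_pages(text):
    """Return the sorted, de-duplicated page numbers cited in a reply."""
    pages = set()
    for match in PAGE_CITE.finditer(text):
        first = int(match.group(1))
        last = int(match.group(2)) if match.group(2) else first
        pages.update(range(first, last + 1))
    return sorted(pages)


def out_of_range(text, start, end):
    """Return cited pages that fall outside the section's own page range."""
    return [page for page in cited_pages(text) if not start <= page <= end]
```

A page flagged by `out_of_range` is not necessarily wrong, since a section can legitimately point at another section, but it is a cheap signal for where to look first.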
5. Cross-section conflict sweep:

    Scan the entire document (not just one section). Find all internal contradictions and output each as:
    - Claim A: <specific statement> [§<sec>, p.<page>]
    - Claim B: <statement that conflicts with A> [§<sec>, p.<page>]
    - Conflict type: numeric / position / temporal / definitional
    - Severity: high / med / low
    Do NOT judge conflicts using outside knowledge; check only the document's internal consistency.
    If you find nothing, say "no obvious internal contradictions found". Do not fabricate.
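If the reply sticks to the four-field template, it can be parsed and sorted so that the high-severity conflicts come first for the manual read. This is only a sketch under that assumption. Real replies may drift from the template, and entries missing a field are silently skipped here. `Conflict` and `parse_conflicts` are hypothetical names.

```python
import re
from dataclasses import dataclass

FIELD = re.compile(r"^-?\s*(Claim A|Claim B|Conflict type|Severity):\s*(.+)$")
SEVERITY_ORDER = {"high": 0, "med": 1, "low": 2}


@dataclass
class Conflict:
    claim_a: str
    claim_b: str
    kind: str
    severity: str


def parse_conflicts(reply):
    """Parse template-shaped conflict entries and sort them high, med, low."""
    conflicts, fields = [], {}
    for line in reply.splitlines():
        match = FIELD.match(line.strip())
        if not match:
            continue
        fields[match.group(1)] = match.group(2).strip()
        if match.group(1) == "Severity":  # Severity is the last field of an entry
            if len(fields) == 4:  # keep only complete entries
                conflicts.append(Conflict(fields["Claim A"], fields["Claim B"],
                                          fields["Conflict type"],
                                          fields["Severity"].lower()))
            fields = {}
    return sorted(conflicts, key=lambda c: SEVERITY_ORDER.get(c.severity, 3))
```

Sorting is stable, so conflicts of equal severity keep the order in which Claude listed them.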
6. Weakest-argument audit:

    Read the whole doc; list the 5 weakest arguments. For each:
    Claim: <quote the author's exact words, with page>
    Weakness type: thin evidence / single source / circular reasoning / inferential leap / definitional shift / cherry-picked data
    Analysis: <1-2 sentences on why this is weak>
    How a reviewer would press: <1 specific question>
    Rank by "impact on the conclusion": list first the ones that, if false, invalidate the main conclusion even if everything else holds.
7. Critic question list:

    Suppose you are <a peer reviewer / an opposing-side think tank analyst / an investigative reporter> for this document. List 10 sharp questions:
    - Each question must reference a specific page
    - Questions must NOT be answerable by re-reading the document (they must require outside data or fresh analysis)
    - Avoid generic "is this comprehensive / objective" questions; be specific

Paste these 10 questions into a pinned chat in the project named Critic questions - open. Resolve them yourself in the next phase, before drafting the brief.
8. Read the high-leverage pages by hand. Claude has already surfaced the contested, under-supported, and internally contradictory sections. Your job is to open the PDF and jump to those page numbers:
- All “high”-severity conflicts from step 5
- The pages holding the top-3 weakest arguments from step 6
- Pages where step 4 said “not found in document” — Claude may have miscited; verify
Feed the manual corrections back to Claude to update the brief. Final outputs go into:
    claude_longdoc_2026_05_21_<topic>/
    ├── 00_instructions.md
    ├── 01_reading_map.md
    ├── 02_per_section_extracts.md
    ├── 03_cross_section_conflicts.md
    ├── 04_weakest_arguments.md
    ├── 05_critic_questions.md
    └── 06_final_brief.md
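The output folder can be created up front so each step has a file waiting for it. Here is a small sketch using only the Python standard library. The `scaffold` function and its parameters are illustrative assumptions; the file names are the ones from the layout above.

```python
from datetime import date
from pathlib import Path

FILES = [
    "00_instructions.md",
    "01_reading_map.md",
    "02_per_section_extracts.md",
    "03_cross_section_conflicts.md",
    "04_weakest_arguments.md",
    "05_critic_questions.md",
    "06_final_brief.md",
]


def scaffold(topic, root=Path("."), day=None):
    """Create claude_longdoc_<YYYY_MM_DD>_<topic>/ with one empty file per step."""
    day = day or date.today()
    folder = Path(root) / f"claude_longdoc_{day:%Y_%m_%d}_{topic}"
    folder.mkdir(parents=True, exist_ok=True)
    for name in FILES:
        (folder / name).touch(exist_ok=True)  # never overwrites existing notes
    return folder
```

Calling `scaffold("policy_whitepaper")` in the working directory creates the dated folder. Re-running it is harmless because existing files are left untouched.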
Recommended workflow
Project setup → upload → ToC + length map → per-section summary + contradiction marks → cross-section contradiction pass → weakest-argument prompt → critic questions → manual deep read.
Common mistakes
- Trying this in regular chat — context limits will burn you
- Asking only for “a summary” — that buries the contradictions
- Treating the contradiction map as final without a manual read
Practical depth notes
Treat this long-document workflow as a small controlled run before trusting it on real work. Start with one representative input, define what a good result must include, and keep the original beside the AI output so you can see what changed. The model should explain tradeoffs, assumptions, and weak spots instead of only producing a cleaner-looking answer.
The safest review pattern is: run once for structure, once for quality, and once for risks. Check facts, names, numbers, links, file paths, and commands manually. If the output affects users, money, legal terms, production code, or published claims, keep a human approval step even when the draft looks confident.
FAQ
- Why not just ChatGPT? Claude’s 200K context handles single long docs better. ChatGPT shines on multi-source workflows.
- Best plan? Claude Pro covers most long-doc work. Team plans get Projects and shared chats.