Claude Long-Document Research: How to Read a 200-Page PDF

Claude.ai caps PDFs at 100 pages, so a 200-page report needs a split-and-merge workflow. Here is the exact step-by-step process for deep, cited analysis (June 2026).

Published: May 17, 2026 Updated: Jun 09, 2026 Author: AI Productivity Guide Team

TL;DR

You cannot drop a 200-page PDF into Claude.ai and get a clean deep-read. As of June 2026, claude.ai rejects any PDF over 100 pages or 32 MB, and even when a doc fits, a long file pushed into a Project quietly switches to retrieval (RAG) mode and stops reading the whole thing. The reliable workflow is: split the PDF into sub-100-page parts, upload them into one chat (not a Project knowledge base) so Claude reads them in full context, then run a fixed sequence of cited prompts — reading map, per-section extraction, cross-section conflict sweep, weakest-argument audit, critic questions — and finish with a manual read of the pages Claude flags. Use Claude Sonnet 4.6 for the bulk passes and switch to Opus 4.7 for the final critique.

What you are actually fighting

The hard limits matter here, so get them straight before you start (claude.ai, as of June 2026):

Limit	Value	Why it bites
PDF page cap (claude.ai upload)	100 pages	A 200-page report is rejected outright
PDF file size cap	32 MB	Scanned/image-heavy reports hit this fast
Files per single conversation	20	Fine for 2-4 split parts
Chat context window (paid, Opus 4.7 / Sonnet 4.6)	500K tokens	≈ 350-380 dense pages of text held at once
Project knowledge base	Large, but switches to RAG retrieval when it overflows context	Retrieval can miss a contradiction on p.142

Two of these are the whole ballgame. First, the 100-page upload cap means a 200-page document is a non-starter as a single file — you split it. Second, and less obvious: if you load the document into a Project knowledge base, Claude uses it as a retrieval source once it exceeds the active context. Retrieval is great for “find the clause about X” across many files, but it is the wrong mode for “read this one document end to end and catch every internal contradiction.” For a single long doc you want it in the active context of a chat, where Claude reads all of it on every turn.

Rule of thumb: many documents, occasional lookups → Project. One document, close reading → chat upload. This article is the second case.

Who this is for

Analysts, researchers, and legal/policy professionals working through one or two long documents at a time — a whitepaper, an annual report, a court filing, a standards draft — where you need close reading and citation, not a landscape scan across a hundred sources.

Step by step

0. Split the PDF to clear the 100-page cap

A 200-page report must become two ≤100-page files. Use qpdf (free, scriptable) so page numbers stay predictable:

# Split a 200-page PDF into 1-100 / 101-200
qpdf --pages document.pdf 1-100 -- document.pdf doc_part1.pdf
qpdf --pages document.pdf 101-z -- document.pdf doc_part2.pdf
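For documents that are not exactly 200 pages, the same split can be scripted. The following is a minimal sketch, assuming qpdf is installed. `page_ranges` and `split_pdf` are hypothetical helper names, and the `_partN.pdf` output naming is an assumption, not something the workflow requires. It reads the page count with `qpdf --show-npages`, then cuts each part with the equivalent `INPUT --pages . RANGE -- OUTPUT` form, where `.` stands for the primary input file.

```shell
# Print "start-end" ranges covering pages 1..total in chunks of at most `chunk` pages.
page_ranges() {
  local total="$1"
  local chunk="${2:-100}"
  local start=1
  local end
  while [ "$start" -le "$total" ]; do
    end=$(( start + chunk - 1 ))
    if [ "$end" -gt "$total" ]; then
      end="$total"
    fi
    echo "${start}-${end}"
    start=$(( end + 1 ))
  done
}

# Split SRC into numbered parts of at most CHUNK pages each (hypothetical helper).
split_pdf() {
  local src="$1"
  local chunk="${2:-100}"
  local total n range
  total=$(qpdf --show-npages "$src") || return 1
  n=1
  for range in $(page_ranges "$total" "$chunk"); do
    # "." reuses the primary input file as the page source
    qpdf "$src" --pages . "$range" -- "${src%.pdf}_part${n}.pdf"
    n=$(( n + 1 ))
  done
}
```

Under these assumptions, `split_pdf document.pdf` would write `document_part1.pdf`, `document_part2.pdf`, and so on. Fixed 100-page chunks ignore chapter boundaries, so adjust the ranges by hand where a section would otherwise be cut in two.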

If any part is still over 32 MB (common with scanned filings), downsample the source PDF first, then split the smaller file:

# Downsample images to bring a scanned PDF under 32 MB
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.5 -dPDFSETTINGS=/ebook \
   -dNOPAUSE -dQUIET -dBATCH -sOutputFile=doc_small.pdf document.pdf
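A quick size check tells you which files actually need downsampling. This is a sketch under stated assumptions: `fits_upload_cap` and `report_sizes` are hypothetical helpers, and treating 32 MB as 32 × 1024 × 1024 bytes is a guess about how the cap is measured.

```shell
# 32 MB in bytes; whether the cap counts 1 MB as 1024*1024 bytes is an assumption.
LIMIT_BYTES=$(( 32 * 1024 * 1024 ))

# Succeed (exit 0) when FILE is at or under the limit.
fits_upload_cap() {
  local file="$1"
  local size
  [ -f "$file" ] || return 2
  # Strip the padding some `wc` implementations add around the number
  size=$(wc -c < "$file" | tr -d '[:space:]')
  [ "$size" -le "$LIMIT_BYTES" ]
}

# Print one status line per file passed as an argument.
report_sizes() {
  local f
  for f in "$@"; do
    if fits_upload_cap "$f"; then
      echo "ok        $f"
    else
      echo "too large $f"
    fi
  done
}
```

For example, `report_sizes doc_part1.pdf doc_part2.pdf` lists each part with its status, so only the oversized ones go through Ghostscript.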

Keep your splits aligned to real chapter boundaries where you can — a section straddling the p.100/p.101 break is harder to reason about later.

1. Open one chat, not a Project, and upload all parts

In claude.ai, start a fresh chat. Set the model to Sonnet 4.6 for the heavy extraction passes (faster and cheaper; it carries the same 500K context as Opus on paid plans). Drag in doc_part1.pdf and doc_part2.pdf — up to 20 files fit in one conversation, so 2-4 parts is no problem. Keeping them in a single chat is what guarantees full-context reading instead of retrieval.

Paste your reading contract as the first message:

Your role: senior [domain] document analyst (legal / policy / industry report / academic).

Reading goal: [e.g. "Surface the core claims, weakest arguments, and conflicts
with existing policy in this whitepaper, for a 1500-word brief for decision-makers"]

Audience: [e.g. "me + my boss; decision-makers; 10 minutes of reading each"]

Hard rules:
- Every factual claim cites a page or section: [p.42] or [sec 3.2]
- The PDF is split into part 1 (orig p.1-100) and part 2 (orig p.101-200);
  always cite ORIGINAL page numbers, not the part's internal page
- Use ONLY information inside these files. If an external lookup is needed, stop and tell me
- Never invent numbers, quotes, or page references
- Reply in English

The original-page-number instruction matters: after a split, Claude’s default is to cite the page within each part, which scrambles your references.
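When Claude does slip and cite a part-internal page, the conversion back is simple arithmetic. A minimal sketch follows. `orig_page` and `orig_page_from_start` are hypothetical helper names, and the first one assumes every earlier part holds exactly the same number of pages (100 by default).

```shell
# Original page = pages in all earlier parts + page inside this part.
# Assumes every earlier part holds exactly `chunk` pages.
orig_page() {
  local part="$1"
  local internal="$2"
  local chunk="${3:-100}"
  echo $(( (part - 1) * chunk + internal ))
}

# Variant for uneven splits: pass the original page number of the part's first page.
orig_page_from_start() {
  local first_page="$1"
  local internal="$2"
  echo $(( first_page + internal - 1 ))
}

# Example: orig_page 2 40 prints 140
```

If you split on chapter boundaries rather than at exactly p.100, use the second form and note each part's first original page in the reading contract.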

2. Build a reading map

Do not ask for a summary yet. First make Claude tell you where to look:

From the uploaded files, produce a reading map. Do NOT summarize content yet.

1. TOC: every level-1 and level-2 heading with original page ranges
2. Length distribution: pages and approx % of total for each level-1 chapter
3. Priority sections: given the reading goal, which 3-5 sections deserve a deep read? Why?
4. Skip candidates: which sections are clearly off-goal and safe to skim?

This costs almost nothing and saves you from deep-reading the boilerplate. Sanity-check the page ranges against the real PDF before trusting them.

3. Per-section deep extraction

For each priority section from step 2:

Deep-process "[section title]" [sec [n], orig p.[start]-[end]]:

1. Core claims in 5 sentences, each ending with [p.[page]]
2. Key data / quotes / dates, each with a page citation
3. Claims that conflict with or restate differently from other sections (name the section)
4. Implicit assumptions (premises the author relies on but never states)
5. The 1-2 weakest arguments here (thin evidence / circular / inferential leap)

Citations must be findable in the PDF. If a quote is not actually there,
write "not found in document" — do not invent.

4. Cross-section conflict sweep

This is where a 500K-context chat earns its keep — Claude can hold both parts at once and compare p.30 against p.170:

Scan BOTH parts as one document. Find every internal contradiction. For each:

- Claim A: [statement] [sec, p.]
- Claim B: [statement that conflicts with A] [sec, p.]
- Conflict type: numeric / position / temporal / definitional
- Severity: high / med / low

Judge consistency ONLY against the document itself, not outside knowledge.
If you find none, say "no obvious internal contradictions found" — do not fabricate.

5. Weakest-argument audit

Read the whole document; list the 5 weakest arguments. For each:

Claim: [author's exact words, with page]
Weakness: thin evidence / single source / circular reasoning / inferential leap /
          definitional shift / cherry-picked data
Why it is weak: [1-2 sentences]
How a reviewer would press it: [1 specific question]

Rank by impact on the conclusion — first those that, if false, sink the main
conclusion even when everything else holds.

For this final critique pass, switch the model to Opus 4.7. In Anthropic’s published numbers it is the stronger reasoner, and the cost difference (API: Opus 4.7 at $5/$25 per 1M in/out vs Sonnet 4.6 at $3/$15) is trivial on a handful of analysis turns.
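To put the cost difference in numbers, here is a back-of-envelope sketch using the API prices quoted above. The token counts (a 300,000-token document context and a 2,000-token reply) are hypothetical, picked only for illustration, and `turn_cost` is a hypothetical helper.

```shell
# Cost in USD of one API turn.
# Arguments: input tokens, output tokens, price per 1M input, price per 1M output.
turn_cost() {
  LC_ALL=C awk -v tin="$1" -v tout="$2" -v pin="$3" -v pout="$4" \
    'BEGIN { printf "%.2f\n", (tin * pin + tout * pout) / 1000000 }'
}

# Examples with the hypothetical token counts:
#   turn_cost 300000 2000 3 15   # Sonnet 4.6 prices, prints 0.93
#   turn_cost 300000 2000 5 25   # Opus 4.7 prices, prints 1.55
```

With these example numbers the gap is about $0.62 per turn, which stays small across the few turns the critique passes need.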

6. Critic question list

Suppose you are [a peer reviewer / an opposing think-tank analyst / an investigative
reporter] for this document. List 10 sharp questions:

- Each references a specific page
- Each must NOT be answerable by re-reading the document (it needs outside data or fresh analysis)
- No generic "is this comprehensive / objective" — be specific

Pin these questions as your open-questions list and resolve them before drafting the brief.

7. Read the high-leverage pages by hand

Claude has now surfaced the contested, under-supported, and contradictory spots. Open the actual PDF and jump to:

Every “high”-severity conflict from step 4
The pages holding the top-3 weakest arguments from step 5
Any page where step 3 said “not found in document” — Claude may have miscited; verify it yourself
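Part of this manual check can be scripted. The sketch below assumes `pdftotext` from poppler-utils is installed and that the PDF has a text layer (scanned pages would need OCR first). `normalize_ws`, `quote_in_text`, and `quote_on_page` are hypothetical helper names. It extracts one original page and runs a literal, whitespace-insensitive search for the quoted words.

```shell
# Collapse every run of whitespace to a single space, so line breaks
# in the extracted PDF text do not block a match.
normalize_ws() {
  tr -s '[:space:]' ' '
}

# Read text on stdin; succeed if the quote occurs in it (literal, case-sensitive).
quote_in_text() {
  local quote
  quote=$(printf '%s' "$1" | normalize_ws)
  normalize_ws | grep -F -q -- "$quote"
}

# Check one page of a PDF: quote_on_page FILE PAGE "quoted words"
quote_on_page() {
  pdftotext -f "$2" -l "$2" "$1" - | quote_in_text "$3"
}
```

A call such as `quote_on_page document.pdf 142 "exact words from the citation"` succeeds only if the words appear on that page. A miss does not prove the quote was invented, because hyphenation or ligatures in the extracted text can break a literal match, so treat misses as the pages to open by hand first.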

Feed your manual corrections back into the chat and have Claude regenerate the brief. A clean output tree:

claude_longdoc_[topic]/
├── 00_instructions.md
├── 01_reading_map.md
├── 02_per_section_extracts.md
├── 03_cross_section_conflicts.md
├── 04_weakest_arguments.md
├── 05_critic_questions.md
└── 06_final_brief.md
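The folder can be created up front so each step's output has a place to go. A small sketch, with `scaffold_longdoc` as a hypothetical helper name. The topic string fills in the `[topic]` slot from the tree above.

```shell
# Create the working folder and empty note files for one document.
# Usage: scaffold_longdoc TOPIC [BASE_DIR]
scaffold_longdoc() {
  local topic="$1"
  local base="${2:-.}"
  local dir="$base/claude_longdoc_$topic"
  local name
  mkdir -p "$dir" || return 1
  for name in 00_instructions 01_reading_map 02_per_section_extracts \
              03_cross_section_conflicts 04_weakest_arguments \
              05_critic_questions 06_final_brief; do
    # Only create files that do not exist yet, so existing notes are never overwritten
    [ -e "$dir/$name.md" ] || : > "$dir/$name.md"
  done
  echo "$dir"
}
```

Running `scaffold_longdoc whitepaper` creates `./claude_longdoc_whitepaper/` with the seven empty files and prints the folder path. Re-running it leaves existing notes untouched.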

When the document is 500+ pages

At 500+ pages you are past what a 500K-token chat can hold at once (roughly 350-380 dense pages, per the table above), so the parts no longer fit in context together. Two options:

Run the per-section passes per part, then a final merge pass that only loads your step-3 extracts (a few thousand tokens) plus the conflict sweep prompt — small enough to compare everything at once.
Use the API or Claude Code, where Opus 4.7 and Sonnet 4.6 reach a 1M-token window — roughly 750,000 words. That holds a ~700-page document in one context without splitting for retrieval.
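To decide which route a given document needs, a rough token estimate helps. The sketch below relies only on the ratio quoted above (1M tokens for roughly 750,000 words, so about 4 tokens per 3 words). Real tokenization varies with content, so treat the result as an estimate. `est_tokens` and `fits_context` are hypothetical helpers, and the word count could come from something like `pdftotext document.pdf - | wc -w` if poppler-utils is installed.

```shell
# Estimate tokens from a word count using the ~4 tokens per 3 words ratio.
est_tokens() {
  echo $(( $1 * 4 / 3 ))
}

# Succeed if the estimated tokens fit the window.
# Usage: fits_context WORDS [WINDOW_TOKENS]; the window defaults to 500000.
fits_context() {
  local words="$1"
  local window="${2:-500000}"
  [ "$(est_tokens "$words")" -le "$window" ]
}

# Example: est_tokens 750000 prints 1000000
```

The window also has to hold your prompts and Claude's replies, so leave headroom rather than filling it to the estimate.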

Common mistakes

Loading the doc into a Project for close reading. It flips to RAG retrieval and silently stops reading the whole file. Use a chat upload.
Trying to upload the 200-page PDF whole. claude.ai rejects it at the 100-page cap. Split first.
Citing split-part page numbers. Without the original-page instruction, “p.40” in part 2 means original p.140. Pin the mapping up front.
Asking only for “a summary.” A summary smooths over exactly the contradictions and weak links you are reading closely to find.
Treating the contradiction map as final without opening the PDF. Verify every high-severity flag by hand.

FAQ

Can Claude really read a 200-page PDF? Not in one upload on claude.ai — the cap is 100 pages and 32 MB per file (as of June 2026). Split it into ≤100-page parts and put them in one chat. Once split and loaded, Opus 4.7 and Sonnet 4.6 hold up to 500K tokens of chat context, enough to reason across both halves at once.

Why a chat upload instead of a Claude Project? A Project knowledge base is built for many files and lookup-style retrieval; when its content exceeds the active context it switches to RAG, which can miss a contradiction sitting on a single page. For one document you want it fully in the active context of a chat. Many docs and occasional lookups is the Project use case.

Sonnet 4.6 or Opus 4.7? Run the bulk extraction on Sonnet 4.6 — same 500K context on paid plans, faster, cheaper. Switch to Opus 4.7 for the weakest-argument and critic passes where reasoning quality matters most.

Why not just use ChatGPT or Gemini? All three handle long docs now. ChatGPT Plus reads roughly 320 pages of in-app context (full 1M only on the $200 Pro tier), and Gemini 3.1 Pro carries 1M tokens. Claude’s edge here is the structured, heavily-cited extraction the prompts above lean on; for true 1M-token single-context reads without splitting, Claude via API/Claude Code or Gemini 3.1 Pro are the better fit.

Which Claude plan do I need? Claude Pro ($20/mo, or $17/mo billed annually) covers most long-doc work and includes the 500K chat context on Opus 4.7 and Sonnet 4.6. Step up to Max ($100 or $200/mo) only if you hit usage caps on long sessions.

Tags: #Tutorial #Research #Claude #Long document
