ChatGPT PDF Analysis Not Working — File Reads Empty

Upload a PDF and ChatGPT says it sees nothing or 'No text could be extracted' — almost always scan-vs-text, size, or encryption. Fastest fix inside.

Published: May 17, 2026 Updated: Jun 15, 2026 Author: AI Productivity Guide Team

Fastest fix: if ChatGPT says it can’t see your PDF or shows `No text could be extracted from this file.`, your PDF is almost certainly a scan with no text layer. Confirm in 30 seconds: open it, press Cmd+A / Ctrl+A, copy, and paste into a text editor; if nothing pastes, it’s a scan. Then either OCR the file before re-uploading, or export the pages as PNG images and upload those instead. The image route works because ChatGPT’s vision model reads pictures far more reliably than it reads image-only PDFs.

What actually happens when you upload a PDF: upload → ChatGPT’s server runs a PDF parser to pull out the text layer → the extracted text is stitched into your prompt and sent to the model. As of June 2026, GPT-5.5 also has a vision/OCR fallback that sometimes kicks in when the text layer is empty, but it is inconsistent and tier-dependent; it frequently does not trigger on the web app, especially on the Free tier. So when ChatGPT says it can’t read the content, the failure is still almost always at the extraction step: no text layer, a file so big it gets truncated, or encryption blocking access.

The five causes below cover 95%+ of cases, in rough order of frequency.

Which bucket are you in?

| Symptom you see | Most likely cause | Jump to |
|---|---|---|
| `No text could be extracted from this file.` right after upload | Scanned / image-only PDF (no text layer) | Cause 1 / Step 2 |
| Reads early pages, claims later content “isn’t in the file” | Truncated: too large for the token budget | Cause 2 / Step 3 |
| “Cannot read this file” or a password prompt | Encrypted / DRM / copy-restricted | Cause 3 / Step 4 |
| Garbled characters, `\frac{}{}`, scrambled tables | Bad encoding / math-heavy / very old PDF | Cause 4 / Step 5 |
| Worked before, now ignores this file | Too many attachments in one chat | Cause 5 / Step 6 |

Common causes

1. The PDF is a scan (image only, no text layer)

Scanned contracts, photographed receipts, ebooks built by stuffing images into a PDF: you can see the letters, but there’s no extractable text stream, so the parser returns nothing. This is the case that produces the literal error `No text could be extracted from this file.`

As of June 2026 ChatGPT may attempt OCR through its vision model, but on the web app that fallback is unreliable and often skipped, so treat a scanned PDF as something you must fix before uploading.

How to verify: open the PDF in Preview / Acrobat, Cmd+A (Mac) or Ctrl+A (Windows) to select all, copy, paste into Notes or Notepad.

Text appears → it has a text layer; your issue is elsewhere
Blank → it’s a scan, OCR is required (Step 2)
Garbled characters → there is a text layer, but its encoding is broken (Cause 4)

2. File too large, content gets truncated

ChatGPT’s single-file upload ceiling is 512 MB, but what the model actually reads is bounded by tokens, not megabytes. As of June 2026, extracted text is capped at around 2 million tokens per file, and in practice the tighter limit is how much of that text fits into your conversation’s overall context. Rough rule: PDFs over 50 pages frequently get truncated partway through; with 100+ pages, almost always only the first 30–40% gets read.

When the model never saw the part you asked about, it’ll say “the file doesn’t contain that,” when in fact it never got that far.

How to verify: ask “List the file’s table of contents / heading hierarchy from start to finish.” If only the first few sections appear, it’s truncation.
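Before uploading, you can also estimate locally how many tokens the extracted text will need, assuming poppler’s `pdftotext` is installed. This is a rough sketch built on OpenAI’s rule of thumb of about 4 characters per token for English; Chinese and other CJK text usually costs more tokens per character, so treat the result as a lower bound. The function name `estimate_tokens` is our own.

```bash
# Rough token estimate for text piped on stdin.
# Assumption: ~4 characters per token (English rule of thumb); CJK text
# typically costs more, so read the result as a lower bound.
estimate_tokens() {
  local chars
  chars=$(wc -m | tr -d ' ')       # character count, spaces from wc stripped
  echo $(( (chars + 3) / 4 ))      # divide by 4, rounding up
}

# Usage (needs poppler: brew install poppler):
#   pdftotext report.pdf - | estimate_tokens
```

A large estimate is your cue to split the file (Step 3) before uploading rather than hoping the whole thing gets read.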

3. PDF is encrypted / DRM / copy-restricted

Password-protected PDFs upload fine, but the parser won’t decrypt them — it returns “cannot read.” Some publisher-DRM ebooks behave the same way even without an open password, because copying is disabled.

How to verify: open it locally and watch for a password prompt; or check File → Properties → Security for “password protected” / “copying not allowed.”
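On the command line, poppler’s `pdfinfo` prints an `Encrypted:` line that also lists copy permissions. The helper below (a hypothetical name, `encryption_summary`) condenses that line into one verdict; it assumes pdfinfo’s usual output format, for example `Encrypted: yes (print:yes copy:no ...)`. If pdfinfo stops with an incorrect-password error instead, the file has an open password, and the helper prints `unknown`.

```bash
# Condense pdfinfo's "Encrypted:" line (read on stdin) into one verdict.
# Assumes poppler's format, e.g.
#   Encrypted:      yes (print:yes copy:no change:no addNotes:no algorithm:AES-256)
encryption_summary() {
  awk '
    /^Encrypted:/ {
      found = 1
      if ($2 == "no")          print "not encrypted"
      else if ($0 ~ /copy:no/) print "encrypted, copying disabled"
      else                     print "encrypted"
    }
    END { if (!found) print "unknown" }   # e.g. pdfinfo failed on a password
  '
}

# Usage:
#   pdfinfo input.pdf | encryption_summary
```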

4. Bad text encoding / math-heavy / very old PDFs

Some legacy tools embed fonts without proper encoding (ToUnicode) maps, so extraction yields gibberish ASCII. Math papers whose LaTeX was rendered as glyphs come out as fragments like `\frac{}{}`. Tables with merged cells come out with their rows scrambled.

How to verify: ask the model to “Quote the first paragraph of page 1 verbatim.” Garbled or missing characters point to an encoding problem.

5. Too many files in one conversation

As of June 2026, ChatGPT Plus allows up to 20 files per message and roughly 80 files per rolling 3-hour window, and the combined extracted text still has to fit the model’s context. Pile several large PDFs into one chat and the earlier ones can get pushed out of context.

How to verify: open a new conversation and upload only this one PDF. If it reads fine, the pileup was the problem.

Shortest path to fix

Listed from quickest to slowest to try.

Step 1: 30-second scan check

Open in Preview / Acrobat → Cmd+A / Ctrl+A → copy → paste into Notes / Notepad.

Blank → it’s a scan, go to Step 2 (OCR)
Text appears → go to Step 3 (size)
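To run the same check without copy-paste (handy for a folder of files), count how many non-whitespace characters `pdftotext` can pull out per page. This sketch assumes poppler’s `pdfinfo` and `pdftotext` are installed; the function names and the 20-characters-per-page threshold are our own guesses, so tune the threshold for your documents.

```bash
# Count non-whitespace characters on stdin (pdftotext emits \f between pages).
count_text_chars() {
  tr -d '[:space:]' | wc -m | tr -d ' '
}

# Exit 0 if the PDF has a usable text layer, 1 if it looks like a scan,
# 2 if the page count can't be read (e.g. the file needs a password).
# Hypothetical threshold: on average at least 20 characters per page.
has_text_layer() {
  local pdf=$1 min_per_page=${2:-20} pages chars
  pages=$(pdfinfo "$pdf" 2>/dev/null | awk '/^Pages:/ {print $2}')
  [ -n "$pages" ] || return 2
  chars=$(pdftotext "$pdf" - 2>/dev/null | count_text_chars)
  [ "$chars" -ge $(( pages * min_per_page )) ]
}

# Usage:
#   for f in *.pdf; do
#     if has_text_layer "$f"; then echo "text: $f"; else echo "scan or unreadable: $f"; fi
#   done
```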

Step 2: OCR before uploading (or upload images instead)

Option A — add a text layer with OCR, then re-upload:

macOS Preview: open the PDF → File → Export… → Format PDF → check “Apply OCR” (macOS 14 Sonoma and later)
Web: ilovepdf.com/ocr_pdf or Adobe Acrobat on the web
Professional: ABBYY FineReader (paid, the most accurate for mixed-language scans)
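CLI: ocrmypdf input.pdf output.pdf -l chi_sim+eng (open source, strong accuracy; install with brew install ocrmypdf; the chi_sim Simplified Chinese data comes from brew install tesseract-lang, so use -l eng for English-only documents)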
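For a whole folder of scans, the ocrmypdf command from the list above can run in a loop. This sketch assumes ocrmypdf is installed; `--skip-text` tells it to leave pages that already have text alone, while the `.ocr.pdf` output naming and the `ocr_folder` function name are our own choices.

```bash
# OCR every PDF in a folder; writes NAME.ocr.pdf next to each input.
# Args: folder (default .), Tesseract languages (default eng).
ocr_folder() {
  local dir=${1:-.} langs=${2:-eng} f
  for f in "$dir"/*.pdf; do
    [ -e "$f" ] || continue                   # folder has no PDFs
    case $f in *.ocr.pdf) continue ;; esac    # skip earlier outputs
    ocrmypdf --skip-text -l "$langs" "$f" "${f%.pdf}.ocr.pdf" \
      || echo "OCR failed: $f" >&2
  done
}

# Usage:
#   ocr_folder ~/Scans chi_sim+eng
```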

Option B — skip OCR entirely and upload page images. Export each page as a PNG (Preview: File → Export…, Format PNG; or any “PDF to image” tool) and upload the images instead of the PDF. ChatGPT’s vision model reads PNG/JPEG far more reliably than image-only PDFs, so this often works on the first try even when the PDF failed. Keep each image under the 20 MB image limit and upload a handful at a time.
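On the command line, poppler’s `pdftoppm` renders every page in one command. A sketch, assuming poppler is installed (brew install poppler); the 150 DPI resolution, the `pages` output folder, and the `pdf_to_pngs` name are our own defaults. It then lists any PNG over 20 MB so you know which pages to re-render at a lower `-r`.

```bash
# Render each page of a PDF to PNG, then flag images too big to upload.
# Args: input PDF, output folder (default ./pages).
pdf_to_pngs() {
  local pdf=$1 outdir=${2:-pages}
  mkdir -p "$outdir" || return 1
  # Produces page-1.png, page-2.png, ... (zero-padded on longer documents).
  pdftoppm -png -r 150 "$pdf" "$outdir/page" || return 1
  # -size +20M: larger than 20 MiB, close to the 20 MB image limit.
  find "$outdir" -name 'page*.png' -size +20M | sed 's/^/too large: /'
}

# Usage:
#   pdf_to_pngs scanned.pdf ./pages
```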

OCR isn’t perfect — long formulas, handwriting, and blurry scans may still need a .txt export and a manual proofread.

Step 3: Split / compress

For PDFs over 50 pages:

```bash
# macOS: split with pdftk
brew install pdftk-java
pdftk input.pdf cat 1-30 output part1.pdf    # pages 1-30
pdftk input.pdf cat 31-60 output part2.pdf   # pages 31-60
```

Or in Preview: select page thumbnails → right-click → “Export selected pages.”

Upload each chunk separately, summarize chunk by chunk, then ask the model to merge the chunk-level summaries into one final report.
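The pdftk commands above hard-code their page ranges. For any page count, the sketch below computes 30-page ranges and calls pdftk once per chunk. It assumes pdftk and poppler’s `pdfinfo` (for the page count) are installed; the `page_ranges` and `split_pdf` names and the `partN.pdf` naming are ours.

```bash
# Print page ranges of at most SIZE pages, one per line: 1-30, 31-60, ...
page_ranges() {
  local total=$1 size=${2:-30} start end
  for (( start = 1; start <= total; start += size )); do
    end=$(( start + size - 1 ))
    if (( end > total )); then end=$total; fi   # last chunk may be shorter
    echo "$start-$end"
  done
}

# Split a PDF into part1.pdf, part2.pdf, ... in the current folder.
split_pdf() {
  local pdf=$1 size=${2:-30} pages range n=1
  pages=$(pdfinfo "$pdf" | awk '/^Pages:/ {print $2}')
  [ -n "$pages" ] || { echo "cannot read page count: $pdf" >&2; return 1; }
  for range in $(page_ranges "$pages" "$size"); do
    pdftk "$pdf" cat "$range" output "part$n.pdf" || return 1
    n=$(( n + 1 ))
  done
}

# Usage:
#   split_pdf report.pdf 30
```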

Step 4: Remove the password

macOS Preview: open (enter the password) → File → Export as PDF → uncheck “Encrypt”
Web: smallpdf.com/unlock-pdf (don’t use this for confidential files)
CLI: qpdf --decrypt --password=YOUR_PASSWORD input.pdf out.pdf
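Typing the password after `--password=` leaves it in your shell history. A slightly safer sketch prompts for it instead, then confirms the output really is unencrypted with `qpdf --is-encrypted` (exit code 2 means not encrypted). It assumes qpdf is installed; `unlock_pdf` is a hypothetical helper name, and the password is still visible to other local processes while qpdf runs.

```bash
# Decrypt IN to OUT with qpdf, prompting for the password (not echoed).
unlock_pdf() {
  local in=$1 out=$2 pw rc=0 enc=0
  read -rsp "Password for $in: " pw; echo >&2
  qpdf --decrypt --password="$pw" "$in" "$out" || rc=$?
  # qpdf exits 0 on success, 3 on success with warnings, 2 on errors.
  if [ "$rc" -ne 0 ] && [ "$rc" -ne 3 ]; then return 1; fi
  # --is-encrypted exits 0 if the file is still encrypted, 2 if not.
  qpdf --is-encrypted "$out" || enc=$?
  [ "$enc" -eq 2 ]
}

# Usage:
#   unlock_pdf locked.pdf unlocked.pdf
```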

Step 5: Encoding / layout issues — paste plain text

If extraction can’t yield clean text, last resort:

Preview → select text → copy into a .txt file
Or convert with pdftotext input.pdf output.txt (part of poppler: brew install poppler; add -layout to keep table columns roughly in place)
Paste the content directly into the chat box (roughly 4000 characters at a time; split long passages)

The model reads clean plain text far more reliably than messy PDFs.
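Splitting long text by hand at roughly 4,000 characters gets tedious. The sketch below breaks text into numbered files at paragraph boundaries (blank lines), each at most the given size; the `chunk_text` name and file naming are ours. Two caveats: a single paragraph longer than the limit still becomes one oversized piece, and some awk implementations count bytes rather than characters, which makes Chinese text split earlier than necessary.

```bash
# Split text on stdin into PREFIX-001.txt, PREFIX-002.txt, ...
# Each piece holds whole paragraphs (separated by blank lines) and at most
# MAX characters, unless one paragraph alone is longer than MAX.
chunk_text() {
  local max=${1:-4000} prefix=${2:-chunk}
  awk -v max="$max" -v prefix="$prefix" '
    function flush(   f) {
      f = sprintf("%s-%03d.txt", prefix, ++n)
      printf "%s\n", cur > f
      close(f)
      cur = ""
    }
    BEGIN { RS = "" }                      # paragraph mode: blank lines split records
    {
      if (cur != "" && length(cur) + 2 + length($0) > max + 0) flush()
      cur = (cur == "") ? $0 : cur "\n\n" $0
    }
    END { if (cur != "") flush() }
  '
}

# Usage:
#   pdftotext input.pdf - | chunk_text 4000 part
```

Paste the resulting files one at a time, in order.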

Step 6: New conversation, single file

If the current chat already has several files attached, start a fresh chat with only this one PDF (see Cause 5 for the per-chat limits).

How to confirm it’s fixed

After re-uploading, don’t ask a broad “summarize this.” Ask a question only the late part of the document can answer — for example, “Quote the last sentence on the final page” or “What is the figure in the last table?” If the model answers correctly, the full text made it through. If it can quote page 1 but not the end, you’re still truncated (go back to Step 3).
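To have something to check the model’s answer against, you can pull the final page’s text locally. A sketch assuming poppler’s `pdfinfo` and `pdftotext`; `last_page_text` is a hypothetical helper name, and on a scan it prints nothing until the file has been OCR’d.

```bash
# Print the text of a PDF's last page, to compare with the model's quote.
last_page_text() {
  local pdf=$1 pages
  pages=$(pdfinfo "$pdf" | awk '/^Pages:/ {print $2}')
  [ -n "$pages" ] || { echo "cannot read page count: $pdf" >&2; return 1; }
  # -f/-l select the first and last page to convert; "-" writes to stdout.
  pdftotext -f "$pages" -l "$pages" "$pdf" -
}

# Usage:
#   last_page_text report.pdf | tail -n 5
```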

Prevention

OCR scanned PDFs locally before upload — don’t wait for the failure
Split 200-page documents into 30–40 page chunks; summarize chunks, then merge
Save key findings to a notes file — a chat can lose context once it gets long
For sensitive contracts / financials, redact before OCR
For repeated long-document work, use Projects so persistent reference files live outside any single chat’s context

FAQ

Why does ChatGPT say “No text could be extracted from this file”? The PDF has no machine-readable text layer — it’s a scan or image-only file. ChatGPT pulled the text layer, found nothing, and stopped. Run OCR, or export the pages as PNG images and upload those (Step 2).

Can ChatGPT read scanned PDFs now in 2026? Sometimes. GPT-5.5 has a vision/OCR fallback, but on the web app it’s inconsistent and tier-dependent — it often doesn’t trigger, especially on Free. The reliable move is still to OCR the file yourself or upload page images rather than the scanned PDF.

ChatGPT only read the first part of my long PDF — why? Text extraction is capped by a token budget, not just the 512 MB file size. Documents past ~50 pages frequently truncate. Split into 30–40 page chunks, summarize each, then merge (Step 3).

What are the current file limits? As of June 2026: single file up to 512 MB, images up to 20 MB, spreadsheets up to 50 MB; extracted text is capped near 2 million tokens; and Plus allows about 20 files per message and ~80 files per rolling 3-hour window.

My PDF has a password. How do I get ChatGPT to read it? Remove the encryption first: open it in Preview and re-export without “Encrypt,” or run qpdf --decrypt --password=YOUR_PASSWORD input.pdf out.pdf (Step 4). The parser will not decrypt files for you.

Tags: #ChatGPT #Debug #Troubleshooting
