What actually happens when you upload a PDF to ChatGPT: upload → a server-side PDF parser extracts the text layer → the extracted text is stitched into the prompt and sent to the model. In this path the model itself does not see pixels, does not run OCR, and does not read metadata. So when ChatGPT says it “can’t read the content”, the failure almost always happened at the extraction step: there was no text layer, the file was too big and got truncated, or encryption blocked access.
The five causes below cover 95%+ of cases.
Common causes
In rough order of frequency:
1. The PDF is a scan (image only, no text layer)
Scanned contracts, photographed receipts, ebooks made by stuffing images into PDFs — you can see the letters, but there’s no extractable text stream. The parser returns empty, the model has nothing to read.
How to verify: open the PDF in Preview / Acrobat, Cmd+A to select all, Cmd+C to copy, paste into Notes.
- Text appears → has a text layer; the issue is elsewhere
- Blank or garbage → it’s a scan, OCR is required
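The same check can be scripted from a terminal. This sketch assumes poppler’s pdftotext is installed. pdftotext prints a form feed between pages, so an image-only PDF yields little more than form feeds. The has_text_layer name and its 20-character threshold are illustrative assumptions, not a standard.

```sh
# has_text_layer: hypothetical helper. Reads extracted text on stdin and
# succeeds (exit 0) if it contains at least MIN non-whitespace characters.
# The default MIN of 20 is an arbitrary assumption; tune it for your files.
has_text_layer() {
  min=${1:-20}
  # Drop spaces, tabs, newlines, carriage returns, form feeds and vertical tabs,
  # then count what is left. $(( )) strips the padding some wc versions add.
  n=$(( $(tr -d ' \t\n\r\f\v' | wc -c) ))
  [ "$n" -ge "$min" ]
}

# Usage (needs poppler, e.g. `brew install poppler`):
#   pdftotext input.pdf - | has_text_layer && echo "has text" || echo "likely a scan"
```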
2. File too large, content gets truncated
ChatGPT’s official upload limit is ~512 MB per file, but how much extracted text the model can actually use is bounded by its context window (the per-conversation token limit). Rough rule: PDFs over 50 pages frequently get truncated midway, and 100+ page PDFs almost always have only the first 30–40% read.
If the model never saw the part you asked about, it will often say “the file doesn’t contain that” when it simply never got that far.
How to verify: ask “list the file’s table of contents / heading hierarchy.” If only the first few sections appear, it’s truncation.
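You can get a feel for how much text a long PDF produces by estimating its token count locally before uploading. This sketch assumes pdftotext is available and uses the rough rule of thumb of about 4 characters per token for English prose. That ratio is an assumption: CJK text typically costs considerably more tokens per character, and a real tokenizer will give a different count.

```sh
# estimate_tokens: hypothetical helper. Reads text on stdin and prints a rough
# token count using the ~4 characters/token rule of thumb (an assumption that
# roughly fits English; use a real tokenizer for anything precise).
estimate_tokens() {
  # wc -m counts characters in the current locale; $(( )) strips padding.
  chars=$(( $(wc -m) ))
  # Round up so any non-empty input counts as at least one token.
  echo $(( (chars + 3) / 4 ))
}

# Usage:
#   pdftotext input.pdf - | estimate_tokens
```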
3. PDF is encrypted / DRM / copy-restricted
Password-protected PDFs upload fine, but the parser won’t decrypt them — it just returns “cannot read.” Some publisher-DRM ebooks behave the same way.
How to verify: open it locally and see whether a password prompt appears, or in Acrobat check File → Properties → Security for password protection or “copying not allowed.”
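From the command line, poppler’s pdfinfo reports encryption on a line that starts with “Encrypted:”, along the lines of “Encrypted: yes (print:yes copy:no change:no addNotes:no)”. The helper below is a hypothetical sketch that classifies that output. The exact flag list varies between poppler versions. For a file that needs an open password, pdfinfo typically errors out instead of printing this line, and the sketch reports that case as “unknown”.

```sh
# encryption_status: hypothetical helper. Reads `pdfinfo` output on stdin and
# prints one of: not-encrypted, encrypted, copy-restricted, unknown.
encryption_status() {
  # `|| true` keeps a missing line from failing the script under `set -e`.
  line=$(grep '^Encrypted:' || true)
  case "$line" in
    "")          echo "unknown" ;;          # no Encrypted: line (e.g. pdfinfo failed)
    *copy:no*)   echo "copy-restricted" ;;  # opens, but copying text is disallowed
    *yes*)       echo "encrypted" ;;        # encrypted, no copy restriction reported
    *)           echo "not-encrypted" ;;
  esac
}

# Usage:
#   pdfinfo input.pdf 2>/dev/null | encryption_status
```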
4. Bad text encoding / math-heavy / very old PDFs
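Some legacy tools embed fonts without a proper character map (a ToUnicode map), so extraction yields gibberish characters. In math papers whose formulas are rendered as glyphs, the formulas extract as fragments, such as empty \frac{}{} shells or stray symbols. Tables with merged cells come out with their row order scrambled.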
How to verify: ask the model to “quote the first paragraph of page 1 verbatim.” If you get garbled or missing characters, it’s encoding.
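A crude local check for this kind of damage is to measure how much of the extracted text consists of control characters, which well-formed text almost never contains. This is a hypothetical heuristic, not a standard test. The 5% threshold is an arbitrary assumption, and the check will not catch garbage that happens to be printable, such as wrong-but-valid letters from a broken font map.

```sh
# looks_garbled: hypothetical heuristic. Reads extracted text on stdin and
# succeeds (exit 0) if more than MAX_PCT percent of its bytes are control
# characters other than tab, newline, vertical tab, form feed, carriage return.
looks_garbled() {
  max_pct=${1:-5}                      # arbitrary default threshold
  tmp=$(mktemp) || return 2
  cat > "$tmp"                         # stdin is read twice, so buffer it
  total=$(( $(wc -c < "$tmp") ))
  # Keep only bytes 0-8 and 14-31, then count them.
  bad=$(( $(LC_ALL=C tr -dc '\000-\010\016-\037' < "$tmp" | wc -c) ))
  rm -f "$tmp"
  [ "$total" -gt 0 ] || return 1       # empty input is a text-layer problem, not encoding
  [ $(( bad * 100 )) -gt $(( total * max_pct )) ]
}

# Usage:
#   pdftotext input.pdf - | looks_garbled && echo "extraction looks garbled"
```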
5. Too many files in one conversation
ChatGPT caps the number of attachments per chat (~10), and the text of all of them has to share the model’s context window. Past roughly the 6th PDF, earlier ones may get pushed out of context.
How to verify: open a new conversation, upload only this PDF. If it reads fine, it was the pileup.
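The pile-up effect comes down to arithmetic, because every attached file draws on the same context budget. The sketch below is a hypothetical illustration only. ChatGPT’s actual context management is not publicly documented and is not a simple running sum. The helper takes an assumed budget plus per-file token estimates, for example derived from pdftotext output at ~4 characters per token.

```sh
# first_over_budget: hypothetical helper. Given a token budget followed by
# per-file token counts in upload order, prints the 1-based index of the first
# file that pushes the running total over the budget, or 0 if all fit.
first_over_budget() {
  budget=$1
  shift
  total=0
  i=0
  for t in "$@"; do
    i=$((i + 1))
    total=$((total + t))
    if [ "$total" -gt "$budget" ]; then
      echo "$i"
      return 0
    fi
  done
  echo 0
}

# Usage (128000 is only an example budget, not a documented ChatGPT limit):
#   first_over_budget 128000 30000 25000 40000 35000   # prints 4
```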
Shortest path to fix
In time-to-test order:
Step 1: 30-second check for scanned PDF
Open in Preview → Cmd+A → Cmd+C → paste into Notes.
- Blank → it’s a scan, jump to Step 2 (OCR)
- Text appears → jump to Step 3 (size)
Step 2: OCR before uploading
Free / easy options:
- macOS: drag PDF into Preview → toolbar → “Export As” → “PDF” → check “Apply OCR” (macOS 14+)
- Web: ilovepdf.com/ocr_pdf / Adobe Acrobat web
- Professional: ABBYY FineReader (expensive but the most accurate for mixed-language)
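- CLI, with ocrmypdf (open source, good accuracy; chi_sim+eng means Simplified Chinese + English, and those Tesseract language packs must be installed):
ocrmypdf -l chi_sim+eng input.pdf output.pdf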
Then re-upload the OCR’d file. OCR isn’t perfect: long formulas, handwriting, and blurry scans may need a .txt export and a manual proofread.
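For a folder of scans, ocrmypdf can be run in a loop. This is a sketch under assumptions: ocr_all and OCR_CMD are hypothetical names. It uses two ocrmypdf options. --skip-text leaves pages that already have text alone. --sidecar also writes the recognized text to a .txt file, which is handy for the manual proofread.

```sh
# ocr_all: hypothetical helper. OCRs every *.pdf in IN_DIR into OUT_DIR,
# writing NAME.pdf (searchable PDF) plus NAME.txt (sidecar text) for each file.
# Set OCR_CMD=echo to print the commands instead of running them.
ocr_all() {
  in_dir=$1
  out_dir=$2
  langs=${3:-eng}                      # Tesseract language codes, e.g. chi_sim+eng
  mkdir -p "$out_dir" || return 1
  for f in "$in_dir"/*.pdf; do
    [ -e "$f" ] || continue            # no match: the glob stays literal
    base=$(basename "$f" .pdf)
    # OCR_CMD is deliberately unquoted so a value like "echo" works as a command.
    ${OCR_CMD:-ocrmypdf} --skip-text --sidecar "$out_dir/$base.txt" \
      -l "$langs" "$f" "$out_dir/$base.pdf"
  done
}

# Usage:
#   ocr_all scans ocr chi_sim+eng
```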
Step 3: Split / compress
For PDFs over 50 pages:
# macOS: split with pdftk
brew install pdftk-java
pdftk input.pdf cat 1-30 output part1.pdf
pdftk input.pdf cat 31-60 output part2.pdf
Or in Preview: select page thumbnails → right-click → “Export selected pages.”
Upload each chunk separately, summarize chunk by chunk, then ask the model to merge the chunk-level summaries into a final report.
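Instead of typing each range by hand, you can generate the page ranges from the page count. A minimal sketch: chunk_ranges is a hypothetical helper, and the 30-page default mirrors the pdftk example. The usage lines assume qpdf and pdftk are installed; qpdf’s --show-npages option prints the page count.

```sh
# chunk_ranges: hypothetical helper. Prints pdftk-style page ranges
# ("1-30", "31-60", ...) covering TOTAL pages in chunks of SIZE pages.
chunk_ranges() {
  total=$1
  size=${2:-30}
  [ "$size" -gt 0 ] || return 1        # avoid an endless loop on a bad size
  start=1
  while [ "$start" -le "$total" ]; do
    end=$((start + size - 1))
    if [ "$end" -gt "$total" ]; then
      end=$total                       # the last chunk may be shorter
    fi
    echo "$start-$end"
    start=$((end + 1))
  done
}

# Usage:
#   pages=$(qpdf --show-npages input.pdf)
#   n=1
#   for r in $(chunk_ranges "$pages" 30); do
#     pdftk input.pdf cat "$r" output "part$n.pdf"
#     n=$((n + 1))
#   done
```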
Step 4: Remove the password
- macOS Preview: open (enter password) → File → Export as PDF → uncheck “Encrypt”
- Web: smallpdf.com/unlock-pdf (don’t use for confidential files)
- CLI (qpdf; replace YOUR_PASSWORD with the actual password):
qpdf --decrypt --password=YOUR_PASSWORD input.pdf out.pdf
Step 5: Encoding / layout issues — paste plain text
If extraction can’t yield clean text, last resort:
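- Preview → select the text → copy it into a .txt file
- Extract the text layer with pdftotext, then tidy it into Markdown if needed (Pandoc can’t read PDF input, so it doesn’t help at this step)
- Paste the content directly into the chat box (under 4000 characters at a time; split longer text)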
The model reads plain text far more reliably than messy PDFs.
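If you take the paste route, splitting the text file by hand gets tedious. The awk sketch below packs whole lines into chunk files of at most LIMIT characters, 4000 by default to match the guideline above. split_for_paste is a hypothetical name. There are two assumptions. First, length() counts characters in a UTF-8 locale with gawk, but some other awk implementations count bytes. Second, a single line longer than the limit still becomes its own oversized chunk.

```sh
# split_for_paste: hypothetical helper. Splits FILE into PREFIX1.txt,
# PREFIX2.txt, ... each holding whole lines and at most LIMIT characters
# (a single over-long line still gets a chunk of its own).
split_for_paste() {
  file=$1
  limit=${2:-4000}
  prefix=${3:-chunk_}
  awk -v limit="$limit" -v prefix="$prefix" '
    function flush(   out) {
      if (buf == "") return
      n++
      out = prefix n ".txt"
      printf "%s", buf > out
      close(out)
      buf = ""
    }
    {
      line = $0 "\n"
      # Start a new chunk if adding this line would exceed the limit.
      if (length(buf) + length(line) > limit) flush()
      buf = buf line
    }
    END { flush() }
  ' "$file"
}

# Usage:
#   pdftotext input.pdf extracted.txt
#   split_for_paste extracted.txt 4000 part_   # writes part_1.txt, part_2.txt, ...
```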
Step 6: New conversation, single file
If the current chat has many files attached, start a fresh chat with only this PDF.
Prevention
- Run OCR locally for scanned PDFs before upload — don’t wait for it to fail
- Split 200-page documents into 30–40 page chunks; summarize chunks, then merge
- Save important findings to a notes file — chat may lose context after the conversation ends
- For sensitive contracts / financials, redact before OCR
- For long-document analysis, use Projects so persistent reference files don’t eat your single-chat context