Google advertises a 2M-token context window for Gemini 2.5 Pro, but uploading a 100-page PDF on gemini.google.com can fail with “exceeds limit”. This is not false advertising: the consumer Web UI’s effective window is 20-60× smaller than the API’s. On the Web it is roughly 32K-100K tokens depending on model and plan, versus 2M on the API.
To actually use long context, you need to understand the Web-vs-API gap and route each task to the right surface.
Common causes
By frequency:
1. Flash / Lite model has a smaller effective window (most common)
Real effective windows:
| Model | API cap | gemini.google.com effective |
|---|---|---|
| Gemini 2.5 Pro | 2M | ~100K (Advanced) / ~32K (free) |
| Gemini 2.5 Flash | 1M | ~32K-64K |
| Gemini Lite | 32K | ~16K |
The Web UI doesn’t tell you the window is reduced; it just refuses the upload.
How to judge: check which model is selected in the model picker at the top of the page.
2. Attachments count by token content, not file size
A 100-page PDF (image-heavy):
- Display size: 5 MB
- Actual context cost: parsed text + OCR’d image tokens, possibly 200K+
A 100-page PDF averages 30K-60K tokens if it is text-only, and 80K-200K tokens with mixed text and images.
How to judge: if the upload itself triggers “content too large”, this is the cause.
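As a rough way to apply these figures before uploading, the sketch below turns the per-100-page ranges quoted above into per-page ranges and scales them by page count. The function name and the `kind` labels are illustrative, and the numbers are this article's averages, not measured values.

```python
# Per-page ranges derived from the article's figures:
# 100 pages = 30K-60K tokens (text-only), 80K-200K tokens (mixed text+image).
TOKENS_PER_PAGE = {
    "text": (300, 600),
    "mixed": (800, 2000),
}


def estimate_pdf_tokens(pages: int, kind: str = "text") -> tuple[int, int]:
    """Return a rough (low, high) token estimate for a PDF."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    low, high = TOKENS_PER_PAGE[kind]
    return pages * low, pages * high


if __name__ == "__main__":
    print(estimate_pdf_tokens(100, "text"))
    print(estimate_pdf_tokens(100, "mixed"))
```

If the high end of the estimate is above the window you have, expect the upload to be refused.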
3. Free tier capped tighter than the public number
Free users get roughly 1/3 - 1/2 the effective window of paid Advanced users. Google doesn’t publish exact numbers but it’s measurable.
How to judge:
- Your account at one.google.com/about/ai-premium shows “Free” rather than “AI Premium” or “AI Pro”
- Same file uploads fine on a paid account
4. Conversation history filled the context
The Web UI counts all prior turns in the current conversation against the window. If you have already loaded several PDFs, new content competes for what is left.
How to judge: current conversation has many turns or multiple file uploads.
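The arithmetic behind this cause can be sketched as a simple budget: whatever earlier turns and uploads consumed is subtracted from the window, and a new file only fits in the remainder. The per-turn token counts here are hypothetical inputs you would have to estimate yourself, since the Web UI does not display them.

```python
def remaining_budget(window: int, turn_tokens: list[int]) -> int:
    """Tokens left in the window after all prior turns and uploads."""
    used = sum(turn_tokens)
    return max(window - used, 0)


def fits(window: int, turn_tokens: list[int], new_tokens: int) -> bool:
    """True if new content fits in what the conversation has left."""
    return new_tokens <= remaining_budget(window, turn_tokens)


if __name__ == "__main__":
    # Hypothetical conversation: two earlier PDFs plus some chat turns.
    history = [40_000, 35_000, 5_000]
    print(remaining_budget(100_000, history))
    print(fits(100_000, history, 30_000))
```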
5. Workspace-managed accounts have stricter caps
Corporate Workspace admins can set a low context cap, typically to prevent data exfiltration.
How to judge:
- The same file uploads fine on a personal account but fails on the work account
- Confirm with IT in Admin Console → Gemini app settings
6. Scanned PDFs / image-heavy docs
Scanned PDFs have no text layer, so Gemini treats each page as an image at ~1-2K tokens per page (image encoding). A 100-page scan therefore costs 100K-200K tokens.
Shortest path to fix
By context size unlocked, cheapest first:
Step 1: Switch to Gemini 2.5 Pro
gemini.google.com → top model picker → "Gemini 2.5 Pro"
Pro gives 2-3× the effective Web window vs Flash / Lite. Pro is the only Web model that approximates “long context”.
Step 2: Upgrade to Google AI Premium
one.google.com/about/ai-premium
Subscribe to Google AI Premium (includes Gemini Advanced)
After upgrading, Pro on the Web goes from ~32K to ~100K+ tokens. The plan costs $19.99/month.
Step 3: Split large documents
100-page PDF → split into 50-page batches:
# pdftk
pdftk input.pdf cat 1-50 output batch1.pdf
pdftk input.pdf cat 51-100 output batch2.pdf
# Or macOS Preview / Adobe Acrobat / online tools like ilovepdf.com
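For documents with an arbitrary page count, the page ranges can be computed rather than typed by hand. The sketch below only builds the `pdftk` command strings in the same form as the commands above; it does not run them, and the helper names and `batchN.pdf` naming are illustrative choices.

```python
import shlex


def batch_ranges(total_pages: int, batch_size: int = 50) -> list[tuple[int, int]]:
    """Split pages 1..total_pages into inclusive (start, end) ranges."""
    if total_pages < 1 or batch_size < 1:
        raise ValueError("total_pages and batch_size must be positive")
    return [
        (start, min(start + batch_size - 1, total_pages))
        for start in range(1, total_pages + 1, batch_size)
    ]


def pdftk_commands(input_pdf: str, total_pages: int, batch_size: int = 50) -> list[str]:
    """Build one pdftk 'cat' command per batch (strings only, not executed)."""
    return [
        f"pdftk {shlex.quote(input_pdf)} cat {start}-{end} output batch{i}.pdf"
        for i, (start, end) in enumerate(batch_ranges(total_pages, batch_size), start=1)
    ]


if __name__ == "__main__":
    for cmd in pdftk_commands("input.pdf", 120):
        print(cmd)
```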
Workflow:
- Upload batch1.pdf → “summarize, output 1K-word brief”
- Copy the brief
- New conversation: upload batch2.pdf + brief → ask Gemini to merge
- Repeat across all batches
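The workflow above is a rolling summary: each batch is processed together with the brief from the previous batches, and only the merged brief is carried forward. A minimal sketch of that loop follows. The `ask` callable is a hypothetical stand-in for "open a new conversation, upload this batch plus the brief, and get back the merged brief"; on the Web UI that step is manual.

```python
from typing import Callable


def rolling_summary(batches: list[str], ask: Callable[[str, str], str]) -> str:
    """Carry a brief across batches; each step sees one batch plus the brief so far."""
    brief = ""
    for batch in batches:
        # One fresh conversation per batch: only the brief carries over.
        brief = ask(batch, brief)
    return brief


if __name__ == "__main__":
    # Stub for illustration only: records which batches were merged.
    def fake_ask(batch: str, brief: str) -> str:
        return f"{brief}+{batch}" if brief else batch

    print(rolling_summary(["batch1.pdf", "batch2.pdf"], fake_ask))
```

Because only the brief carries over, each conversation holds one batch plus about 1K words, regardless of how long the full document is.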
Step 4: OCR scanned PDFs before uploading
For scanned docs:
# ocrmypdf (open source)
ocrmypdf input.pdf output_ocr.pdf
# Or Adobe Acrobat → Tools → Scan & OCR
After OCR, the text layer is read as plain text; the token count for a 100-page scan drops from ~200K to 30K-60K.
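To put the saving in relative terms, the drop can be expressed as a fraction of the original token count. This is plain arithmetic on the article's estimates; the function name is illustrative.

```python
def ocr_savings(before_tokens: int, after_tokens: int) -> float:
    """Fraction of tokens saved by OCR (0.0 = nothing saved, 1.0 = everything)."""
    if before_tokens <= 0:
        raise ValueError("before_tokens must be positive")
    return 1 - after_tokens / before_tokens


if __name__ == "__main__":
    # 200K tokens as images vs. 60K or 30K tokens as text.
    print(f"{ocr_savings(200_000, 60_000):.0%}")
    print(f"{ocr_savings(200_000, 30_000):.0%}")
```

With these estimates the saving is between 70% and 85%.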
Step 5: Convert PDF to plain text / markdown
# pdftotext
pdftotext input.pdf output.txt
# Feed Gemini the .txt
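Plain text has the lowest token overhead. At roughly 4 bytes per token, 1 MB of text ≈ 250K tokens, which is still more than Pro’s ~100K effective Web window; a text file of about 400 KB or less fits, and anything larger needs splitting or the API.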
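The 1 MB ≈ 250K tokens figure implies about 4 bytes per token. A rough sketch of that conversion in both directions follows. The ratio is an assumption taken from this article's estimate and varies with language and content, so treat the results as ballpark figures.

```python
# Assumption: ~4 bytes per token, implied by "1 MB of text ≈ 250K tokens".
BYTES_PER_TOKEN = 4


def estimate_text_tokens(size_bytes: int) -> int:
    """Rough token count for a plain-text file of the given size."""
    return size_bytes // BYTES_PER_TOKEN


def max_text_bytes(window_tokens: int) -> int:
    """Largest plain-text file (in bytes) that roughly fits a token window."""
    return window_tokens * BYTES_PER_TOKEN


if __name__ == "__main__":
    print(estimate_text_tokens(1_000_000))
    print(max_text_bytes(100_000))
```

For a real file, `os.path.getsize("output.txt")` supplies the byte count to pass in.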
Step 6: Use the Gemini API (real long context)
If you depend on long context routinely:
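# Requires the google-genai package: pip install google-genai
from google import genai

client = genai.Client(api_key="YOUR_KEY")

# Files API for large uploads; current SDK versions take the path via `file=`
file = client.files.upload(file="huge_doc.pdf")

# Pass the uploaded file handle together with the prompt
response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents=[file, "Summarize key points"],
)
print(response.text)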
The API gives you the full 2M window. It is charged per token, but input is cheap (~$1.25 per million tokens).
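To see what "cheap" means for a given document, the input cost can be estimated from the token count. The sketch below uses the ~$1.25 per million input tokens quoted above as its default; actual pricing can differ by model and prompt size, and output tokens are billed separately, so this is a lower bound on the total.

```python
# Assumption: input price from the article, in USD per million tokens.
INPUT_USD_PER_MILLION = 1.25


def input_cost_usd(tokens: int, usd_per_million: float = INPUT_USD_PER_MILLION) -> float:
    """Estimated input cost in USD for sending `tokens` tokens once."""
    return tokens / 1_000_000 * usd_per_million


if __name__ == "__main__":
    # A 200K-token scanned PDF, and a full 2M-token window.
    print(f"${input_cost_usd(200_000):.2f}")
    print(f"${input_cost_usd(2_000_000):.2f}")
```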
Or use Google AI Studio, the API’s free Web UI:
- Uploads 100-page PDF fine
- Effective window ~1.5M tokens
- Free (with rate limits)
Step 7: Start a new conversation to free history
If your current chat is bloated:
- In the current chat, ask for a summary of the prior context as a < 5K-token brief
- Start a new conversation
- Paste the brief and upload the new file in the new chat
Step 8: Workspace — ask IT to raise the cap
If a work account is restricted, IT can adjust the limits under Admin Console → Gemini app for Workspace → “Maximum file size” / “Maximum context tokens”.
Prevention
- Long-context work belongs in aistudio.google.com, not gemini.google.com: the window is more than 10× larger, for free
- OCR scanned PDFs first; saves ~70% of tokens
- For text-only tasks, convert to .txt instead of uploading PDFs
- Upgrade to Google AI Premium for ~100K window on Pro
- Heavy long-doc work (papers / 10-Ks / codebases) — use the Gemini API for the full 2M window
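The routing advice in this list can be sketched as a small decision function. The window sizes are the approximate figures quoted in this article (not published limits), and the function name, plan labels, and return strings are illustrative.

```python
# Approximate effective windows from this article, in tokens.
WEB_PRO_WINDOWS = {"free": 32_000, "paid": 100_000}
AI_STUDIO_WINDOW = 1_500_000
API_WINDOW = 2_000_000


def choose_surface(tokens: int, plan: str = "free") -> str:
    """Pick the cheapest surface whose window fits the estimated token count."""
    if tokens <= WEB_PRO_WINDOWS[plan]:
        return "gemini.google.com"
    if tokens <= AI_STUDIO_WINDOW:
        return "aistudio.google.com"
    if tokens <= API_WINDOW:
        return "Gemini API"
    return "split the document"


if __name__ == "__main__":
    for estimate in (20_000, 60_000, 1_800_000, 3_000_000):
        print(estimate, "->", choose_surface(estimate))
```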
Tags: #Gemini #Troubleshooting