Gemini Context Window Feels Shorter Than Advertised

The headline is a 1M+ token window, but a 100-page PDF can already overflow it. The gap comes from plan and model differences.

Google advertises a 2M-token context window for Gemini 2.5 Pro, yet uploading a 100-page PDF on gemini.google.com can return “exceeds limit”. This is not false advertising: the consumer Web UI’s effective window is 20-60× smaller than the API’s. On the Web it is roughly 32K-100K tokens depending on model and plan, versus 2M on the API.

To actually use long context, understand the Web-vs-API gap and route to the right surface per task.

Common causes

By frequency:

1. Flash / Lite model has a smaller effective window (most common)

Real effective windows:

Model              API cap   gemini.google.com effective
Gemini 2.5 Pro     2M        ~100K (Advanced) / ~32K (free)
Gemini 2.5 Flash   1M        ~32K-64K
Gemini Lite        32K       ~16K

The Web UI doesn’t tell you the window is reduced; it simply refuses the upload.

How to judge: check the model picker at the top of the page to see which model is active.
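To make the routing decision concrete, here is a minimal sketch that encodes the approximate windows from the table above and lists which surfaces could hold a given input. The surface labels (`web-free`, `web-advanced`, `api`) and the function name are hypothetical, and the numbers are this article's estimates rather than published limits.

```python
# Approximate effective windows in tokens, copied from the table above.
# These are the article's estimates, not official limits.
EFFECTIVE_WINDOW = {
    ("gemini-2.5-pro", "web-free"): 32_000,
    ("gemini-2.5-pro", "web-advanced"): 100_000,
    ("gemini-2.5-pro", "api"): 2_000_000,
    ("gemini-2.5-flash", "web"): 32_000,  # lower bound of the 32K-64K range
    ("gemini-2.5-flash", "api"): 1_000_000,
}


def surfaces_that_fit(model, needed_tokens):
    """Return the surfaces whose estimated window can hold needed_tokens."""
    return [
        surface
        for (name, surface), window in EFFECTIVE_WINDOW.items()
        if name == model and window >= needed_tokens
    ]


if __name__ == "__main__":
    # An 80K-token document on Pro: too big for free Web, fine elsewhere.
    print(surfaces_that_fit("gemini-2.5-pro", 80_000))
```

An empty list means the input should be split or shortened before any surface will accept it.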

2. Attachments are counted by parsed tokens, not file size

A 100-page PDF (image-heavy):

  • Display size: 5 MB
  • Actual context cost: parsed text + OCR’d image tokens, possibly 200K+

A text-only 100-page PDF averages 30K-60K tokens; one with mixed text and images runs 80K-200K tokens.

How to judge: if the upload itself triggers “content too large”, this is the cause.

3. Free tier capped tighter than the public number

Free users get roughly 1/3 - 1/2 the effective window of paid Advanced users. Google doesn’t publish exact numbers but it’s measurable.

How to judge: you are on the free plan, and the same file is accepted on a paid Advanced account.

4. Conversation history filled the context

Web counts all prior turns in the current conversation. If you’ve already loaded several PDFs, new content fights for what’s left.

How to judge: current conversation has many turns or multiple file uploads.
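A rough way to reason about this is to subtract everything already in the conversation from the effective window. The sketch below does that arithmetic; the function name, the per-turn token counts, and the reply reserve are illustrative assumptions, since the Web UI does not expose real counts.

```python
def remaining_budget(window_tokens, turn_tokens, reserve_for_reply=2_000):
    """Estimate tokens left for new content in the current conversation.

    window_tokens: effective window of the surface (an estimate).
    turn_tokens: estimated token cost of each prior turn or upload.
    reserve_for_reply: headroom kept for the model's answer (assumed value).
    """
    used = sum(turn_tokens)
    return max(0, window_tokens - used - reserve_for_reply)


if __name__ == "__main__":
    # Two earlier PDFs plus some chat turns on a ~100K window.
    print(remaining_budget(100_000, [45_000, 40_000, 3_000]))
```

If the result is smaller than the next file's estimated size, starting a new conversation is cheaper than fighting for the leftover space.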

5. Workspace-managed accounts have stricter caps

A corporate Workspace admin can set a low context cap, typically to prevent data exfiltration.

How to judge:

  • The same upload works on a personal account but fails on the work account
  • Confirm with IT in Admin Console → Gemini app settings

6. Scanned PDFs / image-heavy docs

Scanned PDFs have no text layer; Gemini treats each page as an image, ~1-2K tokens per page (image encoding) → 100-200K tokens for 100 pages.
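The per-page figures quoted in this section can be turned into a quick estimator. This is a sketch only: the ranges are the article's rough numbers, and the `kind` labels and function name are made up for illustration.

```python
# Tokens per page, derived from the rough ranges quoted in this article.
TOKENS_PER_PAGE = {
    "text": (300, 600),       # 30K-60K tokens per 100 pages
    "mixed": (800, 2_000),    # 80K-200K tokens per 100 pages
    "scanned": (1_000, 2_000),  # 1-2K tokens per page (image encoding)
}


def estimate_pdf_tokens(pages, kind):
    """Return a (low, high) token estimate for a PDF of the given kind."""
    low, high = TOKENS_PER_PAGE[kind]
    return pages * low, pages * high


if __name__ == "__main__":
    for kind in TOKENS_PER_PAGE:
        low, high = estimate_pdf_tokens(100, kind)
        print(f"{kind:8s} {low:>7,} - {high:>7,} tokens")
```

Comparing the high end of the estimate with the effective window of your plan shows whether a document needs OCR, splitting, or a different surface before you upload it.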

Shortest path to fix

By context size unlocked, cheapest first:

Step 1: Switch to Gemini 2.5 Pro

gemini.google.com → top model picker → "Gemini 2.5 Pro"

Pro gives 2-3× the effective Web window vs Flash / Lite. Pro is the only Web model that approximates “long context”.

Step 2: Upgrade to Google AI Premium

one.google.com/about/ai-premium
Subscribe to Google AI Premium (includes Gemini Advanced)

After upgrading ($19.99/month), Pro on the Web goes from ~32K to ~100K+ tokens.

Step 3: Split large documents

100-page PDF → split into 50-page batches:

# pdftk
pdftk input.pdf cat 1-50 output batch1.pdf
pdftk input.pdf cat 51-100 output batch2.pdf

# Or macOS Preview / Adobe Acrobat / online tools like ilovepdf.com

Workflow:

  1. Upload batch1.pdf → “summarize, output 1K-word brief”
  2. Copy the brief
  3. New conversation: upload batch2.pdf + brief → ask Gemini to merge
  4. Repeat across all batches
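For documents that are not exactly 100 pages, the page ranges can be computed rather than typed by hand. The sketch below generates `pdftk` commands in the same form as the ones above; the helper names and the `batchN.pdf` output naming are assumptions for illustration.

```python
def page_batches(total_pages, batch_size):
    """Split pages 1..total_pages into inclusive (first, last) ranges."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        (start, min(start + batch_size - 1, total_pages))
        for start in range(1, total_pages + 1, batch_size)
    ]


def pdftk_commands(input_pdf, total_pages, batch_size):
    """Build one pdftk command string per batch (commands are not executed)."""
    return [
        f"pdftk {input_pdf} cat {first}-{last} output batch{i}.pdf"
        for i, (first, last) in enumerate(
            page_batches(total_pages, batch_size), start=1
        )
    ]


if __name__ == "__main__":
    for command in pdftk_commands("input.pdf", 120, 50):
        print(command)
```

The last batch is simply shorter when the page count is not a multiple of the batch size.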

Step 4: OCR scanned PDFs before uploading

For scanned docs:

# ocrmypdf (open source)
ocrmypdf input.pdf output_ocr.pdf

# Or Adobe Acrobat → Tools → Scan & OCR

After OCR, the text layer is read as plain text; token count drops from 200K to 30-60K.

Step 5: Convert PDF to plain text / markdown

# pdftotext
pdftotext input.pdf output.txt

# Feed Gemini the .txt

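Plain text has the lowest token overhead. At roughly 4 bytes per token, 1 MB of text ≈ 250K tokens, which still exceeds Pro’s ~100K effective Web window; the text of a typical 100-page document (30K-60K tokens) fits, while a full 1 MB file needs splitting or the API.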
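Once the document is plain text, it can be sized and split on paragraph boundaries before pasting. This sketch assumes about 4 characters per token, the ratio implied by the 1 MB ≈ 250K figure; real tokenizers vary, and the function names are hypothetical.

```python
def estimate_tokens(text, chars_per_token=4):
    """Rough token estimate: character count divided by an assumed ratio."""
    return -(-len(text) // chars_per_token)  # ceiling division


def chunk_paragraphs(text, max_tokens, chars_per_token=4):
    """Group blank-line-separated paragraphs into chunks under max_tokens.

    A single paragraph larger than max_tokens becomes its own oversized
    chunk; separators between paragraphs are not counted.
    """
    chunks, current, current_tokens = [], [], 0
    for para in text.split("\n\n"):
        para_tokens = estimate_tokens(para, chars_per_token)
        if current and current_tokens + para_tokens > max_tokens:
            chunks.append("\n\n".join(current))
            current, current_tokens = [], 0
        current.append(para)
        current_tokens += para_tokens
    if current:
        chunks.append("\n\n".join(current))
    return chunks


if __name__ == "__main__":
    sample = "\n\n".join(["a" * 400] * 5)  # five paragraphs, ~100 tokens each
    print(len(chunk_paragraphs(sample, max_tokens=250)))
```

Setting `max_tokens` a little below the effective window leaves room for the prompt and the reply.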

Step 6: Use the Gemini API (real long context)

If you depend on long context routinely:

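from google import genai

# Create a client; replace YOUR_KEY with your API key
client = genai.Client(api_key="YOUR_KEY")

# Upload through the Files API, which handles large documents
file = client.files.upload(file="huge_doc.pdf")

# Send the uploaded file and the prompt together
response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents=[file, "Summarize key points"],
)
print(response.text)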

API gives you the full 2M window, charged per token but cheap (input ~$1.25/M tokens).
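To put "cheap" in numbers, the input cost is just the token count times the per-million rate. The sketch below uses the ~$1.25/M input figure quoted above as an assumption; pricing changes and output tokens are billed separately, so treat the result as a ballpark.

```python
INPUT_USD_PER_MILLION = 1.25  # figure quoted in this article; may be outdated


def input_cost_usd(tokens, usd_per_million=INPUT_USD_PER_MILLION):
    """Estimate the input-side cost of one request in US dollars."""
    return tokens / 1_000_000 * usd_per_million


if __name__ == "__main__":
    # A request that fills the entire 2M-token window.
    print(f"${input_cost_usd(2_000_000):.2f}")
```

Even a request that fills the whole window costs a few dollars of input at this rate, and a 100-page PDF costs far less.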

Or use Google AI Studio, the API’s free Web UI:

  • Uploads 100-page PDF fine
  • Effective window ~1.5M tokens
  • Free (with rate limits)

Step 7: Start a new conversation to free history

If your current chat is bloated:

  1. New conversation
  2. Summarize prior context into a < 5K-token brief
  3. Reuse the brief + new file in the new chat

Step 8: Workspace — ask IT to raise the cap

Work account restricted: IT can adjust Admin Console → Gemini app for Workspace → “Maximum file size” / “Maximum context tokens”.

Prevention

  • Long-context work belongs in aistudio.google.com, not gemini.google.com — 10× larger window for free
  • OCR scanned PDFs first; saves ~70% of tokens
  • For text-only tasks, convert to .txt instead of uploading PDFs
  • Upgrade to Google AI Premium for ~100K window on Pro
  • Heavy long-doc work (papers / 10-Ks / codebases) — use the Gemini API for the full 2M window

Tags: #Gemini #Troubleshooting