Gemini 1M Context Still Truncating Long Documents

Gemini 2.5 Pro promises 1M tokens, but your long doc still gets cut off mid-answer. Usually the culprit is the per-message output cap, not the input limit. Fixes below.

Gemini 2.5 Pro advertises a 1M-token context window. You drop in a 400-page PDF, ask for a section-by-section summary, and the answer cuts off halfway. Or you paste a long codebase and the response only references the first quarter.

The 1M number is real for input, but there’s a separate, much smaller output cap (around 8K tokens by default, up to 65,536 on Gemini 2.5 Pro, a ceiling that thinking tokens share). And the gemini.google.com app applies tighter limits than AI Studio or the API. Most “1M context lied to me” issues are actually output-cap or wrong-surface issues, not input truncation.

Common causes

By frequency:

1. Output token cap, not input cap (most common)

You can give Gemini 1M tokens of input. But each response is capped: typically 8,192 tokens by default, up to 65,536 on Gemini 2.5 Pro. A chapter-by-chapter book summary that totals 30K words (roughly 40K tokens) can’t fit in one default-capped response no matter how much input you gave.

How to judge: response ends abruptly mid-sentence or mid-section. The model “knew more,” it just ran out of output budget.
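
The arithmetic behind the 30K-word example can be sketched as follows. The 0.75 words-per-token ratio is a rough rule of thumb for English prose, not a property of Gemini's tokenizer, and the function names are illustrative.

```python
import math

WORDS_PER_TOKEN = 0.75  # assumption: rough average for English prose


def estimate_tokens(words: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Rough token estimate for a given word count."""
    return math.ceil(words / words_per_token)


def responses_needed(total_words: int, output_cap_tokens: int) -> int:
    """Minimum number of responses needed to emit total_words of output."""
    return math.ceil(estimate_tokens(total_words) / output_cap_tokens)


print(estimate_tokens(30_000))           # 40000
print(responses_needed(30_000, 8_192))   # 5
print(responses_needed(30_000, 65_536))  # 1
```

At the default cap, the same ask needs about five responses, which is why the answer stops partway through.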

2. Consumer app has tighter per-message limits than API

gemini.google.com truncates output earlier than AI Studio or the API. The 1M context is technically there, but the app’s UI layer caps output at around 8K tokens regardless of model.

How to judge: same prompt + same doc gives shorter answer in app vs AI Studio.

3. Per-message token cap on the chat surface

Even with paid tiers, each turn in the gemini.google.com chat has a hidden per-message cap. Very long single-turn prompts get silently trimmed.

How to judge: the cutoff lands at a token count near a round number (~32K, ~100K).

4. Free tier doesn’t get the full 1M

Free Gemini caps context far below 1M — closer to 32K-100K. The 1M is a Pro / AI Studio / API feature.

5. Document was attached as image, not text

If your PDF is scanned (no OCR layer), Gemini may treat it as image input, which counts very differently against token budgets and degrades retrieval.

6. Thinking mode eating output budget

Gemini 2.5 Pro’s thinking mode writes internal reasoning tokens that count against the output cap. A long reasoning chain leaves less room for the visible answer.
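
One way to see how much of the cap went to reasoning is to compare token counts after a call. The sketch below assumes a usage object with thoughts_token_count and candidates_token_count fields, reported separately, which is how the google-genai SDK's response.usage_metadata appears to behave; verify the field names against your SDK version. A stand-in object with made-up numbers is used so the example runs without an API key.

```python
from types import SimpleNamespace


def output_budget_report(usage, max_output_tokens: int) -> dict:
    """Split the output budget into thinking, visible answer, and headroom."""
    # Assumption: thinking and visible tokens are reported as separate counts.
    thinking = getattr(usage, "thoughts_token_count", None) or 0
    visible = getattr(usage, "candidates_token_count", None) or 0
    return {
        "thinking": thinking,
        "visible": visible,
        "headroom": max_output_tokens - thinking - visible,
        "thinking_share": round(thinking / max_output_tokens, 2),
    }


# Stand-in for response.usage_metadata; the numbers are illustrative only.
usage = SimpleNamespace(thoughts_token_count=6000, candidates_token_count=2192)
report = output_budget_report(usage, max_output_tokens=8192)
print(report)
```

Zero headroom combined with a high thinking share suggests the reasoning chain, not the answer itself, used up the budget.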

Shortest path to fix

Step 1: Use the API for serious long-context work

The consumer app is not designed for 1M-token workloads. For real long-context, use the API:

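from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

# Read the PDF as raw bytes so it can be sent inline with the prompt.
# Very large files may exceed the inline request size limit; the Files API
# (client.files.upload) is the usual alternative in that case.
with open("long-doc.pdf", "rb") as f:
    doc = f.read()

response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents=[
        types.Part.from_bytes(data=doc, mime_type="application/pdf"),
        "Summarize each chapter in 200 words.",
    ],
    config=types.GenerateContentConfig(
        max_output_tokens=65536,  # raise the output cap to the 2.5 Pro ceiling
        # Cap reasoning tokens so they leave room for the visible answer.
        thinking_config=types.ThinkingConfig(thinking_budget=8000),
    ),
)

print(response.text)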

This explicitly sets max_output_tokens to the 65,536 ceiling and caps thinking at 8,000 tokens, so reasoning can’t consume the whole output budget.
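
To confirm whether a response actually ran out of output budget, you can inspect its finish reason. The helper below is a minimal sketch that assumes the response follows the google-genai shape (response.candidates[0].finish_reason, with MAX_TOKENS signalling a capped response). It is duck-typed and demonstrated on a stand-in object so it runs without the SDK or an API key.

```python
from types import SimpleNamespace


def hit_output_cap(response) -> bool:
    """True if the first candidate stopped because it ran out of output tokens."""
    candidates = getattr(response, "candidates", None) or []
    if not candidates:
        return False
    reason = candidates[0].finish_reason
    # The SDK returns an enum; fall back to the raw value for plain strings.
    name = getattr(reason, "name", reason)
    return str(name) == "MAX_TOKENS"


# Stand-ins for real responses, for illustration only.
truncated = SimpleNamespace(candidates=[SimpleNamespace(finish_reason="MAX_TOKENS")])
complete = SimpleNamespace(candidates=[SimpleNamespace(finish_reason="STOP")])

print(hit_output_cap(truncated))  # True
print(hit_output_cap(complete))   # False
```

If this returns True for a real response, raise max_output_tokens, lower the thinking budget, or split the request.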

Step 2: Use AI Studio if you don’t want to script

aistudio.google.com
→ Pick Gemini 2.5 Pro
→ Drop the doc in
→ Under "Run settings" on the right, raise "Max output tokens" to 65536
→ Adjust "Thinking budget" if available

AI Studio exposes the same dials as the API, free for prototyping.

Step 3: Batch the work into multiple turns

If you can’t avoid the consumer app:

Turn 1: "Summarize chapters 1-5 only. Mark when done."
Turn 2: "Now summarize chapters 6-10."
Turn 3: "Now chapters 11-15."

The output cap applies to each turn separately, so spreading the work across turns produces full coverage without hitting per-message limits.
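
If the document has many chapters, the turn prompts can be generated rather than typed. This is a small sketch; the function name and the batch size of five are illustrative choices, not anything the app requires.

```python
def chapter_batches(total_chapters: int, per_turn: int) -> list[str]:
    """Build one prompt per turn, each covering at most per_turn chapters."""
    prompts = []
    for start in range(1, total_chapters + 1, per_turn):
        end = min(start + per_turn - 1, total_chapters)
        span = f"chapter {start}" if start == end else f"chapters {start}-{end}"
        prefix = "Summarize" if start == 1 else "Now summarize"
        prompts.append(f"{prefix} {span} only. Mark when done.")
    return prompts


for prompt in chapter_batches(total_chapters=12, per_turn=5):
    print(prompt)
# Summarize chapters 1-5 only. Mark when done.
# Now summarize chapters 6-10 only. Mark when done.
# Now summarize chapters 11-12 only. Mark when done.
```

Pick a batch size small enough that each turn's summaries fit comfortably under the per-message cap.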

Step 4: Ask for structured, shorter output

A “200-word summary per chapter” is far more compact than a “comprehensive analysis.” Specify the length explicitly:

For each chapter, output exactly:
- Title
- 3 bullet points (max 15 words each)
- 1 key quote (max 30 words)
Stop after chapter 10.

Tight structure + bullets fit more chapters per output budget than prose.
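
A rough comparison shows how much the tight template buys. The sketch assumes about 0.75 words per token (a rule of thumb, not Gemini's tokenizer), a 10-word allowance for each title, and no formatting overhead, so treat the results as ballpark figures.

```python
import math

WORDS_PER_TOKEN = 0.75  # assumption: rough average for English prose
TITLE_WORDS = 10        # assumption: generous allowance per chapter title


def chapters_per_response(words_per_chapter: int, output_cap_tokens: int) -> int:
    """How many chapters of a given length fit under the output cap."""
    tokens_per_chapter = math.ceil(words_per_chapter / WORDS_PER_TOKEN)
    return output_cap_tokens // tokens_per_chapter


# Worst case for the template: title + 3 bullets x 15 words + one 30-word quote.
template_words = TITLE_WORDS + 3 * 15 + 30

print(template_words)                               # 85
print(chapters_per_response(template_words, 8192))  # 71
print(chapters_per_response(200, 8192))             # 30
```

Under these assumptions the bulleted template fits more than twice as many chapters per response as a 200-word prose summary.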

Step 5: Verify the doc was ingested as text, not image

For PDFs: a PDF with a real text layer works as-is. Scanned PDFs need OCR first (Adobe Acrobat OCR, ABBYY, or opening the file as a Google Doc so Google runs OCR on it). Image-only PDFs consume tokens while conveying little usable text.
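
A quick heuristic for spotting a scanned PDF before uploading: extract text per page and see how many pages come back nearly empty. The sketch takes the extracted page strings as input; how you obtain them is up to you (for example, pypdf's PdfReader(path).pages with page.extract_text(), a third-party library not imported here). The 20-character threshold and 50% cutoff are arbitrary assumptions.

```python
def looks_scanned(page_texts: list[str], min_chars_per_page: int = 20) -> bool:
    """Heuristic: True if most pages yield almost no extractable text."""
    if not page_texts:
        return True  # nothing extractable at all
    sparse = sum(
        1 for text in page_texts if len((text or "").strip()) < min_chars_per_page
    )
    return sparse / len(page_texts) > 0.5


# Illustrative inputs standing in for real per-page extraction results.
scanned_pages = ["", "  ", "Chapter 1 begins with a long paragraph of text."]
text_pages = ["A full page of real extracted text."] * 3

print(looks_scanned(scanned_pages))  # True
print(looks_scanned(text_pages))     # False
```

If it returns True, run OCR first and re-check before uploading.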

Step 6: Check your tier

If you’re on free Gemini, you don’t have 1M context — closer to 32K. Upgrade to AI Pro or use AI Studio (free, larger context) to access the real window.

Prevention

  • Default to AI Studio or API for any doc over 50 pages — the consumer app is for quick lookups, not long-doc workflows
  • Always pre-OCR scanned PDFs before uploading
  • For multi-section summaries, instruct exact length per section and chunk across turns rather than asking for everything in one shot
  • For API calls, always set max_output_tokens explicitly — the default 8K is the source of most “cut off” complaints
  • If using thinking mode, cap the thinking budget so reasoning doesn’t eat your answer

Tags: #Gemini #Troubleshooting #thinking