Gemini 1M Context Still Truncates Long Documents

Gemini 3.1 Pro promises 1M tokens but your long doc gets cut off mid-answer. It is almost always the 8K output cap, not input. Fixes inside (as of June 2026).

Published: May 24, 2026 Updated: Jun 21, 2026 Author: AI Productivity Guide Team

Gemini 3.1 Pro advertises a 1M-token context window. You drop in a 400-page PDF, ask for a section-by-section summary, and the answer cuts off halfway. Or you paste a long codebase and the response only references the first quarter.

Fastest fix: the 1M number is for input. Each response is capped separately, and the default cap is only 8,192 output tokens even on Gemini 3.1 Pro (the max is 65,536). If you are in the API or AI Studio, raise max_output_tokens to 65536. If you are in the consumer Gemini app, split the work across multiple turns and ask for tight, structured output. Most “1M context lied to me” reports are an output-cap or wrong-surface problem, not input truncation.

This page is current as of June 2026 (Gemini 3.1 Pro shipped February 19, 2026; the Gemini 2.5 generation has been retired from the app).

Which bucket are you in?

Match your symptom to the cause, then jump to the matching fix.

Symptom	Most likely cause	Fix
Answer stops mid-sentence, model “knew more”	Output token cap (8,192 default)	Step 1 / Step 2
Same prompt is shorter in the app than in AI Studio	Consumer-app output limits	Step 1 / Step 3
Long single prompt seems silently trimmed	Per-message / per-tier context cap	Step 6
You are on Free Gemini	32K context, not 1M	Step 6
Scanned PDF, retrieval is poor	Doc ingested as image, not text	Step 5
“Thinking” runs long, visible answer is tiny	Reasoning tokens eat the output budget	Step 1 / Step 4

Common causes

1. Output token cap, not input cap (most common)

You can feed Gemini up to 1M tokens of input, but each response is capped separately. As of June 2026 the default is 8,192 output tokens, and the model max for Gemini 3.1 Pro is 65,536. A “summarize every chapter” ask that totals 30K words will not fit in one response at the default cap, no matter how much input you supplied.

How to judge: the response ends abruptly mid-sentence or mid-section, and the model clearly “knew more” — it just ran out of output budget. In code-heavy answers the app sometimes shows the explicit notice: A code sample in this response was truncated because it exceeded the maximum allowable output.
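
The arithmetic behind cause 1 can be sketched in a few lines. This is a rough estimate, not a measurement: the ratio of about 1.3 tokens per English word is an assumed rule of thumb rather than a Google figure, and the helper names are hypothetical.

```python
import math

# Assumption: roughly 130 tokens per 100 English words. Real counts depend
# on the tokenizer and the text, so treat the results as ballpark numbers.
TOKENS_PER_100_WORDS = 130


def estimate_output_tokens(words: int) -> int:
    """Rough token estimate for an answer of the given word count."""
    return math.ceil(words * TOKENS_PER_100_WORDS / 100)


def responses_needed(words: int, output_cap: int) -> int:
    """How many capped responses an answer of this length would need."""
    return math.ceil(estimate_output_tokens(words) / output_cap)


target_words = 30_000  # the "summarize every chapter" example
print(estimate_output_tokens(target_words))    # 39000
print(responses_needed(target_words, 8_192))   # 5
print(responses_needed(target_words, 65_536))  # 1
```

Under that assumption the 30K-word ask needs about five responses at the default cap, and only just fits in one at the model max, before any thinking tokens are subtracted.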

2. The consumer app caps output earlier than AI Studio or the API

gemini.google.com truncates output sooner than AI Studio or the API. The 1M context is technically present, but the app’s UI layer does not let you raise max_output_tokens — only the API and AI Studio expose that dial.

How to judge: the same prompt + same doc gives a noticeably shorter answer in the app than in AI Studio.

3. Per-message and compute-based limits on the chat surface

The app also has per-turn limits, and in 2026 Google moved Gemini Apps to a compute-based usage model: each turn is scored by prompt complexity, the model/tools used, and chat length, with a 5-hour refresh window until a weekly cap is hit. Very long single-turn prompts can be trimmed or rejected.

How to judge: the answer cuts off near a round token count, or you hit a “you’ve reached your limit” message after a few heavy long-doc turns.

4. Free tier does not get the full 1M

Free Gemini caps context at 32K tokens (per Google’s own Gemini Apps limits page). The full 1M window is a Google AI Pro ($19.99/mo, formerly “Gemini Advanced”) or AI Ultra ($99.99/mo) feature, and is also available in AI Studio and the API.

5. The document was attached as image, not text

If your PDF is scanned (no OCR text layer), Gemini treats each page as an image. Image pages cost tokens differently and retrieve far worse than real text, so long scans degrade fast.

6. Thinking is eating the output budget

Gemini 3.1 Pro emits internal reasoning tokens that count against the same output budget as the visible answer. At the HIGH thinking level roughly 18,000-30,000 tokens can go to reasoning, leaving the visible answer cramped. Lowering the thinking level recovers output room.
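
To see how much room reasoning can take, here is a small worked calculation. It assumes the simple model described above (visible budget equals output cap minus reasoning tokens) and uses the 18,000 to 30,000 range quoted for the HIGH level; the function name is hypothetical.

```python
def visible_budget(max_output_tokens: int, thinking_tokens: int) -> int:
    """Tokens left for the visible answer after reasoning is subtracted.

    Assumes reasoning and the visible answer share one output budget,
    as described in the text. Never returns a negative number.
    """
    return max(0, max_output_tokens - thinking_tokens)


for cap in (8_192, 65_536):
    for thinking in (18_000, 30_000):
        print(cap, thinking, visible_budget(cap, thinking))
# 8192 18000 0
# 8192 30000 0
# 65536 18000 47536
# 65536 30000 35536
```

At the default cap, heavy reasoning can leave nothing for the answer; at the model max it still removes roughly a quarter to a half of the budget.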

Shortest path to fix

Step 1: For serious long-context work, use the API

The consumer app is not built for 1M-token workloads. Use the API and set the output cap and thinking level explicitly. The current model string is gemini-3.1-pro-preview.

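from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

# Read the PDF as raw bytes so it can be sent inline with the prompt.
with open("long-doc.pdf", "rb") as f:
    doc = f.read()

response = client.models.generate_content(
    model="gemini-3.1-pro-preview",
    contents=[
        types.Part.from_bytes(data=doc, mime_type="application/pdf"),
        "Summarize each chapter in 200 words.",
    ],
    config=types.GenerateContentConfig(
        max_output_tokens=65536,  # raise the answer cap from the 8,192 default
        thinking_config=types.ThinkingConfig(thinking_level="low"),  # leave room for the visible answer
    ),
)
print(response.text)
# MAX_TOKENS here means the answer was cut off by the output cap.
print(response.candidates[0].finish_reason)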

Two things matter here: max_output_tokens=65536 raises the answer cap to the model max, and thinking_level="low" keeps reasoning from swallowing that budget. Valid thinking levels for 3.1 Pro are "low", "medium", and "high" ("minimal" exists only on the Flash models, not 3.1 Pro); the API defaults to "high" if you omit it, so set it deliberately. Always check response.candidates[0].finish_reason — MAX_TOKENS confirms you were truncated by the output cap and should raise it or split the task.
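
The finish_reason check can be wrapped in a small helper. The sketch below is one possible shape, not an official pattern: the function name diagnose is hypothetical, and it accepts either the SDK's enum value or a plain string so it can be exercised offline with a stand-in object instead of a real API response.

```python
from types import SimpleNamespace


def diagnose(response) -> str:
    """Classify a response as complete, output-capped, or something else."""
    reason = response.candidates[0].finish_reason
    # Enum values expose .name; plain strings fall back to str().
    name = getattr(reason, "name", None) or str(reason)
    if name == "MAX_TOKENS":
        return "truncated: raise max_output_tokens, lower thinking_level, or split the task"
    if name == "STOP":
        return "complete: the model finished on its own"
    return f"other finish reason: {name}"


# Stand-in object that mimics only the fields the helper reads.
fake = SimpleNamespace(candidates=[SimpleNamespace(finish_reason="MAX_TOKENS")])
print(diagnose(fake))
```

With a real call, pass the object returned by client.models.generate_content in place of the stand-in.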

Step 2: Use AI Studio if you do not want to script

AI Studio exposes the same dials as the API, free for prototyping.

aistudio.google.com
-> Model: Gemini 3.1 Pro
-> Drop the doc into the prompt
-> Open "Run settings" on the right
-> Raise "Max output tokens" to 65536
-> Set "Thinking" to Low or Medium for long answers

Note: a single PDF uploaded directly is limited to roughly 1,000 pages and a few MB; split bigger files before uploading.

Step 3: Batch the work into multiple turns

If you must stay in the consumer app, remember that the output cap applies per turn, so spread the work out:

Turn 1: "Summarize chapters 1-5 only. Say DONE when finished."
Turn 2: "Now summarize chapters 6-10."
Turn 3: "Now chapters 11-15."

Each turn gets its own output budget, so chunking produces full coverage that a single mega-prompt cannot.
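
If the document has many chapters, the turn prompts can be generated rather than typed. This is a minimal sketch; the function name and the prompt wording are illustrative, and the right chunk size depends on how long each chapter summary turns out to be.

```python
def chapter_turns(total_chapters: int, per_turn: int) -> list[str]:
    """Build one prompt per turn, each covering a fixed-size chapter range."""
    if total_chapters < 1 or per_turn < 1:
        raise ValueError("total_chapters and per_turn must be positive")
    prompts = []
    for start in range(1, total_chapters + 1, per_turn):
        end = min(start + per_turn - 1, total_chapters)
        span = f"chapter {start}" if start == end else f"chapters {start}-{end}"
        prompts.append(f"Summarize {span} only. Say DONE when finished.")
    return prompts


for number, prompt in enumerate(chapter_turns(15, 5), start=1):
    print(f"Turn {number}: {prompt}")
# Turn 1: Summarize chapters 1-5 only. Say DONE when finished.
# Turn 2: Summarize chapters 6-10 only. Say DONE when finished.
# Turn 3: Summarize chapters 11-15 only. Say DONE when finished.
```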

Step 4: Ask for structured, shorter output

A “200-word summary per chapter” compresses far better than “comprehensive analysis.” Instruct length explicitly:

For each chapter, output exactly:
- Title
- 3 bullet points (15 words max each)
- 1 key quote (30 words max)
Stop after chapter 10.

Tight structure and bullets fit more chapters per output budget than free-form prose.

Step 5: Confirm the doc was ingested as text, not image

For PDFs, a real text PDF works directly. Scanned PDFs need OCR first (Adobe Acrobat’s OCR, ABBYY, or opening the file in Google Docs, which runs OCR on import). To test quickly, ask Gemini to quote an exact sentence from page 50. If it cannot, the page is likely an image, not text.
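
You can also check for a text layer locally before uploading. The sketch below flags pages whose extracted text is nearly empty. It assumes you already have one string per page from a PDF library of your choice; the 20-character threshold and the function name are arbitrary, hypothetical choices.

```python
def likely_scanned_pages(page_texts: list[str], min_chars: int = 20) -> list[int]:
    """Return 1-based page numbers whose extracted text is nearly empty.

    page_texts holds one extracted string per page. With the third-party
    pypdf package (an assumption, not something this article requires) it
    could be built as:
        [page.extract_text() for page in PdfReader("long-doc.pdf").pages]
    """
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if len((text or "").strip()) < min_chars
    ]


sample = [
    "Chapter 1. The quick brown fox jumps over the lazy dog.",
    "",        # no text layer at all
    "   \n",   # whitespace only
    "Chapter 2. Another page with a real text layer.",
]
print(likely_scanned_pages(sample))  # [2, 3]
```

If most pages are flagged, run OCR before sending the file to Gemini.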

Step 6: Check your tier and the right surface

If you are on Free Gemini, you have a 32K context, not 1M. Upgrade to Google AI Pro for the 1M window in the app, or use AI Studio (free, full window, adjustable output cap) for long-doc work. For anything over ~50 pages, prefer AI Studio or the API over the chat app.
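
The tier check reduces to a simple comparison once you know roughly how many tokens your input is. The sketch below encodes only the limits stated above. The exact values behind "32K" and "1M", the tier labels, and the function name are assumptions for illustration.

```python
FREE_CONTEXT_TOKENS = 32_000      # "32K" from the text; exact figure assumed
FULL_CONTEXT_TOKENS = 1_000_000   # "1M" from the text; exact figure assumed


def pick_surface(input_tokens: int, tier: str) -> str:
    """Suggest where a document of this size can be processed.

    tier is a hypothetical label: "free", "pro", or "ultra".
    """
    if input_tokens > FULL_CONTEXT_TOKENS:
        return "split the document"
    if tier == "free" and input_tokens > FREE_CONTEXT_TOKENS:
        return "AI Studio or API"
    return "app, AI Studio, or API"


print(pick_surface(400_000, "free"))   # AI Studio or API
print(pick_surface(20_000, "free"))    # app, AI Studio, or API
print(pick_surface(1_500_000, "pro"))  # split the document
```

Even when the app is listed as an option, AI Studio or the API remains the better choice for anything over about 50 pages, because only those surfaces let you raise the output cap.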

How to confirm it is fixed

API: print response.candidates[0].finish_reason. STOP means the model finished on its own; MAX_TOKENS means you are still capped — raise max_output_tokens or lower the thinking level.
AI Studio: the token counter under the prompt shows output tokens; a complete answer ends with a natural conclusion, not mid-sentence.
Coverage check: ask “How many chapters did you cover, and which is last?” If the count matches the document, nothing was dropped.

Prevention

For any document over ~50 pages, default to AI Studio or the API. The chat app is for quick lookups, not long-doc workflows.
In API calls, always set max_output_tokens explicitly. The 8,192 default is the single biggest source of “cut off” complaints.
Set thinking_level deliberately (low or medium for long answers) so reasoning does not consume the visible output.
Pre-OCR scanned PDFs, and keep each uploaded PDF under ~1,000 pages.
For multi-section summaries, specify exact length per section and chunk across turns instead of one giant prompt.

External references: Gemini Apps limits (Google support) and the Gemini 3 developer guide.
