Gemini 2.5 Output Gets Truncated

The reply stops mid-sentence with a trailing `…`. The usual causes are the max output token cap, a safety filter, or a dropped stream.

Published: May 21, 2026 Updated: Jun 17, 2026 Author: AI Productivity Guide Team

Gemini 2.5 sometimes stops mid-sentence with a trailing … or a half-finished word. Empirically, roughly 70% of cases are a max_output_tokens hit, about 20% are silent safety-filter truncation, and about 10% are a dropped frontend stream. The symptoms look identical but the fixes differ: “continue” works for a token-cap cutoff, safety truncation usually needs the request split into smaller chunks, and a dropped stream needs the prompt resubmitted rather than treating the partial reply as complete.

To stop the truncation for good, identify the type first, then handle it.

Common causes

By frequency:

1. Output exceeded max_output_tokens (most common)

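Google's model documentation lists an output limit of 65,536 tokens (about 64K) for both Gemini 2.5 Pro and Gemini 2.5 Flash, and a request that sets max_output_tokens lower stops at that lower value. Ask for a 15K-word report under an 8K cap and it will cut off near 8K. AI Studio and the API enforce the same caps; the Web UI (gemini.google.com) appears to default lower, around 4-6K.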

How to judge:

API: response finishReason = MAX_TOKENS confirms this
Web: outputs stop at roughly the same word count every time (e.g. always near ~3000 words)
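The Web check above can be made a little less subjective. Below is a minimal sketch, assuming you have pasted a few truncated replies into a script and counted their words; `looks_like_fixed_cap` and the 10% tolerance are hypothetical choices, not anything Gemini exposes.

```python
from statistics import mean


def looks_like_fixed_cap(word_counts: list[int], tolerance: float = 0.10) -> bool:
    """Return True when every truncated reply ends within `tolerance` of the mean length."""
    if len(word_counts) < 3:
        return False  # too few samples to call it a pattern
    avg = mean(word_counts)
    return all(abs(n - avg) <= tolerance * avg for n in word_counts)


# Lengths cluster near 3000 words: consistent with an output cap.
print(looks_like_fixed_cap([2950, 3020, 3010]))  # True
# Lengths scatter widely: look at the other causes instead.
print(looks_like_fixed_cap([400, 3000, 1200]))   # False
```

If the lengths cluster, the token cap is the likely culprit; if they scatter, a safety filter or a dropped stream is more plausible.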

2. Safety filter silent truncation

Gemini sometimes stops mid-output on sensitive content with no error message. Common triggers:

Person names + violent / self-harm / sexual actions
Medical / legal advice
Politically sensitive topics
Fictional scenarios involving minors

How to judge:

API: finishReason = SAFETY or BLOCKLIST
Web: is the topic near the cutoff sensitive?
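For API users, the two finishReason checks so far can be folded into one small dispatcher. This is a sketch only: `classify_truncation` is a hypothetical helper that takes the finish reason as a plain string, and the returned advice simply restates this article's fixes.

```python
def classify_truncation(finish_reason: str) -> str:
    """Map a finishReason string to the fix this guide suggests."""
    reason = finish_reason.upper()
    if reason == "MAX_TOKENS":
        return "output cap hit: reply 'continue' or raise max_output_tokens"
    if reason in {"SAFETY", "BLOCKLIST"}:
        return "safety filter: split the request into smaller chunks"
    if reason == "STOP":
        # The model believes it finished, so the cut happened downstream.
        return "model finished normally: suspect a dropped stream or a rendering hang"
    return "unrecognized finish reason: inspect the raw response"


print(classify_truncation("MAX_TOKENS"))
print(classify_truncation("SAFETY"))
```

A STOP result is the useful negative case: the model completed its answer, so the truncation came from the connection or the page.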

3. Frontend stream connection dropped

VPN switching, network jitter, or background-tab throttling kills the SSE stream. Output stops very early (< 100 words) and the chat still shows “generating”.

How to judge: browser DevTools → Network shows SSE with connection reset / timeout.

4. Markdown rendering hang

Output completed, but the frontend chokes on long markdown (especially big tables / code blocks). Looks truncated until you refresh.

How to judge: refresh — full content appears.

5. Unsupported language fragment forced fallback

Long outputs that mix obscure scripts (Ancient Greek, rare symbols) can trigger an early-stop fallback.

6. Single output exceeds context window

Gemini 2.5 Pro has a 1M-token (1,048,576) input context and a 64K (65,536) output cap. If your conversation history eats most of the context, the remaining output budget shrinks.
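The arithmetic behind that claim can be sketched as follows, assuming input and output share one window as the paragraph describes. `remaining_output_budget` is a hypothetical helper, and the window and cap values in the example are placeholders to replace with your model's documented limits.

```python
def remaining_output_budget(context_window: int, output_cap: int, used_tokens: int) -> int:
    """Tokens still available for output: the cap, or whatever window is left."""
    free = max(context_window - used_tokens, 0)
    return min(output_cap, free)


# Placeholder limits: a 1,000,000-token window and a 64,000-token output cap.
print(remaining_output_budget(1_000_000, 64_000, 400_000))  # 64000: the cap still applies
print(remaining_output_budget(1_000_000, 64_000, 980_000))  # 20000: history squeezes the output
```

This captures only the hard limit. In practice long histories may degrade output earlier than the arithmetic suggests.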

Shortest path to fix

By truncation type:

Step 1: Reply “continue”

Simplest and fastest:

Continue from where you stopped

Gemini usually picks up at the cut. This works for MAX_TOKENS cases. It fails on safety truncation; if that happens, move to Step 3.

Step 2: API users — raise max_output_tokens

from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents="...",
    config=types.GenerateContentConfig(
        max_output_tokens=65536,  # documented output limit for 2.5 Pro
    ),
)

Both 2.5 Pro and 2.5 Flash top out at about 64K (65,536) output tokens. Need more length than that? Split the task across several requests (Step 3).

Step 3: Split into requests (fixes safety filter)

If “continue” fails or the cutoff is near sensitive content:

Split this report into 5 chapters. Output only chapter 1 now, then wait for me to say "continue".

Smaller chunks go through the safety filter individually, and usually all 5 pass.
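If you script this, the chapter-by-chapter prompts can be generated up front. A minimal sketch, assuming the wording from the prompt above; `chapter_prompts` and the follow-up message format are hypothetical.

```python
def chapter_prompts(task: str, chapters: int = 5) -> list[str]:
    """Build the opening prompt plus one 'continue' message per remaining chapter."""
    if chapters < 1:
        raise ValueError("chapters must be at least 1")
    first = (
        f"{task}\n\nSplit this report into {chapters} chapters. "
        'Output only chapter 1 now, then wait for me to say "continue".'
    )
    rest = [f"continue (chapter {i} of {chapters})" for i in range(2, chapters + 1)]
    return [first] + rest


for message in chapter_prompts("Write a report on clinic triage workflows."):
    print(message)
    print("---")
```

Sending the messages one at a time keeps each reply short, which is the point of the split.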

Step 4: Diagnose stream connection issues

If Step 3 fails and it’s not safety-related:

Turn off VPN (especially in regions with throttling)
Try another browser (Chrome → Safari)
Close other background tabs (Chrome throttles background SSE)
Use aistudio.google.com (more stable than the chat UI)
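The checklist above targets the browser. For scripts that call the API, the equivalent of resubmitting is a bounded retry. The sketch below assumes the failure surfaces as a ConnectionError or TimeoutError, which depends on your HTTP stack. `fetch_with_retry` and its `fetch` argument are hypothetical stand-ins for whatever function performs your request.

```python
import time
from typing import Callable


def fetch_with_retry(
    fetch: Callable[[], str],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call `fetch` again after a dropped connection, doubling the wait each time."""
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch()
        except (ConnectionError, TimeoutError) as err:
            last_error = err
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    raise RuntimeError(f"stream failed after {attempts} attempts") from last_error
```

Each retry resubmits the whole request and discards the partial text from the failed attempt, so a dropped stream is never mistaken for a finished answer.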

Step 5: Markdown rendering issues

Refresh to see if rendering completes
Tell Gemini: “Output plain text, no markdown”
For long tables, break into multiple smaller ones
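The last tip can also be applied after the fact, to a table you already have. A minimal sketch, assuming a standard pipe table whose first two lines are the header and the separator row; `split_markdown_table` is a hypothetical helper.

```python
def split_markdown_table(table: str, rows_per_chunk: int = 20) -> list[str]:
    """Split one long markdown table into smaller tables that repeat the header."""
    if rows_per_chunk < 1:
        raise ValueError("rows_per_chunk must be at least 1")
    lines = [ln for ln in table.strip().splitlines() if ln.strip()]
    if len(lines) < 3:
        return ["\n".join(lines)]  # header only, nothing to split
    header, separator, rows = lines[0], lines[1], lines[2:]
    chunks = []
    for start in range(0, len(rows), rows_per_chunk):
        chunk = [header, separator] + rows[start:start + rows_per_chunk]
        chunks.append("\n".join(chunk))
    return chunks


table = "| id | name |\n|----|------|\n" + "\n".join(f"| {i} | item {i} |" for i in range(5))
for part in split_markdown_table(table, rows_per_chunk=2):
    print(part)
    print()
```

Each chunk is a complete table on its own, so the page can render the pieces independently.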

Step 6: Context overflow — start fresh

If history is 200K+ tokens, new outputs get squeezed:

New conversation
Summarize prior context into a < 5K-token brief
Reuse the brief in the new chat so Gemini has output budget

Step 7: API — auto-handle finishReason

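from google import genai
from google.genai import types

client = genai.Client()
MODEL = "gemini-2.5-pro"
prompt = "..."

result = client.models.generate_content(model=MODEL, contents=prompt)
finish_reason = result.candidates[0].finish_reason

if finish_reason == types.FinishReason.MAX_TOKENS:
    # Auto-continue: resend the prompt and the partial answer, then ask for the rest
    continuation = client.models.generate_content(
        model=MODEL,
        contents=[prompt, result.candidates[0].content, "Continue from where you stopped"],
    )
elif finish_reason == types.FinishReason.SAFETY:
    # Split the request into smaller chunks (see Step 3)
    ...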
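
A single continuation may itself hit the cap, so in practice the check tends to become a loop. The sketch below is SDK-agnostic and makes one loud assumption: `generate` is a hypothetical callable you supply that takes the conversation turns and returns a `(text, finish_reason)` pair. It is not a function from the google-genai library.

```python
from typing import Callable

# Hypothetical signature: conversation turns in, (text, finish_reason) out.
Generate = Callable[[list[str]], tuple[str, str]]


def generate_until_done(generate: Generate, prompt: str, max_rounds: int = 5) -> tuple[str, str]:
    """Keep asking for a continuation while the finish reason is MAX_TOKENS."""
    turns = [prompt]
    pieces = []
    reason = "UNKNOWN"
    for _ in range(max_rounds):  # bounded so a stuck model cannot loop forever
        text, reason = generate(turns)
        pieces.append(text)
        if reason != "MAX_TOKENS":
            break
        turns += [text, "Continue from where you stopped"]
    return "".join(pieces), reason


# Usage with a fake generator standing in for the real API call.
replies = iter([("part one, ", "MAX_TOKENS"), ("part two.", "STOP")])
print(generate_until_done(lambda turns: next(replies), "write the report"))
```

The function returns the final finish reason along with the text, so the caller can still route a SAFETY result to the split-request path.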

Prevention

For long-output tasks, open with: “Split into N sections, wait for continue between each”
Always set max_output_tokens to the model cap (65,536 for both 2.5 Pro and 2.5 Flash)
For sensitive-topic reports (medical / legal / safety), pre-split into 5 chunks
API users: check finishReason and auto-continue on MAX_TOKENS
When chat history exceeds 100K tokens, start a new conversation to preserve output budget

Tags: #Gemini #Troubleshooting
