Gemini 2.5 Output Gets Truncated

The reply ends mid-sentence with a `…` and stops. The usual causes are the max output token cap, a safety filter, or a dropped stream.

Gemini 2.5 stops mid-sentence with a trailing “…” or a half-word. Empirically, ~70% of cases are a max_output_tokens hit, ~20% are silent safety-filter truncation, and ~10% are a frontend stream drop. The symptoms look identical but the fixes differ: “continue” works for the token cap, safety truncation usually needs the request split into smaller chunks, and a dropped stream means the partial response must not be treated as a finished answer; resubmit it instead.

To stop the truncation for good, identify the type first, then handle it.

Common causes

By frequency:

1. Output exceeded max_output_tokens (most common)

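Every response has an output-token cap. Google's model documentation lists a 65,536-token (64K) output limit for both Gemini 2.5 Pro and Gemini 2.5 Flash; the 8K cap belonged to earlier Flash models. A lower max_output_tokens set in the request takes precedence, so a 15K-word report requested with an 8K cap cuts off near 8K tokens. AI Studio and the API share the same limits; the Web UI (gemini.google.com) appears to stop sooner, at around 4-6K tokens.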

How to judge:

  • API: response finishReason = MAX_TOKENS confirms this
  • Web: outputs stop at roughly the same word count every time (e.g. always near ~3000 words)
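The Web check above can be made less subjective. The sketch below takes the word counts of several truncated replies and tests whether they cluster tightly. The function name `looks_like_token_cap` and the 10% spread threshold are assumptions for illustration, not anything Gemini reports.

```python
from statistics import mean, pstdev


def looks_like_token_cap(word_counts, max_spread=0.1):
    """Return True when truncated replies all end at about the same length.

    max_spread is an assumed threshold on (standard deviation / mean).
    """
    if len(word_counts) < 3:
        return False  # too few samples to call it a pattern
    avg = mean(word_counts)
    if avg == 0:
        return False
    return pstdev(word_counts) / avg <= max_spread


# Word counts of four truncated replies, pasted in by hand.
clustered = [2980, 3010, 3045, 2995]
scattered = [120, 2900, 850, 4100]

print(looks_like_token_cap(clustered))
print(looks_like_token_cap(scattered))
```

A tight cluster points at the output cap; scattered cutoffs suggest a safety filter or a connection problem instead.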

2. Safety filter silent truncation

Gemini sometimes stops mid-output on sensitive content with no error message. Common triggers:

  • Person names + violent / self-harm / sexual actions
  • Medical / legal advice
  • Politically sensitive topics
  • Fictional scenarios involving minors

How to judge:

  • API: finishReason = SAFETY or BLOCKLIST
  • Web: is the topic near the cutoff sensitive?
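For API users, the two checks so far reduce to a lookup on finishReason. This is a minimal sketch: `classify_finish_reason` is a hypothetical helper, and it only covers the values named in this article plus STOP, the normal-completion value.

```python
def classify_finish_reason(finish_reason):
    """Map a finish reason (string or enum member) to a likely cause and fix."""
    # Enum members expose .name; plain strings are used as-is.
    name = getattr(finish_reason, "name", None) or str(finish_reason)
    name = name.upper()
    if name == "MAX_TOKENS":
        return ("token cap", "reply 'continue' or raise max_output_tokens")
    if name in {"SAFETY", "BLOCKLIST"}:
        return ("safety filter", "split the request into smaller chunks")
    if name == "STOP":
        return ("finished normally", "if it still looks cut off, check the stream or rendering")
    return ("other", "inspect the response by hand")


cause, fix = classify_finish_reason("MAX_TOKENS")
print(f"{cause}: {fix}")
```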

3. Frontend stream connection dropped

VPN switching, network jitter, or background-tab throttling kills the SSE stream. Output stops very early (< 100 words) and the chat still shows “generating”.

How to judge: browser DevTools → Network shows SSE with connection reset / timeout.
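If you consume the stream yourself rather than through the chat UI, a drop can be detected in code. The sketch below assumes the last chunk of a healthy stream carries a finish reason, so a stream that ends or errors without one was interrupted. The chunk shape (a dict with `text` and `finish_reason`) and the helper names are hypothetical stand-ins for whatever your client yields.

```python
def read_stream(chunks):
    """Collect streamed text and report whether the stream ended cleanly.

    Returns (text, finish_reason, dropped).
    """
    parts = []
    finish_reason = None
    try:
        for chunk in chunks:
            parts.append(chunk.get("text", ""))
            finish_reason = chunk.get("finish_reason") or finish_reason
    except (ConnectionError, TimeoutError):
        pass  # connection died mid-stream; finish_reason stays None
    return "".join(parts), finish_reason, finish_reason is None


def dropped_stream():
    """Simulated stream that resets after the first chunk."""
    yield {"text": "The report begins"}
    raise ConnectionResetError("connection reset")


text, reason, dropped = read_stream(dropped_stream())
if dropped:
    print("Stream dropped, resubmit the request. Partial text:", text)
```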

4. Markdown rendering hang

Output completed, but the frontend chokes on long markdown (especially big tables / code blocks). Looks truncated until you refresh.

How to judge: refresh — full content appears.

5. Unsupported-script fragment forces an early stop

Long outputs that mix obscure scripts (Ancient Greek, rare symbols) can trigger an early-stop fallback.

6. Single output exceeds context window

Gemini 2.5 Pro has a 1M-token (1,048,576) input context and a 64K output limit. If your conversation history eats most of the context, the remaining output budget shrinks.
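The arithmetic behind that squeeze, as a sketch: it assumes history and reply share one window, as described above. `remaining_output_budget` is a hypothetical helper and the numbers are placeholders, so substitute the limits documented for your model.

```python
def remaining_output_budget(context_window, history_tokens, output_cap):
    """Tokens left for the reply once history has taken its share."""
    free = max(0, context_window - history_tokens)
    return min(output_cap, free)


# Placeholder limits: 1,000,000-token window, 64,000-token output cap.
print(remaining_output_budget(1_000_000, 200_000, 64_000))  # history is small, full cap available
print(remaining_output_budget(1_000_000, 980_000, 64_000))  # history leaves only 20,000 tokens
```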

Shortest path to fix

By truncation type:

Step 1: Reply “continue”

Simplest and fastest:

Continue from where you stopped

Gemini usually picks up at the cut. This works for MAX_TOKENS cases. It fails on safety truncation; if that happens, move to Step 3.

Step 2: API users — raise max_output_tokens

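from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents="...",
    config=types.GenerateContentConfig(
        max_output_tokens=65536,    # 64K, the Pro cap
    ),
)
print(response.candidates[0].finish_reason)  # MAX_TOKENS means the cap was still hit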

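Gemini 2.5 Pro and 2.5 Flash both document a 64K (65,536-token) output limit, so switching between them does not buy more length. If you need more than the cap, split the output across several requests.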

Step 3: Split into requests (fixes safety filter)

If “continue” fails or the cutoff is near sensitive content:

Split this report into 5 chapters. Output only chapter 1 now, then wait for me to say "continue".

Smaller chunks pass the safety filter individually; usually all 5 go through.
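If you script the conversation, the same split can be generated up front. A minimal sketch: `chapter_prompts` and the wording of the follow-up turns are assumptions; only the first prompt mirrors the text above.

```python
def chapter_prompts(task, chapters=5):
    """Build the opening prompt plus one 'continue' turn per remaining chapter."""
    first = (
        f"{task}\n\n"
        f"Split this report into {chapters} chapters. "
        'Output only chapter 1 now, then wait for me to say "continue".'
    )
    rest = [f"continue (chapter {n} of {chapters})" for n in range(2, chapters + 1)]
    return [first] + rest


for turn in chapter_prompts("Write a report on medication storage rules."):
    print(turn)
    print("---")
```

Send the turns one at a time and check each reply before sending the next, so a blocked chapter is easy to spot and rephrase.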

Step 4: Diagnose stream connection issues

If Step 3 fails and it’s not safety-related:

  1. Turn off VPN (especially in regions with throttling)
  2. Try another browser (Chrome → Safari)
  3. Close other background tabs (Chrome throttles background SSE)
  4. Use aistudio.google.com (more stable than the chat UI)

Step 5: Markdown rendering issues

  • Refresh to see if rendering completes
  • Tell Gemini: “Output plain text, no markdown”
  • For long tables, break into multiple smaller ones

Step 6: Context overflow — start fresh

If history is 200K+ tokens, new outputs get squeezed:

  1. New conversation
  2. Summarize prior context into a < 5K-token brief
  3. Reuse the brief in the new chat so Gemini has output budget
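The decision in this step can be written down as a small check. This is a sketch only: `plan_fresh_start` is a hypothetical helper, the thresholds are the ones quoted above, and the token counts are assumed to come from whatever counter you trust (the google-genai SDK exposes `client.models.count_tokens` for this).

```python
def plan_fresh_start(history_tokens, brief_tokens,
                     history_limit=200_000, brief_limit=5_000):
    """Say whether to keep the chat, trim the brief, or start over."""
    if history_tokens < history_limit:
        return "keep going: history is under the limit"
    if brief_tokens >= brief_limit:
        return "shorten the brief before starting the new chat"
    return "start a new chat and paste the brief"


print(plan_fresh_start(history_tokens=250_000, brief_tokens=3_200))
```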

Step 7: API — auto-handle finishReason

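from google import genai
from google.genai import types

client = genai.Client()
MODEL = "gemini-2.5-pro"
prompt = "..."

result = client.models.generate_content(model=MODEL, contents=prompt)
candidate = result.candidates[0]
finish_reason = candidate.finish_reason

if finish_reason == types.FinishReason.MAX_TOKENS:
    # Auto-continue: resend the prompt and the partial answer, then ask for the rest
    continuation = client.models.generate_content(
        model=MODEL,
        contents=[
            types.Content(role="user", parts=[types.Part(text=prompt)]),
            candidate.content,
            types.Content(role="user", parts=[types.Part(text="Continue from where you stopped")]),
        ],
    )
elif finish_reason == types.FinishReason.SAFETY:
    # Split request
    ...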
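A single continuation may hit the cap again, so in practice this becomes a loop. The sketch below keeps the API call behind a hypothetical `generate(history)` callable that returns `(text, finish_reason)`, which lets the loop be tried offline with a fake. The `max_rounds` guard and the `(role, text)` history shape are assumptions as well.

```python
def generate_with_continue(generate, prompt, max_rounds=5):
    """Call generate() repeatedly until the reply stops for a reason other than MAX_TOKENS."""
    history = [("user", prompt)]
    pieces = []
    for _ in range(max_rounds):
        text, finish_reason = generate(history)
        pieces.append(text)
        if finish_reason != "MAX_TOKENS":
            return "".join(pieces), finish_reason
        # Feed the partial answer back and ask for the rest.
        history.append(("model", text))
        history.append(("user", "Continue from where you stopped"))
    return "".join(pieces), "MAX_TOKENS"  # still truncated after max_rounds


def fake_generate(history):
    """Offline stand-in for the API: two truncated replies, then a complete one."""
    replies = [
        ("Part one, ", "MAX_TOKENS"),
        ("part two, ", "MAX_TOKENS"),
        ("part three.", "STOP"),
    ]
    turn = sum(1 for role, _ in history if role == "model")
    return replies[turn]


full_text, reason = generate_with_continue(fake_generate, "Write the report")
print(reason, full_text)
```

Stitching the pieces with plain concatenation assumes the model resumes exactly at the cut; check the seams, since it sometimes repeats a few words.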

Prevention

  • For long-output tasks, open with: “Split into N sections, wait for continue between each”
  • Always set max_output_tokens explicitly, up to the model cap (64K for both 2.5 Pro and 2.5 Flash)
  • For sensitive-topic reports (medical / legal / safety), pre-split into 5 chunks
  • API users: check finishReason and auto-continue on MAX_TOKENS
  • When chat history exceeds 100K tokens, start a new conversation to preserve output budget
