Gemini 2.5 sometimes stops mid-sentence with a trailing … or a half-word. Empirically, about 70% of cases are a max_output_tokens hit, about 20% are silent safety-filter truncation, and about 10% are a frontend stream drop. The symptoms look identical but the fixes differ: “continue” usually works for a token-cap cutoff, a safety cutoff usually needs the request split into smaller parts, and a dropped stream needs a stable connection and a resubmit.
To stop the truncation for good, identify the type first, then handle it.
Common causes
By frequency:
1. Output exceeded max_output_tokens (most common)
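Every model has a hard output cap. Per Google's model documentation, Gemini 2.5 Pro and 2.5 Flash both allow up to 65,536 output tokens (64K), while older Flash models such as 2.0 Flash stop at 8,192 (8K). Ask an 8K-capped model to write a 15K-word report and it will cut off near 8K tokens. AI Studio and the API hit the same caps; the Web UI (gemini.google.com) appears to default lower, around 4-6K tokens.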
How to judge:
- API: response finishReason=MAX_TOKENS confirms this
- Web: outputs stop at roughly the same word count every time (e.g. always near ~3000 words)
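The Web check above can be made less subjective by comparing the word counts of several truncated replies. The following is a minimal sketch, not an official diagnostic: the function name `stops_at_same_length` and the 10% tolerance are assumptions chosen for illustration.

```python
from statistics import mean


def stops_at_same_length(word_counts: list[int], tolerance: float = 0.10) -> bool:
    """Return True if every count lies within `tolerance` of the mean.

    A tight cluster of lengths suggests a fixed output cap rather than
    a random stream drop. The 10% tolerance is an illustrative guess.
    """
    if len(word_counts) < 2:
        return False  # one sample cannot show a pattern
    avg = mean(word_counts)
    return all(abs(count - avg) <= tolerance * avg for count in word_counts)


# Three replies that all stopped near 3000 words look like a cap.
print(stops_at_same_length([2980, 3010, 3050]))  # True
# Wildly different lengths point to some other cause.
print(stops_at_same_length([800, 3000, 5200]))   # False
```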
2. Safety filter silent truncation
Gemini sometimes stops mid-output on sensitive content with no error message. Common triggers:
- Person names + violent / self-harm / sexual actions
- Medical / legal advice
- Politically sensitive topics
- Fictional scenarios involving minors
How to judge:
- API: finishReason=SAFETY or BLOCKLIST
- Web: is the topic near the cutoff sensitive?
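For API users, the two finishReason checks so far can be folded into one small lookup. This is a sketch under assumptions: `classify_finish_reason` and its return labels are hypothetical names, and it takes the finish reason as a plain string (for example the name of the SDK's enum value). STOP is the API's value for a normal, complete answer.

```python
def classify_finish_reason(finish_reason: str) -> str:
    """Map a Gemini finishReason string to a likely truncation cause.

    The labels returned here are illustrative and not part of any API.
    """
    reason = finish_reason.upper()
    if reason == "MAX_TOKENS":
        return "token_cap"       # cause 1: output hit max_output_tokens
    if reason in {"SAFETY", "BLOCKLIST"}:
        return "safety_filter"   # cause 2: safety filter stopped the output
    if reason == "STOP":
        return "complete"        # the model finished on its own
    return "unknown"             # anything else needs a manual look


print(classify_finish_reason("MAX_TOKENS"))  # token_cap
print(classify_finish_reason("SAFETY"))      # safety_filter
```

If the API reports a complete answer but the page still shows cut-off text, the problem is more likely on the frontend side (a dropped stream or a rendering hang) than in the model's output.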
3. Frontend stream connection dropped
VPN switching, network jitter, or background-tab throttling kills the SSE stream. Output stops very early (< 100 words) and the chat still shows “generating”.
How to judge: browser DevTools → Network shows SSE with connection reset / timeout.
4. Markdown rendering hang
Output completed, but the frontend chokes on long markdown (especially big tables / code blocks). Looks truncated until you refresh.
How to judge: refresh — full content appears.
5. Unsupported language fragment forced fallback
Long outputs that mix obscure scripts (Ancient Greek, rare symbols) can trigger an early-stop fallback.
6. Single output exceeds context window
Gemini 2.5 Pro has a 1M-token input context (1,048,576) and a 64K output cap. If your conversation history eats most of the context, the remaining output budget shrinks.
Shortest path to fix
By truncation type:
Step 1: Reply “continue”
Simplest and fastest:
Continue from where you stopped
Gemini usually picks up at the cut. This works for MAX_TOKENS cases but fails on safety cutoffs; for those, move to Step 3.
Step 2: API users — raise max_output_tokens
from google import genai

client = genai.Client()
response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents="...",
    config=genai.types.GenerateContentConfig(
        max_output_tokens=64000,  # Pro cap
    ),
)
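Gemini 2.5 Pro and 2.5 Flash both cap at 65,536 output tokens; older Flash models (2.0 and earlier) cap at 8,192. If you are on one of those and need more length, switch to a 2.5 model.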
Step 3: Split into requests (fixes safety filter)
If “continue” fails or the cutoff is near sensitive content:
Split this report into 5 chapters. Output only chapter 1 now, then wait for me to say "continue".
Smaller chunks pass the safety filter individually, and usually all 5 go through.
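If you send these prompts from a script rather than typing them, the chapter-by-chapter sequence can be generated up front. A minimal sketch, assuming a hypothetical helper named `chapter_prompts`; the first prompt reuses the wording above, and the numbered follow-ups are an illustrative choice.

```python
def chapter_prompts(task: str, chapters: int = 5) -> list[str]:
    """Build the opening prompt plus one "continue" prompt per later chapter."""
    if chapters < 1:
        raise ValueError("chapters must be at least 1")
    first = (
        f"{task}\n\n"
        f"Split this report into {chapters} chapters. "
        'Output only chapter 1 now, then wait for me to say "continue".'
    )
    # Numbering the follow-ups makes it easier to spot a skipped chapter.
    rest = [f"continue (chapter {i} of {chapters})" for i in range(2, chapters + 1)]
    return [first] + rest


for prompt in chapter_prompts("Write a report on hospital triage protocols."):
    print(prompt)
    print("---")
```

Send each prompt in the same conversation and check every reply before sending the next, so a blocked chapter is caught early.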
Step 4: Diagnose stream connection issues
If Step 3 fails and it’s not safety-related:
- Turn off VPN (especially in regions with throttling)
- Try another browser (Chrome → Safari)
- Close other background tabs (Chrome throttles background SSE)
- Use aistudio.google.com (more stable than the chat UI)
Step 5: Markdown rendering issues
- Refresh to see if rendering completes
- Tell Gemini: “Output plain text, no markdown”
- For long tables, break into multiple smaller ones
Step 6: Context overflow — start fresh
If history is 200K+ tokens, new outputs get squeezed:
- New conversation
- Summarize prior context into a < 5K-token brief
- Reuse the brief in the new chat so Gemini has output budget
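To decide when a conversation has grown past that point, a rough size estimate is often enough. This sketch assumes about 4 characters per token, a common rule of thumb for English text and not an exact count; the function names and the 200K default (taken from the threshold above) are illustrative. The API's own token counter gives exact numbers if you need them.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate based on character count."""
    return int(len(text) / chars_per_token)


def should_start_fresh(history: list[str], threshold_tokens: int = 200_000) -> bool:
    """Return True once the estimated history size reaches the threshold."""
    total = sum(estimate_tokens(message) for message in history)
    return total >= threshold_tokens


history = ["Summarize this contract...", "Here is the summary..."]
if should_start_fresh(history):
    print("Start a new conversation with a short brief.")
else:
    print("History is still small enough.")
```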
Step 7: API — auto-handle finishReason
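result = client.models.generate_content(model="gemini-2.5-pro", contents="...")
finish_reason = result.candidates[0].finish_reason
if finish_reason == genai.types.FinishReason.MAX_TOKENS:
    # Auto-continue: resend the partial answer, then ask for the rest
    continuation = client.models.generate_content(
        model="gemini-2.5-pro",
        contents=[
            genai.types.Content(role="user", parts=[genai.types.Part(text="...")]),
            result.candidates[0].content,
            genai.types.Content(role="user", parts=[genai.types.Part(text="Continue from where you stopped")]),
        ],
    )
elif finish_reason == genai.types.FinishReason.SAFETY:
    # Split the request into smaller parts (see Step 3)
    ...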
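A single continuation may not be enough for very long outputs, so the same idea can be wrapped in a bounded loop. The sketch below is deliberately SDK-free: `generate` is a hypothetical callable you supply that takes the alternating list of turns so far and returns a `(text, finish_reason)` pair, with the finish reason as a plain string. The `max_rounds` limit is an assumed safeguard against endless loops.

```python
from typing import Callable

# Hypothetical adapter signature: turns in, (text, finish_reason) out.
GenerateFn = Callable[[list[str]], tuple[str, str]]


def generate_with_continuation(
    generate: GenerateFn, prompt: str, max_rounds: int = 5
) -> tuple[str, str]:
    """Keep asking for more while the reply ends with MAX_TOKENS."""
    turns = [prompt]          # alternating user / model turns
    pieces: list[str] = []
    reason = "UNKNOWN"
    for _ in range(max_rounds):
        text, reason = generate(turns)
        pieces.append(text)
        if reason != "MAX_TOKENS":
            break             # finished, or blocked for another reason
        # Feed the partial answer back and ask for the rest.
        turns += [text, "Continue from where you stopped"]
    return "".join(pieces), reason
```

The function returns the last finish reason alongside the stitched text, so the caller can still branch on SAFETY and fall back to splitting the request.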
Prevention
- For long-output tasks, open with: “Split into N sections, wait for continue between each”
- Always set max_output_tokens to the model cap (64K for Gemini 2.5 Pro and 2.5 Flash, 8K for older Flash models)
- For sensitive-topic reports (medical / legal / safety), pre-split into 5 chunks
- API users: check finishReason and auto-continue on MAX_TOKENS
- When chat history exceeds 100K tokens, start a new conversation to preserve output budget