Gemini 3.1 Pro Thinking Stops Mid-Reasoning: Fix

Gemini 3.1 Pro Thinking truncates mid-thought or returns a short answer with no reasoning. Usually it's thinking level vs output cap. Verified fixes for June 2026.

Published: May 24, 2026 Updated: Jun 21, 2026 Author: AI Productivity Guide Team

You ask Gemini 3.1 Pro a hard math, proof, or coding question with thinking on, and the model either cuts off mid-chain-of-thought or returns a short final answer that obviously skipped the reasoning. In AI Studio you can sometimes watch the thinking tokens run out with no visible answer at all.

Fastest fix (90% of cases): you ran out of output budget. The default maxOutputTokens is 8192, and on Gemini 3.1 Pro the thinking tokens are subtracted from that same budget. A hard problem at the default high thinking level can burn 20K+ tokens reasoning, so 8K leaves nothing for the answer. Raise maxOutputTokens to 65536 (the model ceiling) and you usually get a complete answer immediately. In the consumer app you can’t raise this, so move the actual reasoning step to AI Studio or the API.

One important change since this article first shipped: as of June 2026 the current model is Gemini 3.1 Pro (released February 19, 2026), and it controls reasoning depth with a thinking_level (low / medium / high), not the old numeric thinking_budget from the Gemini 2.5 series. The numeric budget still works on 3.1 Pro as a legacy fallback, but thinking_level is the supported path now.

Which bucket are you in

Symptom	Most likely cause	Go to
Long thinking trace, then a short or cut-off answer	`maxOutputTokens` too low (thinking ate the budget)	Step 1
Thinking trace stops abruptly, model says “let me just commit to…”	Reasoning was capped before it finished	Step 2
Answer is fine but shallow; no real reasoning happened	`thinking_level` too low, or thinking off	Step 3 + Step 6
Works in API/AI Studio but cuts off in the app	Consumer app caps output and gates Deep Think	Step 7
Cuts off only deep in a long chat	Context bloat leaves little room for this turn	Cause 4

Common causes (by frequency)

1. maxOutputTokens too low — thinking is eating the budget (most common)

On Gemini 3.1 Pro, thinking tokens and visible-answer tokens come out of the same maxOutputTokens pool. The default is 8192. At the default high thinking level the model can spend 20,000+ tokens reasoning, so 8K is exhausted before it writes a word of the answer.

How to confirm: check response.usage_metadata.thoughts_token_count (API) — if it’s near your maxOutputTokens, the budget was consumed by thinking. In AI Studio the thinking panel is long but the answer box is short or empty.
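
The check above is simple arithmetic once you have the two counters. A minimal sketch, assuming you have already read `thoughts_token_count` and `candidates_token_count` from `response.usage_metadata`; the helper name `budget_split` and the 90% threshold are illustrative choices, not part of the SDK:

```python
def budget_split(thoughts_tokens, answer_tokens, max_output_tokens):
    """Summarize how one response spent its shared output pool."""
    # A counter can come back as None, so treat None as 0.
    thoughts = thoughts_tokens or 0
    answer = answer_tokens or 0
    used = thoughts + answer
    return {
        "used": used,
        "remaining": max(max_output_tokens - used, 0),
        "thinking_share": round(thoughts / used, 3) if used else 0.0,
        # Illustrative threshold: thinking alone took at least 90% of the cap.
        "thinking_ate_budget": thoughts >= 0.9 * max_output_tokens,
    }


report = budget_split(thoughts_tokens=8000, answer_tokens=192, max_output_tokens=8192)
print(report)
```

If `thinking_ate_budget` is true and `remaining` is at or near zero, the cap, not the model, is what cut the answer short.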

2. Reasoning capped before it finished

If you forced a low numeric thinking_budget (legacy on 3.1 Pro, or any value on 2.5 Pro), the model hits the cap, is forced to wrap up, and the final answer is short or low quality.

How to confirm: the thinking trace ends abruptly with wrap-up phrasing like “given the budget, I’ll commit to…” or “let me just go with…”.
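
If you log thought summaries, you can flag this pattern automatically. A rough heuristic sketch: the phrase list and the `looks_capped` helper are assumptions for illustration, so extend the patterns with wording you see in your own traces:

```python
import re

# Illustrative phrase list based on the wrap-up wording quoted above.
WRAP_UP_PATTERNS = [
    r"given the budget",
    r"i['’]ll commit to",
    r"let me just (go with|commit to)",
]


def looks_capped(thought_summary, tail_chars=400):
    """Return True if the end of a thought summary contains wrap-up phrasing."""
    # Only the tail matters: a capped trace wraps up in its last sentences.
    tail = thought_summary[-tail_chars:].lower()
    return any(re.search(pattern, tail) for pattern in WRAP_UP_PATTERNS)


print(looks_capped("Two cases remain. Given the budget, I’ll commit to case A."))
print(looks_capped("Both cases check out, so the identity holds for all n."))
```

A match is a hint, not proof. Confirm it against `thoughts_token_count` before changing settings.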

3. thinking_level too low for the task

low is meant for translation/classification; medium is a balanced daily default; high is full Deep Think Mini reasoning. If you set low on a proof or a multi-file refactor, the model under-thinks and returns a thin answer.

4. Mid-conversation context bloat

A chat with many long prior turns leaves less room for the current turn. The history, the thinking, and the answer all share the model's context window, and thinking plus answer still has to fit under maxOutputTokens, so a fat context squeezes both.

5. Consumer app caps thinking and gates Deep Think

gemini.google.com caps output far below the API and does not let you raise it. The app exposes a Thinking Level menu (Standard / Extended Thinking / Deep Think), and as of June 2026 Deep Think on Gemini 3.1 Pro is limited to Google AI Ultra ($99.99/mo), while Extended Thinking is free for all users. Standard mode simply does not think hard.

6. Prompt is too open-ended

“Think very deeply about X” with no scope wanders onto tangents and burns budget. A scoped prompt (“solve in 3 phases, then stop and answer”) converges faster and finishes within budget.

Shortest path to fix

Step 1: Raise maxOutputTokens (do this first)

This single change fixes most truncation. The ceiling on Gemini 3.1 Pro is 65536.

In AI Studio:

aistudio.google.com
→ Model: Gemini 3.1 Pro
→ Right panel "Run settings":
    Max output tokens: 65536
    Thinking level: High        (was a numeric "Thinking budget" slider on 2.5)

Because thinking and answer share this pool, giving it the full 64K leaves plenty of room for the answer even when thinking runs long.

Step 2: In the API, set thinking_level + a big output cap

thinking_level (low / medium / high) is the current control on Gemini 3.1 Pro. Default is high, but set it explicitly so you know what you’re getting.

from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")
response = client.models.generate_content(
    model="gemini-3.1-pro-preview",
    contents="Prove that the sum of the first n cubes equals the square of the sum of the first n integers, then verify for n=5.",
    config=types.GenerateContentConfig(
        max_output_tokens=65536,
        thinking_config=types.ThinkingConfig(
            thinking_level="high",
            include_thoughts=True,
        ),
    ),
)

# Diagnostic: how much went to thinking vs answer
print(response.usage_metadata.thoughts_token_count)
print(response.usage_metadata.candidates_token_count)

include_thoughts=True returns a summary of the reasoning so you can see exactly where it stopped. thoughts_token_count tells you whether thinking, not the answer, ate the budget.
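
To read the thought summary separately from the answer, walk the response parts and check each part's `thought` flag. A minimal sketch, assuming parts expose `text` and `thought` attributes as in the google-genai SDK; the `split_thoughts` helper and the stand-in objects are illustrative so the block runs without an API call:

```python
from types import SimpleNamespace


def split_thoughts(parts):
    """Separate thought-summary text from answer text in a list of parts."""
    thoughts, answer = [], []
    for part in parts:
        text = getattr(part, "text", None)
        if not text:
            continue  # skip parts that carry no text
        # Parts flagged thought=True carry the reasoning summary.
        if getattr(part, "thought", False):
            thoughts.append(text)
        else:
            answer.append(text)
    return "".join(thoughts), "".join(answer)


# Stand-in objects; with a real response you would pass
# response.candidates[0].content.parts instead.
fake_parts = [
    SimpleNamespace(text="Try induction on n. ", thought=True),
    SimpleNamespace(text="The identity holds by induction.", thought=None),
]
thought_text, answer_text = split_thoughts(fake_parts)
print("THOUGHTS:", thought_text)
print("ANSWER:", answer_text)
```

An empty `answer_text` next to a long `thought_text` is the signature of the budget problem from Step 1.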

If you are still on the older Gemini 2.5 Pro, use the legacy numeric budget instead (thinking_level is not available there):

thinking_config=types.ThinkingConfig(
    thinking_budget=-1,      # -1 = dynamic; valid 2.5 Pro range is 128..32768
    include_thoughts=True,
)

Don’t set both thinking_level and thinking_budget — they conflict.
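
One way to make that mistake impossible is to build the thinking settings in one place. A small sketch: `thinking_kwargs` is a hypothetical helper, and choosing the default by model-name prefix is an assumption you may need to adapt:

```python
def thinking_kwargs(model, level=None, budget=None):
    """Return keyword arguments for ThinkingConfig, refusing to mix controls."""
    if level is not None and budget is not None:
        raise ValueError("set thinking_level or thinking_budget, not both")
    if budget is not None:
        return {"thinking_budget": budget, "include_thoughts": True}
    if level is not None:
        return {"thinking_level": level, "include_thoughts": True}
    # Nothing requested: pick a default by model family (name check is an assumption).
    if model.startswith("gemini-2.5"):
        return {"thinking_budget": -1, "include_thoughts": True}  # -1 = dynamic
    return {"thinking_level": "high", "include_thoughts": True}


print(thinking_kwargs("gemini-3.1-pro-preview"))
print(thinking_kwargs("gemini-2.5-pro"))
```

You would then pass the result along as `types.ThinkingConfig(**thinking_kwargs(model, level="high"))`.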

Step 3: Match thinking_level to the task

low — translation, classification, formatting. Fast and cheap.
medium — most daily reasoning. Good quality/cost balance.
high — proofs, hard algorithms, multi-step debugging. This is Deep Think Mini.

Bumping a stuck low/medium task to high is often all you need when the answer was complete but shallow.
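
If you route many kinds of requests through one client, a lookup table keeps the level choice consistent. A sketch with hypothetical task labels; the level assignments follow the list above, and `pick_level` and `bump` are illustrative names:

```python
# Hypothetical task labels mapped to the levels described above.
LEVEL_BY_TASK = {
    "translation": "low",
    "classification": "low",
    "formatting": "low",
    "daily_reasoning": "medium",
    "proof": "high",
    "hard_algorithm": "high",
    "multi_step_debugging": "high",
}
ORDER = ["low", "medium", "high"]


def pick_level(task, default="medium"):
    """Look up the thinking level for a task label, falling back to medium."""
    return LEVEL_BY_TASK.get(task, default)


def bump(level):
    """Return the next level up, capped at high."""
    return ORDER[min(ORDER.index(level) + 1, len(ORDER) - 1)]


print(pick_level("proof"))
print(bump(pick_level("translation")))
```

`bump` covers the retry case: when an answer comes back complete but shallow, rerun it one level higher.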

Step 4: Structure the prompt to converge faster

Instead of “think deeply,” scope it:

Solve this in 3 phases:
Phase 1: Restate the problem in your own words (max 100 words).
Phase 2: Outline 2-3 candidate approaches (max 200 words each).
Phase 3: Pick the best approach, execute it fully, verify.

Stop and answer.

Scoped phases let the model commit to one path instead of wandering, so you get more useful reasoning per token.
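
If you send scoped prompts often, a small template function saves retyping. A sketch that wraps the phased prompt above; `phased_prompt` and its word-limit parameters are illustrative, and appending the problem at the end is one possible layout:

```python
def phased_prompt(problem, restate_words=100, approach_words=200):
    """Build the three-phase prompt with adjustable word limits."""
    lines = [
        "Solve this in 3 phases:",
        f"Phase 1: Restate the problem in your own words (max {restate_words} words).",
        f"Phase 2: Outline 2-3 candidate approaches (max {approach_words} words each).",
        "Phase 3: Pick the best approach, execute it fully, verify.",
        "",
        "Stop and answer.",
        "",
        f"Problem: {problem}",
    ]
    return "\n".join(lines)


print(phased_prompt("Prove that the sum of the first n cubes equals the square of the sum of the first n integers."))
```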

Step 5: Break very hard problems across turns

If a single problem needs more than one turn’s output budget, split it. Each turn gets a fresh maxOutputTokens pool:

Turn 1: "What are the 3 most promising approaches to <problem>? Pick the best."
Turn 2: "Now execute approach <X> completely. Include the full proof."
Turn 3: "Now verify by an independent check."

Total reasoning depth ends up far higher than one giant prompt, and no single turn truncates.
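
The same three turns can be scripted. A sketch that takes a `send` callable (any function from prompt string to reply string), so it runs offline here with a stand-in. With the google-genai SDK, `send` could wrap a chat session, for example `lambda p: chat.send_message(p).text`, though that wiring is an assumption to check against your SDK version:

```python
def run_in_turns(send, problem):
    """Send the three-turn plan through `send`, a callable str -> str."""
    prompts = [
        f"What are the 3 most promising approaches to {problem}? Pick the best.",
        "Now execute the approach you picked completely. Include the full proof.",
        "Now verify by an independent check.",
    ]
    replies = []
    for prompt in prompts:
        # Each call is one turn, so each reply gets its own output cap.
        replies.append(send(prompt))
    return replies


# Stand-in for a real chat session so the sketch runs without an API call.
def fake_send(prompt):
    return f"(reply to: {prompt[:24]}...)"


for reply in run_in_turns(fake_send, "the sum-of-cubes identity"):
    print(reply)
```

Keeping the turns in one chat session matters: turn 2 only works if the model can see which approach it picked in turn 1.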

Step 6: Confirm thinking is actually on

App: click the model name in the prompt box, choose Pro, click the model name again, then Thinking Level, then pick Extended Thinking (free) or Deep Think (Google AI Ultra). Standard does not think hard.
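API: omitting ThinkingConfig still defaults to high on 3.1 Pro. If you copied 2.5-era code that sets thinking_budget=0 or another small value, remove it or switch to thinking_level: 3.1 Pro cannot turn thinking off, but a leftover legacy budget can cap reasoning before it finishes.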

Step 7: For app users who must stay in the app

If you need the chat to live in gemini.google.com (e.g., a shared conversation) and thinking keeps cutting off, do the heavy reasoning step in AI Studio at Thinking level: High with Max output tokens: 65536, then paste the conclusion back into the shared chat.

How to confirm it’s fixed

Re-run the same prompt with max_output_tokens=65536.
Print response.usage_metadata.thoughts_token_count and candidates_token_count. Thinking should be well under the total, leaving room for the answer.
Check response.candidates[0].finish_reason — STOP means a clean finish; MAX_TOKENS means you’re still hitting the cap (raise the budget or split the problem).
The visible answer is complete (no mid-sentence cutoff) and shows real reasoning, not a one-line guess.
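
The checks above can be folded into one helper for logs or tests. A sketch, assuming `finish_reason` arrives as either an enum member with a `.name` or a plain string such as "MAX_TOKENS"; the `verdict` function and its messages are illustrative:

```python
def verdict(finish_reason, thoughts_tokens, answer_tokens):
    """Turn finish_reason and the token counters into a one-line verdict."""
    # Accept an enum member (use its name) or a plain string.
    reason = getattr(finish_reason, "name", str(finish_reason))
    thoughts = thoughts_tokens or 0
    answer = answer_tokens or 0
    if reason == "MAX_TOKENS":
        if thoughts > answer:
            return "still capped: thinking used most of the pool; raise the cap or split the problem"
        return "still capped: the answer itself is long; split the problem across turns"
    if reason == "STOP":
        return "clean finish"
    return f"stopped for another reason: {reason}"


print(verdict("STOP", 12000, 900))
print(verdict("MAX_TOKENS", 64000, 1536))
```

A clean finish still needs the last check by eye: read the answer and confirm it shows real reasoning.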

FAQ

Why does Gemini 3.1 Pro report so many output tokens for a short answer? Because thinking tokens are billed and counted as output. A 200-word answer can carry 15K+ thinking tokens behind it. Check thoughts_token_count to see the split. This is also why your bill and your maxOutputTokens usage look high.
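
To see what that means for a single call, add the two counters and multiply by your output rate. A sketch only: the price below is a placeholder, not a real Gemini rate, so substitute the figure from your own pricing page:

```python
def output_cost(thoughts_tokens, answer_tokens, usd_per_million_output):
    """Return (billed output tokens, cost in USD) for one response."""
    # Thinking tokens are billed as output, so they count toward the total.
    billed = thoughts_tokens + answer_tokens
    return billed, billed * usd_per_million_output / 1_000_000


# Placeholder price for illustration only.
billed, cost = output_cost(15_000, 300, usd_per_million_output=10.0)
print(billed, round(cost, 4))
```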

Whatever happened to thinking_budget? My old code uses it. The numeric thinking_budget was the Gemini 2.5 control (range 128..32768, -1 for dynamic). Gemini 3.x replaced it with thinking_level (low/medium/high). thinking_budget still works on 3.1 Pro as a legacy fallback, but Google recommends migrating to thinking_level, and you can’t set both at once.

Can I turn thinking off on Gemini 3.1 Pro? No. Unlike Gemini 2.5 Flash (which disables at thinking_budget=0), Gemini 3.1 Pro always thinks. The lowest you can go is thinking_level="low".

Why does the answer cut off only in the app, not the API? The consumer app caps output and you can’t raise it, and Deep Think on 3.1 Pro is gated to Google AI Ultra. For long math/proofs/code, use AI Studio or the API where you control maxOutputTokens and thinking_level.

It still truncates even at 65536. What now? You’ve hit a genuinely large task. Check finish_reason — if it’s MAX_TOKENS, split the work across turns (Step 5) so each turn gets a fresh 64K pool, or lower thinking_level to medium to spend fewer tokens reasoning and more on the answer.

Prevention

Default max_output_tokens=65536 for any non-trivial reasoning task in the API — thinking shares this pool.
Set thinking_level explicitly (high for hard problems, medium for daily) so you know how deep it’s going.
Use AI Studio or the API for math, proofs, and complex code — the consumer app hides the dials and gates Deep Think behind Google AI Ultra.
Structure prompts as numbered phases to reduce wandering and finish within budget.
Keep include_thoughts=True during development and watch thoughts_token_count — it tells you instantly whether thinking or the answer ran out of room.

Tags: #Gemini #Troubleshooting #thinking