ChatGPT stopping mid-sentence is different from ChatGPT being stuck or rate-limited. A cut-off means tokens streamed fine, then the model just stopped, sometimes mid-word, sometimes at a clean paragraph break. The fastest unblocker is almost always the same: type “continue” and hit send. But knowing why it stopped lets you prevent it on the next long task.
In rough order of frequency: max output token cap → model decided it was done → stream interrupted → safety filter mid-response.
Symptoms
- Output ends mid-word, mid-code block, or mid-list item
- Long answers (code, translations, table dumps) reliably stop around the same length
- “Stop generating” never appeared and you didn’t click anything
- The model says “I’ll continue in the next message” then doesn’t
- A code block opens with ``` but never closes
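If you save responses to a file or clipboard, the unclosed-code-block symptom can be checked mechanically. The following is a minimal sketch, assuming the response is available as plain Markdown text; the function name `has_unclosed_fence` is hypothetical and not part of any ChatGPT tooling.

```python
# The fence marker is built from single backticks so it never appears
# literally at the start of a line inside this example.
FENCE = "`" * 3


def has_unclosed_fence(text: str) -> bool:
    """Return True if the text opens a fenced code block that never closes."""
    inside_block = False
    for line in text.splitlines():
        # Every fence line toggles the state, with or without a language tag.
        if line.lstrip().startswith(FENCE):
            inside_block = not inside_block
    return inside_block


if __name__ == "__main__":
    truncated = FENCE + "python\nprint('hello')\n"
    print(has_unclosed_fence(truncated))
```

An odd number of fence lines means the last block was left open, which is a strong hint that the response was cut off rather than finished.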
Common causes
1. Hit the per-response max output tokens
Every model has a per-response output cap that’s smaller than its context window. GPT-5.5 ships with roughly 8k–16k output tokens per turn depending on tier; o-series reasoning models use the same budget but spend a chunk on hidden thinking, so the visible portion is shorter. When the generation hits that ceiling, it stops — no error, just silence.
How to verify: count the output. If a code dump or translation reliably ends around the 8k-token mark (≈ 32 KB of English, ≈ 6,000 Chinese characters), this is it.
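The counting step can be approximated without a tokenizer. This is a rough sketch, assuming the common rule of thumb of about 4 characters per token for English text (the same ratio behind the 8k-token ≈ 32 KB figure above); it is not an exact token count, and the ratio does not hold for Chinese or code-heavy output. The names `estimate_tokens` and `near_cap` are hypothetical.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Estimate token count from character count (English rule of thumb)."""
    return round(len(text) / chars_per_token)


def near_cap(text: str, cap_tokens: int = 8000, tolerance: float = 0.1) -> bool:
    """Return True if the estimate lands within `tolerance` of the assumed cap."""
    return abs(estimate_tokens(text) - cap_tokens) <= cap_tokens * tolerance


if __name__ == "__main__":
    response = "a" * 32000  # stand-in for a pasted response of about 32 KB
    print(estimate_tokens(response), near_cap(response))
```

If several truncated responses all come out near the same estimate, the output cap is the likely cause; if the lengths vary widely, look at the other causes first.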
2. The model thought it finished
The model picks an end-of-message token when it judges the answer “complete enough.” On long structured outputs (tables, numbered lists, multi-file code), it sometimes truncates a list early and emits the stop token. This is a quality-of-completion problem, not a cap.
How to verify: the output ends at what looks like a natural break (last item of a list, end of a function) but the prompt asked for more. Asking “did you finish?” usually gets “no, here’s the rest.”
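When the prompt asked for a known number of items, a quick count shows whether the model stopped early at a clean break. This is a minimal sketch, assuming the output uses a numbered list in the "1." or "1)" style; the helper names are hypothetical.

```python
import re

# Matches a list number at the start of a line, e.g. "3. " or "3) ".
NUMBERED_ITEM = re.compile(r"^[ \t]*\d+[.)][ \t]", flags=re.MULTILINE)


def count_numbered_items(text: str) -> int:
    """Count lines that begin a numbered list item."""
    return len(NUMBERED_ITEM.findall(text))


def stopped_early(text: str, expected_items: int) -> bool:
    """Return True if fewer items were produced than the prompt asked for."""
    return count_numbered_items(text) < expected_items


if __name__ == "__main__":
    output = "1. alpha\n2. beta\n3. gamma\n"
    print(count_numbered_items(output), stopped_early(output, expected_items=10))
```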
3. Network stream dropped mid-response
The browser holds a Server-Sent Events stream open while tokens arrive. On a flaky connection, VPN handoff, or sleeping laptop, the SSE connection drops; the UI shows whatever already streamed and stops. The server-side generation may even have completed — you just stopped receiving the tail.
How to verify: DevTools → Network → look for the conversation SSE request. Status closed early or net::ERR_NETWORK_CHANGED confirms a stream drop.
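If you copy the raw response body of the stream out of DevTools, you can check whether it ended with a terminal event or simply broke off. The sketch below parses the standard Server-Sent Events `data:` framing. The `[DONE]` sentinel is an assumption borrowed from how the OpenAI API terminates streamed responses; the ChatGPT web app may use a different final event, so adjust `sentinel` to whatever a known-good stream ends with.

```python
def sse_data_payloads(raw: str) -> list[str]:
    """Extract the data payload of each event from a raw SSE body."""
    payloads = []
    # Events are separated by a blank line.
    for event in raw.replace("\r\n", "\n").split("\n\n"):
        data_lines = []
        for line in event.split("\n"):
            if line.startswith("data:"):
                value = line[len("data:"):]
                # The SSE format drops one leading space after the colon.
                if value.startswith(" "):
                    value = value[1:]
                data_lines.append(value)
        if data_lines:
            payloads.append("\n".join(data_lines))
    return payloads


def stream_finished(raw: str, sentinel: str = "[DONE]") -> bool:
    """Return True if the last event carries the assumed terminal sentinel."""
    payloads = sse_data_payloads(raw)
    return bool(payloads) and payloads[-1] == sentinel


if __name__ == "__main__":
    dropped = 'data: {"t": "Hel"}\n\ndata: {"t": "l'
    print(stream_finished(dropped))
```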
4. Safety filter triggered mid-response
Less common, but real: the model started a response, generated content that tripped a post-filter, and the response was cut. The UI usually shows a “this content may violate our policies” banner — but on mobile the banner is easy to miss.
How to verify: scroll back. If there’s an orange/red warning above the truncated response, this was the cause.
5. Browser tab backgrounded or throttled
Some browsers throttle JavaScript timers in background tabs. The SSE connection survives but rendering pauses, and on some builds the connection closes after a long pause. Rendering resumes when the tab regains focus, but anything that was not received before the connection closed is gone.
How to verify: did the cut-off happen after you switched away from the tab? Same prompt with the tab in focus completes normally.
6. Custom GPT system prompt forced an early stop
If you’re inside a Custom GPT whose instructions include things like “keep responses under 300 words” or “always end with the next-step question,” the GPT may stop earlier than a vanilla chat would.
How to verify: try the same prompt in a regular ChatGPT chat (no Custom GPT). Full output = the GPT’s system prompt was the cap.
Shortest path to fix
Step 1: Type continue and send
The single most reliable fix. ChatGPT will pick up from where it stopped, usually completing the next paragraph, code block, or list. For code it sometimes restarts the current block; for prose it usually continues cleanly. This works for causes 1, 2, 3, 5, and 6; for cause 4, rephrase first (see Step 4).
For code specifically, send: “continue from where you stopped, without repeating any lines.”
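When the continuation does repeat lines, the two parts can be stitched together by dropping the overlap. This is a minimal sketch, assuming both parts were saved as text and that the repeat is an exact line-for-line copy of the tail of the first part; the function name `stitch` is hypothetical. It does not handle a first part that ended mid-line.

```python
def stitch(first: str, continuation: str) -> str:
    """Join two parts, removing lines the continuation repeats from the first."""
    head = first.splitlines()
    tail = continuation.splitlines()
    # Look for the longest run of lines that both ends `first`
    # and begins `continuation`, then keep it only once.
    for size in range(min(len(head), len(tail)), 0, -1):
        if head[-size:] == tail[:size]:
            return "\n".join(head + tail[size:])
    # No overlap found: simple concatenation.
    return "\n".join(head + tail)


if __name__ == "__main__":
    part_one = "def f():\n    x = 1\n    y = 2"
    part_two = "    x = 1\n    y = 2\n    return x + y"
    print(stitch(part_one, part_two))
```

Review the seam by eye afterwards: a short coincidental match (for example a lone closing brace) can also be treated as overlap.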
Step 2: For long outputs, split before you start
Don’t ask for “the entire 30-file refactor in one response.” Ask for files 1–3 first, then 4–6. Pre-splitting:
- Avoids the output cap entirely
- Gives you a recovery point if any chunk fails
- Reduces per-call prefill time too
For translations: split per chapter or per 2,000 source words. For tables: ask for 30 rows at a time.
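The pre-splitting above can be scripted before you paste anything into the chat. This is a sketch under simple assumptions: files are grouped in fixed-size batches, and text is split on blank-line paragraph boundaries so that no chunk exceeds a word budget unless a single paragraph is already larger. The function names and the default of 2,000 words are illustrative only.

```python
def batches(items: list, size: int) -> list[list]:
    """Group items into consecutive batches of at most `size`."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def split_by_paragraphs(text: str, max_words: int = 2000) -> list[str]:
    """Pack whole paragraphs into chunks of at most `max_words` words."""
    chunks, current, count = [], [], 0
    for paragraph in text.split("\n\n"):
        words = len(paragraph.split())
        # Start a new chunk when adding this paragraph would exceed the budget.
        if current and count + words > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(paragraph)
        count += words
    if current:
        chunks.append("\n\n".join(current))
    return chunks


if __name__ == "__main__":
    files = [f"file_{n}.py" for n in range(1, 8)]
    for group in batches(files, 3):
        print("Refactor only these files:", ", ".join(group))
```

Splitting on paragraph boundaries keeps each chunk readable on its own, which makes a failed chunk easy to retry without touching the others.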
Step 3: Switch off Custom GPT if you’re in one
Custom GPT context shows above the input box. Click “New chat” → leave it as default ChatGPT. Same prompt now uses the full output budget without the GPT’s instructions trimming it.
Step 4: Check for the safety-filter banner
Scroll up to the truncated message. If there’s an orange/red policy warning, rephrase to remove the trigger (often a specific name, a piece of code labeled “exploit,” or a phrasing that pattern-matches sensitive content) before retrying.
Step 5: Stabilize network for long generations
- Plug into ethernet instead of Wi-Fi for 20-minute generations
- Disable VPN auto-reconnect during the request
- Keep the tab focused — don’t switch desktops or sleep the laptop
- On mobile, prevent the screen from locking
Step 6: If it’s reasoning-model output, switch to a non-reasoning model
o-series reasoning models burn output tokens on hidden thinking. If you don’t need deep reasoning, switch to GPT-5.5 — the visible output is longer because none of the budget went to thinking.
Easy to misdiagnose as
- Rate limit / message cap — that one shows a banner (“You’ve reached your limit”). Cut-off shows no banner. See ChatGPT message cap.
- Stuck loading — stuck means no tokens ever came out. Cut-off means tokens streamed fine, then stopped. See ChatGPT stuck on loading.
- Slow response — slow means it’s still going. Cut-off means it definitely stopped. See ChatGPT slow response.
Prevention
- For any output you expect to be > 3,000 words, plan a 2–3 message split up front
- Long code dumps: ask for one file at a time, not a whole project
- Run long generations on a stable network — ethernet beats hotel Wi-Fi
- Keep the tab in the foreground until the generation finishes
- If you keep hitting cut-offs in the same Custom GPT, edit its instructions to remove length caps
- Save reasoning models for tasks that actually need them — they have less budget left for visible output
Related
- ChatGPT message cap
- ChatGPT stuck on loading
- ChatGPT slow response
- ChatGPT network error
- ChatGPT large document incomplete analysis
Tags: #ChatGPT #Debug #Troubleshooting