ChatGPT being slow and ChatGPT being stuck are different things. Slow = tokens drip out but eventually complete. Stuck = the request fires, nothing comes back for 30+ seconds. This article is about slow — in rough order of likelihood: long conversation → heavy model → network latency → server queue.
Why slow happens mechanically: every turn, the server concatenates your full history + the new prompt and sends it all to the model. The model runs prefill (process the input) then decode (generate the output). Longer input = longer prefill; heavier model = slower per token; higher network RTT = a choppier stream that appears to stall between chunks.
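The mechanics above can be turned into a rough back-of-the-envelope model. The sketch below is illustrative only: `estimate_latency` and its throughput defaults (`prefill_tokens_per_s`, `decode_tokens_per_s`, `rtt_s`) are made-up placeholders, not measured or published figures for any model.

```python
from dataclasses import dataclass


@dataclass
class LatencyEstimate:
    time_to_first_token_s: float  # network round trip + prefill
    total_s: float                # time to first token + decode


def estimate_latency(
    input_tokens: int,
    output_tokens: int,
    rtt_s: float = 0.2,                      # assumed network round-trip time
    prefill_tokens_per_s: float = 10_000.0,  # assumed prefill throughput
    decode_tokens_per_s: float = 50.0,       # assumed decode throughput
) -> LatencyEstimate:
    """Rough model: latency = RTT + prefill time + decode time."""
    prefill_s = input_tokens / prefill_tokens_per_s
    decode_s = output_tokens / decode_tokens_per_s
    ttft = rtt_s + prefill_s
    return LatencyEstimate(ttft, ttft + decode_s)


short = estimate_latency(input_tokens=4_000, output_tokens=500)
long = estimate_latency(input_tokens=100_000, output_tokens=500)
print(f"4k input:   first token after {short.time_to_first_token_s:.1f}s")
print(f"100k input: first token after {long.time_to_first_token_s:.1f}s")
```

Under these assumed rates the output length is identical in both calls, so the whole difference comes from prefill. Swapping in a lower `decode_tokens_per_s` mimics a heavier model, and a larger `rtt_s` mimics a distant network path.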
Symptoms
- Tokens trickle one at a time
- “Thinking” indicator hangs 20+ seconds before output starts
- Earlier turns in the same thread were faster
- App slower than web (or vice versa)
- Same prompt fast in the morning, slow in the afternoon
Common causes
In rough order of frequency:
1. Conversation got long — full history is re-processed each turn
The model itself is stateless: it doesn’t “remember” prior turns, so each new turn stitches the entire conversation into one input. After 50 round trips, the input may be tens of thousands of tokens, and prefill time grows accordingly. For example, GPT-5.5 responds almost instantly with a 4k-token input but can take 10+ seconds of prefill with a 100k-token input.
How to verify: open a new chat, ask the same question. If it’s instant there, history was the drag.
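To see why long threads drag, the sketch below counts how many input tokens are processed at each turn when the whole history is resent. The function name and the per-message sizes (`tokens_per_user_msg`, `tokens_per_reply`) are assumed averages chosen for illustration.

```python
def input_tokens_per_turn(
    turns: int,
    tokens_per_user_msg: int = 200,  # assumed average prompt size
    tokens_per_reply: int = 600,     # assumed average reply size
) -> list[int]:
    """Input size at each turn if the full history is resent every time."""
    sizes = []
    history = 0
    for _ in range(turns):
        # Input for this turn = everything so far + the new prompt.
        sizes.append(history + tokens_per_user_msg)
        # After the reply, both the prompt and the reply join the history.
        history += tokens_per_user_msg + tokens_per_reply
    return sizes


sizes = input_tokens_per_turn(50)
print("turn 1 input: ", sizes[0])
print("turn 50 input:", sizes[-1])
print("total tokens prefilled over the thread:", sum(sizes))
```

Per-turn input grows linearly with the number of turns, so the total work over the thread grows roughly quadratically. A fresh chat resets `history` to zero.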
2. Using a “heavy model” for a “light task”
Running simple work (translation, titling, typo fixes) on a reasoning model (GPT-5 / o3) is slow because the model runs internal “thinking” before answering, which can make it 5–10× slower than GPT-5.4.
| Model | Speed | Fits |
|---|---|---|
| GPT-5.4 | Fastest | Daily chat, translation, typo fix |
| GPT-5.5 | Fast | Writing, analysis, light code |
| GPT-5 | Medium | Long tasks, complex analysis |
| o1 / o3 reasoning | Slow (silent thinking phase) | Math, reasoning, complex code |
| GPT-5.5 image | Slow | Image generation |
How to verify: what’s currently in the model selector? If o1 / o3, switch to GPT-5.5 for a speed comparison.
3. Network RTT / far VPN exit
OpenAI mostly serves from US East / US West. From Asia / Europe, baseline RTT is 100–200ms higher than US users. Add a poorly chosen VPN node (Singapore → Frankfurt → US West) and you may add another 500ms. The stream looks like “one word at a time.”
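How to verify: DevTools → Network → look at TTFB (Time to First Byte). A TTFB consistently above 1s points to the network, especially if small static requests are slow too; a high TTFB on the chat request alone can also reflect server-side prefill or queueing.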
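The same check can be approximated outside the browser. The sketch below uses only Python's standard library; `measure_ttfb` and `looks_like_network_issue` are hypothetical helper names, and the 1s threshold simply mirrors the rule of thumb above.

```python
import http.client
import time


def measure_ttfb(host: str, path: str = "/", timeout_s: float = 10.0) -> float:
    """Seconds between sending a GET request and receiving the response headers.

    The connection (DNS, TCP, TLS) is opened before the timer starts, so setup
    time is not counted. This roughly corresponds to the "waiting for server
    response" timing shown in browser DevTools.
    """
    conn = http.client.HTTPSConnection(host, timeout=timeout_s)
    try:
        conn.connect()
        start = time.perf_counter()
        conn.request("GET", path, headers={"User-Agent": "ttfb-check"})
        response = conn.getresponse()  # returns once status line + headers arrive
        elapsed = time.perf_counter() - start
        response.read()  # drain the body before closing
        return elapsed
    finally:
        conn.close()


def looks_like_network_issue(ttfb_s: float, threshold_s: float = 1.0) -> bool:
    """Apply the rule of thumb: TTFB above about 1s suggests the network."""
    return ttfb_s > threshold_s


# Example usage (needs network access; the host is only an example target):
# ttfb = measure_ttfb("chatgpt.com")
# print(f"TTFB {ttfb * 1000:.0f} ms, network suspect: {looks_like_network_issue(ttfb)}")
```

Running it once on your normal connection and once through a VPN or phone hotspot gives a direct comparison between network paths.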
4. North America business-hour server queue
US Eastern weekdays 9am–5pm (China time 9pm – next-day early morning) is OpenAI’s peak. Free and lower-tier accounts get deprioritized and queued.
How to verify: try the same prompt off-peak (US evenings / weekends). Noticeably faster = queue.
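Converting the peak window to your own time zone is easy to get wrong around daylight-saving changes. As a minimal sketch, the hypothetical helper `is_us_peak` below encodes the window described above (US Eastern weekdays, 9am to 5pm) using the standard `zoneinfo` module.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

US_EASTERN = ZoneInfo("America/New_York")


def is_us_peak(moment: datetime) -> bool:
    """True if `moment` falls on a US Eastern weekday between 9am and 5pm."""
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    eastern = moment.astimezone(US_EASTERN)
    is_weekday = eastern.weekday() < 5  # Monday=0 ... Friday=4
    return is_weekday and 9 <= eastern.hour < 17


# Example: 9:30pm on a Wednesday in Shanghai is 9:30am US Eastern (summer time).
evening_in_shanghai = datetime(2024, 6, 12, 21, 30, tzinfo=ZoneInfo("Asia/Shanghai"))
print(is_us_peak(evening_in_shanghai))
```

Passing `datetime.now(ZoneInfo("Asia/Shanghai"))` (or your own zone) tells you whether you are inside the window right now.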
5. Conversation contains many files / code / tables
Past PDF uploads or large code paste blocks all live in context — every turn reprocesses them.
How to verify: new chat with no attachments + a simple prompt — instant response = attachments were the drag.
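To get a feel for how much weight attachments add, the sketch below uses the common rule of thumb of about 4 characters per token for English text. This is only an approximation, not a real tokenizer, and both helper names are hypothetical.

```python
def rough_token_count(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough estimate: about 4 characters per token for English text."""
    return round(len(text) / chars_per_token)


def attachment_overhead(attachment_texts: list[str], remaining_turns: int) -> int:
    """Extra input tokens processed if the attachments stay in context."""
    per_turn = sum(rough_token_count(t) for t in attachment_texts)
    return per_turn * remaining_turns


pasted_code = "x = 1\n" * 2_000    # stand-in for a large code paste (12,000 chars)
pdf_text = "lorem ipsum " * 5_000  # stand-in for extracted PDF text (60,000 chars)
print(attachment_overhead([pasted_code, pdf_text], remaining_turns=10))
```

The point of the estimate is that the cost is paid again on every remaining turn, not once at upload time.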
6. Browser extension slowing rendering
Some privacy extensions, ad blockers, and AI overlays (Monica / Glasp, etc.) add per-message listeners, making rendering laggy as message count grows.
How to verify: compare speed in an incognito window (extensions disabled).
Shortest path to fix
Ordered by impact. The first two usually fix it.
Step 1: Start a new chat
Simplest and most effective. Press Cmd/Ctrl + Shift + O in the browser or Cmd/Ctrl + N in the desktop app (or click “New chat” top-left). The same prompt in a clean chat typically runs 3–10× faster.
If you need prior context, hand-pick 3–5 key messages and paste them as the first message of the new chat.
Step 2: Switch to a task-appropriate model
Model selector above the chat box:
| What you’re doing | Switch to |
|---|---|
| Chat / translation / typo fix / titles | GPT-5.4 |
| Writing / analysis / light code / summary | GPT-5.5 |
| Complex reasoning / math / long code | GPT-5 / o1 (accept slow) |
| Image generation | GPT-5.5 image |
Save reasoning models for the tasks that actually need them.
Step 3: Switch network / VPN node
On desktop, DevTools → Network → check TTFB.
- TTFB > 1s → network layer issue
- Compare with phone 4G hotspot
- VPN users: try a node geographically near OpenAI’s region (US East / West)
Step 4: Disable extensions / use incognito
Incognito (extensions off by default) + manually disable any non-essential extensions.
Step 5: Avoid peak hours
If your work hours overlap US business hours, schedule heavy tasks for your own morning, which falls outside that window in both Asia and Europe. Alternatively, upgrade to Pro / Enterprise, which get priority during peaks.
Step 6: Split the task
Split very long tasks (500+ lines of code, 10k-word translations) into chunks. Have the model process 3–5k words or about 100 lines at a time; this is noticeably faster than one giant request.
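Chunking by hand gets tedious, so here is a minimal sketch of two splitters. The function names and default sizes are assumptions that follow the figures above (about 100 lines of code, 3–5k words of prose).

```python
def split_by_lines(source: str, max_lines: int = 100) -> list[str]:
    """Split code into chunks of at most `max_lines` lines each."""
    lines = source.splitlines()
    return [
        "\n".join(lines[i : i + max_lines])
        for i in range(0, len(lines), max_lines)
    ]


def split_by_words(text: str, max_words: int = 4_000) -> list[str]:
    """Split prose into chunks of roughly `max_words` words.

    Breaks only at blank-line paragraph boundaries, so a single paragraph
    longer than `max_words` is kept whole as its own chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for paragraph in text.split("\n\n"):
        n = len(paragraph.split())
        # Start a new chunk if adding this paragraph would exceed the limit.
        if current and count + n > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(paragraph)
        count += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks


code = "\n".join(f"line {i}" for i in range(250))
print([len(chunk.splitlines()) for chunk in split_by_lines(code)])
```

A fixed line count can cut a function in half. For real code it is usually better to adjust the boundaries by hand so each chunk ends at a function or class boundary.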
Easy to misdiagnose as
Slow is not the same as stuck. If a request truly hangs and never produces output (“Thinking” 60+ seconds with zero tokens), see ChatGPT stuck on loading.
Slow-but-progressing usually starts producing tokens within 5–30 seconds.
Prevention
- Fresh chat per topic — saves context and saves time
- Pick the model that matches the task (default for chat, thinking only for hard tasks)
- Don’t stack VPN + region-mismatched proxy — every hop adds latency
- Avoid US business hours for heavy work, or upgrade to Pro
- Split long tasks into chunks — shorter prefill per call
- After completing a workflow, start a new chat — don’t pile 100+ messages into one
Related
- ChatGPT stuck on loading
- ChatGPT cannot open
- ChatGPT vs Claude vs Gemini
- ChatGPT beginner guide
- ChatGPT prompt improvement
- ChatGPT model selection guide
Tags: #ChatGPT #Debug #Troubleshooting