ChatGPT Got Slow: What to Check First (June 2026)

ChatGPT replies crawling? It is almost always a long conversation, a heavy model, or network RTT — in that order. Start a new chat first; it fixes most cases.

Published: May 17, 2026 Updated: Jun 15, 2026 Author: AI Productivity Guide Team

Fastest fix: start a new chat (Cmd/Ctrl + Shift + O, or “New chat” top-left). A clean thread typically runs the same prompt 3–10x faster. If that does not do it, switch the model picker to GPT-5.5 Instant and check your network. The full order of likelihood is: long conversation → heavy model → network latency → server queue.

First, separate two things people lump together. ChatGPT being slow and ChatGPT being stuck are not the same. Slow = tokens drip out but the reply eventually finishes. Stuck = the request fires and nothing comes back for 30+ seconds. This article is about slow. If yours truly hangs with zero output, see ChatGPT stuck on loading instead.

How slowness arises, mechanically: on every turn, the server concatenates your full history plus the new prompt and sends it all to the model. The model runs prefill (process the input) then decode (generate the output). Longer input means longer prefill; a heavier model is slower per token; higher network RTT makes the stream look stalled. Separately, on the web the browser tab has to re-render the entire growing thread, which lags independently of the model.
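
To make the prefill/decode split concrete, here is a minimal back-of-the-envelope sketch in Python. The throughput and RTT defaults are illustrative assumptions, not measured OpenAI figures; only the shape of the relationship matters.

```python
from dataclasses import dataclass


@dataclass
class LatencyEstimate:
    time_to_first_token_s: float  # wait before any text appears
    total_time_s: float           # wait until the reply finishes streaming


def estimate_latency(
    input_tokens: int,
    output_tokens: int,
    rtt_s: float = 0.1,                      # assumed network round trip
    prefill_tokens_per_s: float = 20_000.0,  # assumed, illustrative only
    decode_tokens_per_s: float = 50.0,       # assumed, illustrative only
) -> LatencyEstimate:
    """Rough model: one round trip, then prefill, then token-by-token decode."""
    prefill_s = input_tokens / prefill_tokens_per_s
    decode_s = output_tokens / decode_tokens_per_s
    ttft = rtt_s + prefill_s
    return LatencyEstimate(ttft, ttft + decode_s)


# Same 300-token reply, very different amounts of history to prefill.
short_chat = estimate_latency(input_tokens=4_000, output_tokens=300)
long_chat = estimate_latency(input_tokens=100_000, output_tokens=300)
print(f"short thread: first token after {short_chat.time_to_first_token_s:.1f}s")
print(f"long thread:  first token after {long_chat.time_to_first_token_s:.1f}s")
```

Under these assumed rates the decode time is identical in both cases; the whole difference comes from prefill, which is why a fresh chat helps even when the reply length does not change.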

Symptoms

Tokens trickle out one at a time
The “Thinking” indicator hangs 20+ seconds before any output starts
Earlier turns in the same thread were fast; it got slower as the chat grew
The web tab feels heavy: scrolling stutters, typing lags before you even send
App slower than web, or vice versa
Same prompt is fast in the morning, slow in the afternoon

Which bucket are you in?

What you observe	Most likely cause	Jump to
Long thread, page also feels laggy to scroll/type	Browser re-rendering + big context	Step 1
Picker shows Thinking or Pro; long silent pause then output	Heavy model on a light task	Step 2
First token takes 1s+; you are on VPN or outside the US	Network RTT	Step 3
Only slow on weekday afternoons; you are on Free	Peak-hour queue	Step 5
Slow only in one browser, fine in incognito	Extension overhead	Step 4
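
The table above can be read as a small rule lookup. The sketch below encodes it in Python; the signal names (such as "long_thread" or "free_plan") are hypothetical labels invented for this example, and a rule only matches when all of its signals are observed.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    signals: frozenset[str]  # every signal must be observed for a match
    cause: str
    step: str


# One rule per table row, in the same order as the table.
RULES = [
    Rule(frozenset({"long_thread", "page_lags"}), "Browser re-rendering + big context", "Step 1"),
    Rule(frozenset({"heavy_model", "silent_pause"}), "Heavy model on a light task", "Step 2"),
    Rule(frozenset({"slow_first_token", "vpn_or_outside_us"}), "Network RTT", "Step 3"),
    Rule(frozenset({"weekday_afternoons_only", "free_plan"}), "Peak-hour queue", "Step 5"),
    Rule(frozenset({"one_browser_only", "fine_in_incognito"}), "Extension overhead", "Step 4"),
]


def triage(observed: set[str]) -> list[Rule]:
    """Return every rule whose signals are all present, in table order."""
    return [rule for rule in RULES if rule.signals <= observed]


for match in triage({"long_thread", "page_lags", "heavy_model", "silent_pause"}):
    print(f"{match.cause} -> {match.step}")
```

More than one rule can match at once, which mirrors real cases: a long thread on a heavy model needs both Step 1 and Step 2.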

Common causes (in order of frequency)

1. The conversation got long

Two separate things slow down a long chat, and on the web both hit at once:

Model side: the model itself is stateless. It does not “remember” prior turns; on each new turn the server stitches the entire conversation into one input. After 50 round trips, input can be tens of thousands of tokens, and prefill time grows with it. GPT-5.5 Instant answers near-instantly at ~4k tokens of input but adds several seconds of prefill at ~100k tokens.
Browser side: ChatGPT’s web frontend does not virtualize the message list — every message stays mounted in the DOM, so the tab keeps the whole thread in memory and re-lays-out the entire page on each new answer. Threads past ~150–200 messages routinely push the tab over 800 MB–1 GB of RAM, at which point typing and scrolling lag even before the model responds.
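
The model-side growth is simple arithmetic. The sketch below assumes an average of 100 tokens per prompt and 500 tokens per reply; both numbers are made up for illustration, and real threads vary widely.

```python
def input_tokens_at_turn(
    turn: int,
    avg_prompt_tokens: int = 100,  # assumed average, illustrative only
    avg_reply_tokens: int = 500,   # assumed average, illustrative only
) -> int:
    """Tokens the model must prefill on a given turn (1-based).

    Every earlier prompt and reply is resent, plus the new prompt.
    """
    history = (turn - 1) * (avg_prompt_tokens + avg_reply_tokens)
    return history + avg_prompt_tokens


def total_tokens_processed(turns: int) -> int:
    """Sum of prefill input over the whole thread; grows quadratically."""
    return sum(input_tokens_at_turn(t) for t in range(1, turns + 1))


for turn in (1, 10, 50):
    print(f"turn {turn:>2}: {input_tokens_at_turn(turn):>6} input tokens")
print(f"total prefilled over 50 turns: {total_tokens_processed(50)}")
```

With these assumptions, turn 50 already carries 29,500 input tokens, in line with the "tens of thousands" figure above, even though each individual message is short.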

How to verify: open a new chat and ask the same question. Near-instant there means history was the drag. On desktop, open your browser task manager (Chrome: Shift + Esc) — if the ChatGPT tab is over ~1 GB, that is your signal to start fresh.

2. A heavy model on a light task

ChatGPT’s picker as of June 2026 has three manual options plus an automatic router. Picking Thinking or Pro for simple work (translation, titling, typo fixes) makes the model run a silent internal reasoning pass before it answers — easily 5–10x slower than Instant.

Picker option	Speed	Best for
GPT-5.5 Instant (default)	Fastest, no reasoning pause	Daily chat, translation, typo fixes, summaries, light code
GPT-5.5 Thinking	Slower (silent reasoning first)	Hard math, multi-step analysis, complex code
GPT-5.5 Pro	Slowest (heavy reasoning)	The hardest research/code; Pro, Business, Enterprise, Edu plans only
Auto (router)	Varies	Routes between Instant and Thinking by complexity

Image generation also adds a noticeable wait — it is a separate render step, not a text stream, so expect 10–30 seconds regardless of picker.

How to verify: look at the model selector above the chat box. If it shows Thinking or Pro, switch to GPT-5.5 Instant and compare.

3. Network RTT or a far VPN exit

OpenAI mostly serves from US East / US West. From Asia or Europe, baseline RTT is 100–200ms higher than for US users. Add a poorly chosen VPN path (Singapore -> Frankfurt -> US West) and you can stack another 500ms. The stream then looks like “one word at a time” even when the model itself is fast.

How to verify: DevTools -> Network -> watch TTFB (Time to First Byte) on the conversation request. A TTFB above 1s points at the network, not the model.
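
If you prefer a script to DevTools, a rough equivalent can be sketched with the Python standard library. This is an approximation under stated assumptions: it times a fresh connection (DNS, TCP, and TLS included), so it will read higher than the TTFB DevTools shows on an already-open connection, and the URL you pass is your own choice, since the real conversation request requires a logged-in session.

```python
import time
import urllib.request


def measure_ttfb(url: str, timeout_s: float = 10.0) -> float:
    """Seconds from starting the request until the first body byte arrives.

    Includes connection setup, so treat the result as an upper bound on
    the TTFB figure shown in DevTools.
    """
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout_s) as response:
        response.read(1)  # blocks until one body byte arrives (or the body is empty)
    return time.perf_counter() - start


def classify_ttfb(ttfb_s: float, threshold_s: float = 1.0) -> str:
    """Apply the rule of thumb above: over ~1s points at the network."""
    return "network" if ttfb_s > threshold_s else "model or queue"
```

Run measure_ttfb a few times with and without the VPN and pass each result to classify_ttfb; a single sample is noisy, so compare medians rather than one reading.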

4. A browser extension slowing rendering

Privacy extensions, ad blockers, and AI overlays (Monica, Glasp, and similar) add per-message listeners that compound as the message count grows, making the page render laggy independent of the model.

How to verify: open the same chat in an incognito window (extensions are off there by default) and compare.

5. North-America business-hour queue

Demand peaks on US Eastern weekdays, heaviest roughly 12pm–5pm ET (which is roughly midnight to 6am in China, depending on daylight saving time). Free accounts get deprioritized during these windows; Plus and Pro keep priority access and rarely see slowdowns.

How to verify: run the same prompt off-peak (US evenings or weekends). Noticeably faster means you were queued.
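
To check whether a given moment falls inside that window, a small time-zone conversion is enough. The sketch below assumes a fixed UTC-4 offset (Eastern Daylight Time); in winter the offset is UTC-5, and zoneinfo.ZoneInfo("America/New_York") handles the switch automatically where time-zone data is installed.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Eastern Daylight Time. Pass a UTC-5 timezone for winter dates.
EASTERN_DAYLIGHT = timezone(timedelta(hours=-4))


def in_peak_window(moment: datetime, eastern: timezone = EASTERN_DAYLIGHT) -> bool:
    """True if `moment` is a weekday between 12pm and 5pm Eastern."""
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    local = moment.astimezone(eastern)
    is_weekday = local.weekday() < 5  # Monday=0 ... Friday=4
    return is_weekday and 12 <= local.hour < 17


# Monday, June 15, 2026 at 00:30 in China (UTC+8) is 12:30pm Eastern.
china = timezone(timedelta(hours=8))
print(in_peak_window(datetime(2026, 6, 16, 0, 30, tzinfo=china)))
```

Passing timezone-aware datetimes keeps the check independent of the machine's local clock settings.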

6. The conversation is full of files, code, or tables

Past PDF uploads or large pasted code blocks all live in the context window — every turn reprocesses them, on both the model side and the browser side.

How to verify: new chat, no attachments, simple prompt. Instant response means the attachments were the weight.

Shortest path to fix

Ordered by impact. The first two solve most cases.

Step 1: Start a new chat

Simplest and most effective. Cmd/Ctrl + Shift + O, or “New chat” top-left. The same prompt in a clean chat typically runs 3–10x faster, and it resets the browser tab’s memory at the same time.

If you need the prior context, ask the old chat “Summarize our conversation so far in about 300 words, including any decisions and open items,” then paste that summary as the first message of the new chat. You keep the thread of thought without dragging tens of thousands of tokens along.

Step 2: Switch to a task-appropriate model

Open the model selector above the chat box:

What you are doing	Switch to
Chat, translation, typo fixes, titles, summaries	GPT-5.5 Instant
Writing, light code, everyday analysis	GPT-5.5 Instant (or Auto)
Hard math, multi-step reasoning, complex code	GPT-5.5 Thinking (accept the slower pass)
The hardest research or code (Pro/Business/Enterprise)	GPT-5.5 Pro

Save Thinking and Pro for tasks that actually need them.

Step 3: Switch network or VPN node

On desktop, open DevTools -> Network and check TTFB on the chat request.

TTFB above 1s -> the network layer is the bottleneck
Compare against a phone on a cellular hotspot
VPN users: pick a node geographically near OpenAI’s region (US East or West); do not chain proxies

Step 4: Disable extensions or use incognito

Open an incognito window (extensions off by default), or manually disable non-essential extensions and AI overlays, then reload.

Step 5: Avoid peak hours, or upgrade

If your work hours overlap US weekday afternoons (12pm–5pm ET), run heavy tasks earlier in your day. On Free, upgrading to Plus ($20/mo as of June 2026) or Pro restores priority access during peaks.

Step 6: Reset the browser tab and free RAM

If only the web feels heavy: close and reopen the ChatGPT tab to clear accumulated memory, and turn on your browser’s tab suspender (Chrome: chrome://settings/performance -> Memory Saver) so that inactive tabs release their memory. This alone can drop a long tab’s RAM by half.

Step 7: Split very long tasks

For very large jobs (500+ lines of code, 10k-word translations), split into chunks of ~3–5k words or ~100 lines per request. Shorter prefill per call beats one giant request that re-processes everything.
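
Splitting can be done by hand, but a small helper keeps chunk sizes consistent. This is a minimal sketch: the function names are hypothetical, the defaults follow the sizes suggested above, and the word-based splitter cuts only at blank lines, so a single paragraph longer than the limit stays whole.

```python
def chunk_lines(text: str, max_lines: int = 100) -> list[str]:
    """Split code into consecutive chunks of at most `max_lines` lines."""
    lines = text.splitlines()
    return ["\n".join(lines[i:i + max_lines]) for i in range(0, len(lines), max_lines)]


def chunk_by_words(text: str, max_words: int = 4000) -> list[str]:
    """Group paragraphs into chunks of roughly `max_words` words each."""
    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for paragraph in text.split("\n\n"):
        words = len(paragraph.split())
        # Close the current chunk before it would exceed the budget.
        if current and count + words > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(paragraph)
        count += words
    if current:
        chunks.append("\n\n".join(current))
    return chunks


code = "\n".join(f"line {i}" for i in range(250))
print([len(part.splitlines()) for part in chunk_lines(code)])
```

Send each chunk in its own request, ideally in a fresh chat per chunk or small group of chunks, so earlier chunks do not pile up in the history.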

How to confirm it is fixed

In a new chat, send a short prompt (for example “Reply with the word OK”). First token should appear within ~1–2 seconds.
If still slow, check TTFB in DevTools -> Network. Under 1s means the network is fine and the model/queue is the variable.
Confirm the picker shows GPT-5.5 Instant for everyday work.
On the web, check the browser task manager: a fresh ChatGPT tab should sit well under 1 GB.

If the request never starts producing tokens at all (60+ seconds of “Thinking” with nothing), that is stuck, not slow — see ChatGPT stuck on loading.

FAQ

Why does ChatGPT slow down the longer a chat gets?
Two reasons compound. The model reprocesses the entire conversation as input every turn (longer input = longer prefill), and the web tab keeps every message in the DOM and re-renders the whole thread, ballooning RAM. Starting a new chat resets both.

Is GPT-5.5 Thinking always slower than Instant?
Yes, for the first token. Thinking runs a silent internal reasoning pass before it writes anything, so simple prompts feel much slower. Use Instant (the default) for everyday work and switch to Thinking only for hard reasoning, math, or complex code.

Does upgrading to Plus or Pro make responses faster?
It mainly removes peak-hour queuing: Plus and Pro keep priority access when Free users get deprioritized on weekday afternoons. It does not shorten prefill on a giant conversation; a fresh chat does that.

My replies are fine but the page itself lags. Why?
That is the browser, not the model. ChatGPT’s web frontend does not virtualize the message list, so a thread past ~150–200 messages can push the tab over 1 GB of RAM. Start a new chat or reopen the tab.

How do I keep context when I start a new chat?
Ask the old chat to summarize the conversation in ~300 words including decisions and open items, then paste that as the first message of the new chat.

Prevention

Start a fresh chat per topic — it saves context size and keeps the tab light
Match the model to the task: Instant for everyday work, Thinking only for genuinely hard problems
Do not stack VPN plus a region-mismatched proxy — every hop adds latency
Schedule heavy work outside US weekday afternoons, or upgrade for priority access
Split long tasks into chunks so each call has a shorter prefill
After finishing a workflow, open a new chat instead of piling 100+ messages into one

Tags: #ChatGPT #Debug #Troubleshooting