Troubleshooting

ChatGPT Context Window Exceeded in Long Conversations

A long ChatGPT thread that starts forgetting earlier turns has usually filled the model's context window, is carrying token-heavy attachments, or has been routed to a smaller-window tier. This guide shows how to diagnose and fix it, with the exact June 2026 token limits.

Published: May 24, 2026 Updated: Jun 15, 2026 Author: AI Productivity Guide Team

You have been chatting for an hour, the thread is 80 turns deep, and ChatGPT suddenly starts contradicting itself or asking “what file?” about a PDF you uploaded ten minutes ago. The replies are coherent but they reference the wrong details, or skip over instructions you gave at the top of the chat. The model has not gone stupid. The conversation has simply outgrown the context window for your tier, and ChatGPT silently drops the oldest turns to keep the request fitting under the token cap. There is no banner and no error. Degraded answers are the only signal.

Fastest fix: in the same thread, ask the model to write a 300-word numbered summary of decisions, facts, files, and open questions, then paste that summary into a brand-new chat and continue there. A fresh window with a curated seed restores continuity immediately. The rest of this guide is for diagnosing why you hit the wall so it stops happening.

The number that actually matters (June 2026)

The in-app context window is much smaller than the headline “1M tokens” you see in marketing, because that 1M figure is the API limit. What you get inside ChatGPT.com depends on your plan. As of June 2026, for the default GPT-5.5 model:

Plan	In-app context window (GPT-5.5)
Free	~16K tokens
Go / Plus / Business	32K tokens
Pro / Enterprise	128K tokens
API (any tier)	1,000,000 tokens

So a Plus user has roughly 32K tokens (~24,000 words, or ~45-50 pages) of working room for the entire thread: your messages, the model’s replies, Custom Instructions, Memory, Project files, and every attachment combined. That fills faster than people expect. The full 1M-token window exists only in the API; a $20 Plus chat gets 32K, and the $200 Pro tier is listed at 128K in the table above.
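The arithmetic behind those figures can be sketched as follows. This is a rough estimate only: it assumes the ~0.75 words-per-token rule of thumb used in this guide and a hypothetical 500 words per page, and the names `capacity` and `IN_APP_WINDOW_TOKENS` are illustrative rather than part of any OpenAI tool.

```python
# Rough capacity estimate per plan. The constants are rules of thumb, not measurements.
WORDS_PER_TOKEN = 0.75   # approximation used in this guide
WORDS_PER_PAGE = 500     # assumption: one dense, single-spaced page

# In-app window sizes taken from the table above.
IN_APP_WINDOW_TOKENS = {
    "Free": 16_000,
    "Go / Plus / Business": 32_000,
    "Pro / Enterprise": 128_000,
}


def capacity(tokens: int) -> tuple[int, int]:
    """Return (approximate words, approximate pages) that fit in `tokens`."""
    words = round(tokens * WORDS_PER_TOKEN)
    pages = round(words / WORDS_PER_PAGE)
    return words, pages


for plan, tokens in IN_APP_WINDOW_TOKENS.items():
    words, pages = capacity(tokens)
    print(f"{plan}: ~{words:,} words, ~{pages} pages")
```

Real token counts depend on language, formatting, and file type, so treat the output as an order-of-magnitude guide.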

Common causes

Ordered by hit rate, highest first.

1. Total tokens exceeded your tier’s context window

GPT-5.5 in the API has a 1M window, but inside ChatGPT.com you are capped at the tier number in the table above (32K on Plus, 128K on Pro). Long chats with large attachments hit that cap, and the UI then trims the earliest turns first. There is no banner warning.

How to judge: Ask “what did I say in my first message of this conversation?” If the model paraphrases something more recent or guesses, the early turns have been trimmed.
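A toy model of oldest-first trimming shows why the first message is the one that disappears. OpenAI does not publish how ChatGPT trims a thread, so this sketch only illustrates the behavior described above; `estimate_tokens` and `trim_oldest` are hypothetical helpers, and the token estimate is a word-count approximation.

```python
from collections import deque

WORDS_PER_TOKEN = 0.75  # rule-of-thumb approximation, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Very rough token estimate from a whitespace word count."""
    return max(1, round(len(text.split()) / WORDS_PER_TOKEN))


def trim_oldest(turns: list[str], budget_tokens: int) -> list[str]:
    """Drop turns from the front of the thread until the rest fits the budget."""
    kept = deque(turns)
    total = sum(estimate_tokens(turn) for turn in kept)
    while kept and total > budget_tokens:
        total -= estimate_tokens(kept.popleft())
    return list(kept)


# Simulate 100 turns of about 300 words each against a 32K-token window.
turns = [f"turn {i}: " + "word " * 300 for i in range(1, 101)]
visible = trim_oldest(turns, budget_tokens=32_000)
print(f"{len(turns) - len(visible)} oldest turns dropped")
print(f"first turn still visible: {visible[0][:8]}")
```

In this simulation the opening turns are gone long before the thread ends, which is why asking about your first message is a useful probe.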

2. Large attachments consuming most of the budget

Each PDF or image you upload is converted to tokens and carried in context. A 50-page PDF can eat 30k+ tokens by itself, which is roughly the entire 32K Plus window in one file, leaving almost no room for the live conversation.

How to judge: Count attachments in the thread. On Plus, even one big file plus a long chat almost always means context pressure; on Pro, the same holds once the thread carries more than two or three big files.
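To put a number on attachment pressure, a back-of-the-envelope estimate is enough. The sketch below assumes about 600 tokens per page, derived from the "50-page PDF can eat 30k+ tokens" figure above; actual cost varies with layout, images, and tables, and `attachment_share` is a hypothetical helper.

```python
TOKENS_PER_PAGE = 600  # assumption: 30k tokens / 50 pages, per the figure above


def attachment_share(pages_per_file: list[int], window_tokens: int) -> float:
    """Fraction of the window the attachments would occupy (can exceed 1.0)."""
    attachment_tokens = sum(pages_per_file) * TOKENS_PER_PAGE
    return attachment_tokens / window_tokens


for plan, window in (("Plus", 32_000), ("Pro", 128_000)):
    share = attachment_share([50], window)
    print(f"{plan}: one 50-page PDF uses ~{share:.0%} of the window")
```

A share near or above 100% means the file alone crowds out the conversation.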

3. Custom Instructions and system prompt taking a heavy slice

Long Custom Instructions, plus any Project system prompt, plus Memory entries all get prepended every turn. A bloated “What traits should ChatGPT have?” / “Anything else?” field can quietly cost 2-3k tokens per request, and on a 32K window that is a real bite.

How to judge: Go to Settings -> Personalization -> Custom instructions. If either box is more than ~500 words, it is contributing.

4. Memory is consuming context, not extending it

ChatGPT Memory stores short facts across chats, but inside a single chat it does not expand the window. After the June 2026 “Dreaming V3” memory rollout there are two layers (Saved memories and Reference chat history), and people sometimes assume the new memory means infinite context. It does not. Memory injects a few facts at the start of the request, which actually consumes a little of your window rather than adding to it.

How to judge: Turn off Memory for one chat (the toggle is in Settings -> Personalization -> Memory) and see if behavior is the same. If yes, Memory was never the cause.

5. Tool calls (web search, code interpreter) adding hidden tokens

Each tool result is appended to context. Long web search results or large code-execution outputs can quietly fill the window.

How to judge: Count tool calls in the thread. Many web-search results, or one big DataFrame from code interpreter, will dominate the budget.

6. Reasoning mode or tier routing changed your effective window

The model picker no longer shows “mini” by name. As of June 2026 it shows reasoning modes (Instant, Thinking, Pro) on the GPT-5.5 family. When you hit a usage cap, paid accounts can be temporarily routed to a lighter model (GPT-5.4 / GPT-5.5 Instant), and Pro mode disables Memory and Apps to keep the reasoning chain isolated. Each of these changes how much context the same chat actually uses.

How to judge: Check the picker at top-left. If it shows a fallback model name or a different mode than you started with, that changes your effective room.

Before you start

Decide whether the chat is salvageable, or whether starting a fresh thread with a summary is faster. Past ~30 turns on Plus, fresh-with-summary is usually faster.
Save any unique answers from the current chat as plain text before pruning, in case trimming loses them.
Note which model and mode you started with so you can compare after a fresh chat.

Information to collect

Approximate turn count and attachment count in the thread.
Model name and reasoning mode shown in the picker right now.
Whether Memory is on and how many Saved memories entries it holds.
Length of Custom Instructions (paste into a word counter).
Whether the chat is inside a Project (Projects add system prompts and shared files).
Plan tier (Free, Go, Plus, Pro, Business, Enterprise) and any recent usage-cap hits.
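Once you have those numbers, a quick tally shows which item dominates the budget. The sketch below is an illustration under stated assumptions: ~0.75 words per token, ~600 tokens per attachment page, and a hypothetical `ThreadSnapshot` record. It ignores Memory entries and tool outputs, which you would need to add separately.

```python
from dataclasses import dataclass

WORDS_PER_TOKEN = 0.75   # rule-of-thumb approximation
TOKENS_PER_PAGE = 600    # assumption: ~30k tokens per 50-page PDF


@dataclass
class ThreadSnapshot:
    turns: int
    avg_words_per_turn: int
    attachment_pages: int
    custom_instruction_words: int
    window_tokens: int

    def breakdown(self) -> dict[str, int]:
        """Estimated tokens per contributor."""
        return {
            "conversation": round(self.turns * self.avg_words_per_turn / WORDS_PER_TOKEN),
            "attachments": self.attachment_pages * TOKENS_PER_PAGE,
            "custom instructions": round(self.custom_instruction_words / WORDS_PER_TOKEN),
        }

    def report(self) -> str:
        """One-line summary: estimated total, window status, largest consumer."""
        parts = self.breakdown()
        total = sum(parts.values())
        biggest = max(parts, key=parts.get)
        status = "over" if total > self.window_tokens else "within"
        return (f"~{total:,} of {self.window_tokens:,} tokens ({status} the window); "
                f"largest consumer: {biggest}")


snapshot = ThreadSnapshot(turns=80, avg_words_per_turn=150, attachment_pages=20,
                          custom_instruction_words=600, window_tokens=32_000)
print(snapshot.report())
```

The largest consumer tells you which of the steps below to prioritize.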

Step-by-step fix

Step 1: Ask the model to summarize the conversation so far

In the same thread, send:

Summarize everything important from this conversation in a numbered
list: decisions made, facts established, files referenced, and any
open questions. Keep it under 300 words.

Copy that summary. You now have a portable seed for a fresh thread. (Do this before the thread degrades further; once early turns are trimmed, the model cannot summarize what it no longer sees.)

Step 2: Start a new chat seeded with the summary

Open a new conversation, paste the summary as your first message with a header like “Context from previous session”, then continue your task. Fresh window, same continuity.
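If you restart threads often, a small script can assemble the opening message for you. This is a minimal sketch, assuming the 300-word limit from Step 1 and the header suggested above; `build_seed_message` is a hypothetical helper that only prepares text to paste, and it does not talk to ChatGPT.

```python
MAX_SUMMARY_WORDS = 300                        # limit requested in Step 1
SEED_HEADER = "Context from previous session"  # header suggested in Step 2


def build_seed_message(summary: str) -> str:
    """Prefix the summary with the header, rejecting summaries that are too long."""
    summary = summary.strip()
    word_count = len(summary.split())
    if word_count > MAX_SUMMARY_WORDS:
        raise ValueError(
            f"summary is {word_count} words; trim it to {MAX_SUMMARY_WORDS} or fewer"
        )
    return f"{SEED_HEADER}\n\n{summary}"


seed = build_seed_message(
    "1. Decided on a weekly report format.\n"
    "2. Source file: Q2 sales sheet.\n"
    "3. Open question: how to split results by region."
)
print(seed)
```

Rejecting an oversized summary keeps the seed from eating into the fresh window it is meant to protect.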

Step 3: Prune attachments before re-uploading

If you must keep working with files, only re-upload the specific pages or sheets you actually need. Use a PDF tool to extract the relevant 5 pages instead of the whole 80-page report. On a 32K Plus window this single change often doubles your usable room.

Step 4: Trim Custom Instructions and Memory

Go to Settings -> Personalization (or jump straight to chatgpt.com/#settings/Personalization). Cut Custom Instructions to under ~300 words total. Open Manage memories and delete stale Saved memories entries that no longer apply; use Clear all when you start a genuinely new project. Both savings compound across every future chat.
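A word counter makes the ~300-word target concrete. The sketch below assumes you have pasted both Custom Instructions boxes into strings; `words_to_cut` is a hypothetical helper, and it counts whitespace-separated words, which only approximates token cost.

```python
CUSTOM_INSTRUCTIONS_LIMIT_WORDS = 300  # target suggested in this step


def words_to_cut(*fields: str, limit: int = CUSTOM_INSTRUCTIONS_LIMIT_WORDS) -> int:
    """Return how many words the combined fields exceed the limit by (0 if within)."""
    total = sum(len(field.split()) for field in fields)
    return max(0, total - limit)


# Placeholder text standing in for the two Custom Instructions boxes.
traits = "Be concise. " * 100               # 200 words
anything_else = "Prefer metric units. " * 60  # 180 words
print(f"Cut about {words_to_cut(traits, anything_else)} words")
```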

Step 5: Use a Project for long-running work

For multi-session work, create a Project, attach your reference files there, and chat inside the Project. Each chat in a Project starts fresh but inherits the Project files and Project instructions cleanly, so the live thread spends fewer tokens carrying them.

Step 6: Switch to a larger window for the rerun

If you are on Plus and routinely hitting the 32K wall on document-heavy work, the only real fix for window size is the tier, not the model picker: Pro (128K in-app) gives 4x the room, and the full 1M-token context is available only via the API. Within your current plan, choose Thinking over Instant for harder tasks and confirm the picker has not silently routed you to a fallback model after a cap. Then re-run the prompt that just failed and compare quality.

Step 7: Break complex tasks into shorter chats

Instead of one mega-thread, run one chat per phase (research, outline, draft, edit). Each chat stays well under the cap and you get sharper answers.

How to confirm it’s fixed

In the new chat, ask the model to repeat back the seed summary’s first three bullets. If it can, context is healthy.
Re-ask the question that was failing. If you now get a coherent answer, the window was the cause.
After trimming Custom Instructions, start one more new chat to confirm baseline behavior improved.
If you upgraded to Pro for the room, re-run your largest realistic thread; if it holds 100+ turns without forgetting the opening, the tier was the bottleneck.

Long-term prevention

Know your number: ~32K tokens on Plus, ~128K on Pro. Treat 50 turns, or one large attachment on Plus and three on Pro, as your soft ceiling, then summarize and restart.
Keep Custom Instructions tight, two short paragraphs maximum.
Audit Memory monthly and prune stale Saved memories entries.
For research, use Projects and attach the source files once, instead of dragging them into every chat.
For code work, keep one chat per feature, not one chat per week.
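The soft-ceiling rule from the first item above can be written as a simple check. This sketch encodes one reading of that rule (50 turns on either plan, one large attachment on Plus, three on Pro); the thresholds are heuristics from this guide and `should_restart` is a hypothetical helper.

```python
# Soft ceilings per plan, as read from the prevention list above (heuristics only).
SOFT_CEILINGS = {
    "Plus": {"turns": 50, "large_attachments": 1},
    "Pro": {"turns": 50, "large_attachments": 3},
}


def should_restart(plan: str, turns: int, large_attachments: int) -> bool:
    """True once the thread reaches either soft ceiling for the given plan."""
    ceiling = SOFT_CEILINGS[plan]
    return (turns >= ceiling["turns"]
            or large_attachments >= ceiling["large_attachments"])


print(should_restart("Plus", turns=35, large_attachments=1))
print(should_restart("Pro", turns=35, large_attachments=2))
```

When the check returns True, summarize and start a fresh chat as described in Steps 1 and 2.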

Common pitfalls

Assuming the “1M context” headline applies in-app. It is the API number; Plus is 32K, Pro is 128K.
Assuming Memory extends context within a chat. It does not, and it costs a little of your window.
Re-uploading the same PDF in every new message. Each upload is fresh tokens.
Pasting huge log dumps inline. Trim to the relevant 50 lines.
Ignoring the model picker showing a fallback model or a different reasoning mode after a cap.
Trying to “remind” the model of early turns by re-pasting them, which wastes more tokens.

FAQ

How big is the context window in ChatGPT? As of June 2026, in-app GPT-5.5 is ~16K tokens on Free, 32K on Go/Plus/Business, and 128K on Pro/Enterprise. The 1,000,000-token window applies to the API, not to a standard Plus chat.
Why does it say 1M tokens everywhere but my chat still forgets? That 1M figure is the API limit. ChatGPT.com caps the in-app window much lower per tier, so a Plus thread runs out of room around 32K tokens regardless of the API number.
Does Memory make the window bigger? No. Memory is a small store of cross-chat facts (now Saved memories plus Reference chat history). It injects facts into the request and slightly reduces single-chat room rather than expanding it.
Will Projects give me more context? Projects let attachments and instructions live outside the chat, so the live thread spends fewer tokens on them. That is effectively more usable room, though the hard window per tier is unchanged.
Why does my chat get worse but no error appears? ChatGPT trims the oldest turns silently when over capacity. There is no banner. Degraded answers are the only signal.
Can I see the token count anywhere? Not in ChatGPT.com. Only the API exposes token usage per request. Approximate it at ~0.75 words per token.

Tags: #ChatGPT #Troubleshooting #memory #gpt-5