You have been chatting for an hour, the thread is 80 turns deep, and ChatGPT suddenly starts contradicting itself or asking “what file?” about a PDF you uploaded ten minutes ago. The replies are coherent, but they reference the wrong details or skip instructions you gave at the top of the chat. The model has not gotten worse. The conversation has simply outgrown the context window, and ChatGPT is silently dropping the oldest turns to keep each request under the token cap. The fix is to compact the thread, prune attachments, or start fresh with a curated summary.
Common causes
Listed from most to least frequent.
1. Total tokens exceeded the model’s context window
GPT-5 has a generous window, but it is not unlimited. Long chats with large attachments hit the cap, and the UI then trims the earliest turns first. There is no banner warning, only degraded answers.
How to judge: Ask “what did I say in my first message of this conversation?” If the model paraphrases something more recent or guesses, the early turns have been trimmed.
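The trimming described above can be pictured as a sliding window over the conversation. The sketch below is only a mental model: ChatGPT's real trimming logic is not published, and the function names and the "about four characters per token" heuristic are assumptions made for illustration.

```python
from typing import List


def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about 4 characters per token."""
    return max(1, len(text) // 4)


def trim_to_budget(turns: List[str], budget: int) -> List[str]:
    """Keep the most recent turns whose combined estimate fits in `budget`.

    Walks backwards from the newest turn and stops at the first turn that
    would overflow, so the oldest turns are the ones that get dropped.
    """
    kept: List[str] = []
    used = 0
    for turn in reversed(turns):
        cost = estimate_tokens(turn)
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


if __name__ == "__main__":
    # Hypothetical three-turn chat; the first turn holds the instructions.
    chat = ["first instruction " * 20, "middle detail " * 20, "latest question " * 20]
    survivors = trim_to_budget(chat, budget=150)
    print(len(survivors), "of", len(chat), "turns still visible")
```

Under this model the first message is always the first casualty, which is why asking about it is a useful probe.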
2. Large attachments consuming most of the budget
Each PDF or image you upload is converted to tokens and re-sent every turn. A 50-page PDF can eat 30k+ tokens, leaving little room for the live conversation.
How to judge: Count attachments in the thread. More than two large files plus a long chat almost always means context pressure.
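To make the attachment arithmetic concrete, here is a small sketch. The per-page rate is derived from the "50 pages, 30k tokens" figure above and will vary a lot with layout and images; the window size in the usage note is a placeholder, not a published GPT-5 number.

```python
from typing import List

# Assumption: 30,000 tokens / 50 pages, taken from the figure quoted above.
TOKENS_PER_PAGE = 600


def attachment_tokens(pages: int, tokens_per_page: int = TOKENS_PER_PAGE) -> int:
    """Estimated tokens a single attachment occupies in each request."""
    return pages * tokens_per_page


def remaining_budget(window: int, page_counts: List[int]) -> int:
    """Tokens left for the live conversation once attachments are counted."""
    return window - sum(attachment_tokens(pages) for pages in page_counts)


if __name__ == "__main__":
    hypothetical_window = 128_000  # placeholder value for illustration only
    print(remaining_budget(hypothetical_window, [50, 80]))
```

With those placeholder numbers, two reports of 50 and 80 pages would consume 78,000 tokens before the conversation itself is counted.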
3. Custom Instructions and system prompt taking a heavy slice
Long Custom Instructions, plus any Project system prompt, plus Memory entries all get prepended every turn. A bloated About You field can quietly cost 2-3k tokens per request.
How to judge: Open Settings, then Personalization, then Custom Instructions. If either text box holds more than 500 words, it is contributing.
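If you would rather script the word-count check than eyeball it, a minimal sketch follows. The function names and the two-field split are assumptions; the 500-word limit is the threshold given above.

```python
def word_count(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())


def is_heavy(about_you: str, response_style: str, limit: int = 500) -> bool:
    """True if either Custom Instructions box exceeds `limit` words."""
    return word_count(about_you) > limit or word_count(response_style) > limit


if __name__ == "__main__":
    # Paste your own field contents in place of these placeholders.
    print(is_heavy("I am a backend engineer.", "Keep answers short."))
```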
4. Memory is assumed to extend context, but it does not
ChatGPT Memory stores short facts across chats, but it does not expand the window inside a single chat. People sometimes assume Memory means infinite context. It does not.
How to judge: Turn off Memory for one chat (toggle in Personalization) and see whether the behavior is the same. If it is, Memory was not the cause.
5. Tool calls (web search, code interpreter) adding hidden tokens
Each tool result is appended to context. Long web search results or large code execution outputs can quietly fill the window.
How to judge: Count tool calls in the thread. Many web search results, or one big DataFrame from code interpreter, will dominate the budget.
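Once you have rough per-category estimates (chat turns, attachments, instructions, tool output), ranking them shows what to prune first. This is a sketch with hypothetical category names and made-up numbers; ChatGPT does not expose these figures, so the inputs are your own guesses.

```python
from typing import Dict, List, Tuple


def budget_shares(usage: Dict[str, int]) -> List[Tuple[str, float]]:
    """Return (category, share of total) pairs, largest share first."""
    total = sum(usage.values())
    if total == 0:
        return []
    shares = [(name, tokens / total) for name, tokens in usage.items()]
    return sorted(shares, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Illustrative estimates only, not measurements.
    estimates = {"chat turns": 20_000, "attachments": 30_000,
                 "instructions": 2_500, "tool output": 45_000}
    for name, share in budget_shares(estimates):
        print(f"{name}: {share:.0%}")
```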
6. Model auto-downgrade on Plus plans
When usage caps kick in, Plus accounts can be silently routed to a smaller-window model. That model has less context to work with, so the same chat starts dropping content earlier.
How to judge: Check the model picker at the top of the chat. If it shows a fallback name like “GPT-5 mini” instead of the model you started with, that is the cause.
Before you start
- Decide whether the chat is salvageable, or whether starting a fresh thread with a summary is faster.
- Save any unique answers from the current chat as plain text before pruning, in case trimming loses them.
- Note which model you started with so you can compare after a fresh chat.
Information to collect
- Approximate turn count and attachment count in the thread.
- Model name shown in the picker right now.
- Whether Memory is on and how many entries it holds.
- Length of Custom Instructions (paste into a word counter).
- Whether the chat is inside a Project (Projects add system prompts).
- Plan tier (Free, Plus, Team, Enterprise) and any recent usage cap hits.
Step-by-step fix
Step 1: Ask the model to summarize the conversation so far
In the same thread, send:
Summarize everything important from this conversation in a numbered
list: decisions made, facts established, files referenced, and any
open questions. Keep it under 300 words.
Copy that summary. You now have a portable seed for a fresh thread.
Step 2: Start a new chat seeded with the summary
Open a new conversation, paste the summary as your first message with a header like “Context from previous session”, then continue your task. Fresh window, same continuity.
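If you restart threads often, a tiny helper keeps the seed message consistent. This is a sketch: the function name and the "Next task" line are assumptions, and only the header text comes from the step above.

```python
def build_seed_message(summary: str, next_task: str) -> str:
    """Assemble the first message for a fresh chat from a pasted summary."""
    header = "Context from previous session"
    return f"{header}\n\n{summary.strip()}\n\nNext task: {next_task.strip()}"


if __name__ == "__main__":
    # Placeholder summary text; paste the model's real summary here.
    seed = build_seed_message("1. Decision A\n2. Open question B", "Draft section 3")
    print(seed)
```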
Step 3: Prune attachments before re-uploading
If you must keep working with files, only re-upload the specific pages or sheets you actually need. Use a PDF tool to extract the relevant 5 pages instead of the whole 80-page report.
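Any PDF tool works for this. As one possible approach, the sketch below uses the third-party pypdf library (installed separately with pip); the helper names and file names are hypothetical.

```python
from typing import List


def parse_page_spec(spec: str, total_pages: int) -> List[int]:
    """Turn a 1-based spec like "3-5,9" into sorted 0-based page indices."""
    indices = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_s, end_s = part.split("-", 1)
            start, end = int(start_s), int(end_s)
        else:
            start = end = int(part)
        if start < 1 or end > total_pages or start > end:
            raise ValueError(f"page range out of bounds: {part!r}")
        indices.update(range(start - 1, end))
    return sorted(indices)


def extract_pages(src: str, dst: str, spec: str) -> int:
    """Copy the selected pages of `src` into `dst`; returns pages written."""
    # Imported here so the page-spec helper works without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(src)
    writer = PdfWriter()
    indices = parse_page_spec(spec, len(reader.pages))
    for index in indices:
        writer.add_page(reader.pages[index])
    with open(dst, "wb") as fh:
        writer.write(fh)
    return len(indices)
```

For example, `extract_pages("report.pdf", "report_excerpt.pdf", "12-16")` would write a five-page excerpt that you upload instead of the full report.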
Step 4: Trim Custom Instructions and Memory
Settings, then Personalization. Cut Custom Instructions to under 300 words total. Delete stale Memory entries (Manage Memory) that no longer apply. Both savings compound across every future chat.
Step 5: Use a Project for long-running work
For multi-session work, create a Project, attach your reference files there, and chat inside the Project. Each chat in a Project starts fresh but inherits the Project files and Project instructions cleanly.
Step 6: Switch to a model with a larger window for the rerun
In the model picker, choose GPT-5 (full) rather than mini, or use a long-context variant if your plan offers one. Re-run the prompt that just failed and compare the quality.
Step 7: Break complex tasks into shorter chats
Instead of one mega-thread, run one chat per phase (research, outline, draft, edit). Each chat stays well under the cap and you get sharper answers.
Verify
- In the new chat, ask the model to repeat back the first three items of the seed summary. If it can, context is healthy.
- Re-ask the question that was failing. If you now get a coherent answer, the window was the cause.
- After trimming Custom Instructions, start one more new chat to confirm baseline behavior improved.
Long-term prevention
- Treat 50 turns or three large attachments as your soft ceiling. Beyond that, summarize and restart.
- Keep Custom Instructions tight, two short paragraphs maximum.
- Audit Memory monthly and prune stale entries.
- For research, use Projects and attach the source files once, instead of dragging them into every chat.
- For code work, keep one chat per feature, not one chat per week.
Common pitfalls
- Assuming Memory extends context within a chat. It does not.
- Re-uploading the same PDF in every new message. Each upload adds its tokens again.
- Pasting huge log dumps inline. Trim to the relevant 50 lines.
- Ignoring the model picker showing a fallback variant.
- Trying to “remind” the model of early turns by re-pasting them, which wastes more tokens.
FAQ
- How big is the context window on GPT-5? Large, but not infinite, and the exact number is not published in the UI. Treat 50 long turns plus attachments as your practical limit.
- Does Memory make the window bigger? No. Memory is a small store of cross-chat facts. It does not expand single-chat context.
- Will Projects give me more context? Projects let attachments and instructions live outside the chat, so the chat itself spends fewer tokens on them. That is effectively more usable room.
- Why does my chat get worse but no error appears? ChatGPT trims silently when over capacity. There is no banner. Degraded answers are the only signal.
- Is GPT-5 mini worse for long chats? Yes, mini has a smaller window. If the picker shows mini after a cap, expect earlier degradation.
- Can I see the token count anywhere? Not on ChatGPT.com. Only the API exposes token usage per request.
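For readers who do use the API, the per-request numbers live in the `usage` object of the response. The sketch below reads the Chat Completions field names (`prompt_tokens`, `completion_tokens`, `total_tokens`) from a JSON string; the sample payload is made up for illustration and is not a real measurement, and other endpoints may name these fields differently.

```python
import json
from typing import Dict


def usage_summary(response_json: str) -> Dict[str, int]:
    """Pull token counts out of a Chat Completions style JSON response."""
    payload = json.loads(response_json)
    usage = payload.get("usage", {})
    return {
        "prompt": usage.get("prompt_tokens", 0),
        "completion": usage.get("completion_tokens", 0),
        "total": usage.get("total_tokens", 0),
    }


# Made-up payload for illustration only.
SAMPLE = '{"usage": {"prompt_tokens": 1200, "completion_tokens": 300, "total_tokens": 1500}}'

if __name__ == "__main__":
    print(usage_summary(SAMPLE))
```

A prompt count that climbs with every request is the same context growth that the ChatGPT UI hides.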