You ask Claude Code to “refactor the auth flow across the app”. It starts reading every file in src/auth/ plus src/api/, dumps three large config files into context, and then, halfway through the actual refactor, hits the context limit and auto-summarizes, losing the plan you just spent 40K tokens building. The next message proposes something that contradicts step 3 of the original plan. The root problem: you handed Claude one task whose total information budget exceeds the model’s 200K-token context window, so summarization is inevitable. The fix is to decompose the task so each sub-task fits comfortably in 60-80K tokens, with persistent state on disk between them.
Common causes
1. Task spans too many files for one context window
Refactoring 30 files at once means reading all 30, holding their content in context while working out the changes, then writing them back. At ~3-5K tokens per file, that is 90-150K tokens before any reasoning happens. At the high end the model has about 50K of its 200K window left for thinking, which is not enough for a non-trivial refactor.
How to spot it: The session opens with a long Read / Grep sequence; context usage indicator jumps past 50% before any Edit is made.
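The arithmetic behind this cause can be checked before a session starts. The following is a minimal Python sketch, not part of Claude Code. It assumes roughly 4 characters per token (a rule of thumb, not an exact tokenizer), and the function names are hypothetical.

```python
from pathlib import Path

# Assumption: ~4 characters per token. This is a rough rule of thumb,
# not the tokenizer Claude actually uses.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Approximate the token count of a string from its length."""
    return len(text) // CHARS_PER_TOKEN


def estimate_read_cost(paths: list[str]) -> int:
    """Approximate the tokens needed to read every listed file in full."""
    total = 0
    for path in paths:
        text = Path(path).read_text(encoding="utf-8", errors="ignore")
        total += estimate_tokens(text)
    return total


def headroom(read_cost: int, window: int = 200_000) -> int:
    """Tokens left for reasoning once the reads are in context."""
    return window - read_cost
```

For example, headroom(150_000) returns 50_000, which matches the worst case described above.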
2. CLAUDE.md or AGENTS.md is huge
A 5000-line CLAUDE.md that gets loaded into every session burns 15-25K tokens before the first user message. Useful tasks then have less headroom than they need.
How to spot it: Brand-new session shows >10% context used before you have done anything. Check wc -l CLAUDE.md and any nested AGENTS.md files.
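The size check can also be scripted so that nested instruction files are not missed. This is a sketch under stated assumptions: the 4-characters-per-token ratio is a rough heuristic, the function name is hypothetical, and the walk does not skip directories such as node_modules.

```python
from pathlib import Path

INSTRUCTION_FILES = {"CLAUDE.md", "AGENTS.md"}


def instruction_file_sizes(root: str = ".") -> list[tuple[str, int, int]]:
    """Return (path, line_count, approx_tokens) for each instruction file under root."""
    results = []
    for path in sorted(Path(root).rglob("*.md")):
        if path.name in INSTRUCTION_FILES and path.is_file():
            text = path.read_text(encoding="utf-8", errors="ignore")
            # Assumption: ~4 characters per token, a rough estimate only.
            results.append((str(path), len(text.splitlines()), len(text) // 4))
    return results


if __name__ == "__main__":
    for path, lines, tokens in instruction_file_sizes("."):
        print(f"{path}: {lines} lines, ~{tokens} tokens")
```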
3. Search results are too broad
Grep -r "auth" across a large repo returns thousands of matches, and the model loads all of them into context to “be thorough”. Most of those matches are irrelevant.
How to spot it: After a Grep, context jumps by 20K+ tokens. Look at the Grep output — if it has hundreds of lines, it was too broad.
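One way to gauge breadth before loading results is to count matches per file first, in the spirit of grep -c. The sketch below is illustrative only: the function names, the "*.ts" default, and the 100-line cap are assumptions, not Claude Code behaviour.

```python
import re
from pathlib import Path


def match_counts(root: str, pattern: str, glob: str = "*.ts") -> dict[str, int]:
    """Count matching lines per file without keeping the lines themselves."""
    regex = re.compile(pattern)
    counts = {}
    for path in sorted(Path(root).rglob(glob)):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        hits = sum(1 for line in text.splitlines() if regex.search(line))
        if hits:
            counts[str(path)] = hits
    return counts


def too_broad(counts: dict[str, int], max_lines: int = 100) -> bool:
    """True when the total match count exceeds the (hypothetical) line cap."""
    return sum(counts.values()) > max_lines
```

If too_broad returns True, narrowing the pattern or the glob is usually cheaper than loading every match.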
4. Long tool error traces eat the budget
A failing Bash command that prints 200 lines of stack trace per attempt, retried 4 times, costs 30-40K tokens by itself, all noise.
How to spot it: The session has multiple failed tool calls and the context indicator climbs after each failure.
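Most of a long trace is noise; the first and last lines usually carry the error. A wrapper script of your own could trim output before it reaches the session. The helper below is a hypothetical sketch, and the 20-line defaults are arbitrary.

```python
def truncate_output(output: str, head: int = 20, tail: int = 20) -> str:
    """Keep the first `head` and last `tail` lines, replacing the middle with a marker."""
    lines = output.splitlines()
    if len(lines) <= head + tail:
        return output
    omitted = len(lines) - head - tail
    marker = f"... [{omitted} lines omitted] ..."
    # Slice from an explicit index so that tail=0 keeps no trailing lines.
    return "\n".join(lines[:head] + [marker] + lines[len(lines) - tail:])
```

With the defaults, a 200-line trace shrinks to 41 lines: 20 from the top, one marker, and 20 from the bottom.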
5. Plan mode produced an oversized plan
A “comprehensive plan” that lists every file and every step inflates to 8-12K tokens. Combined with the supporting reads done to validate the plan, the model is at 50% context before code generation starts.
How to spot it: Plan mode artifact is over 200 lines; the session started executing with significant context already used.
6. Long-running session that should have been restarted
A 4-hour session that never restarted accumulates plan + decisions + tool calls + errors + retries. Even with no single huge action, the cumulative load fills the window.
How to spot it: Session has been running for hours; recent messages show summarization indicators or context > 80%.
Before you start
- Check current context usage in the Claude Code status indicator before starting a new task.
- Estimate the task scope: how many files, how many tools, how much output. If any number feels large, decompose first.
- Identify which files genuinely need to be read in full vs which can be sampled or grepped.
Information to collect
- Current CLAUDE.md and AGENTS.md sizes (wc -l).
- Approximate session length and recent actions taken.
- The exact task description and any constraints already given.
- A rough count of files that touch the task area.
- Whether sub-agents have been used yet in the session.
- Whether progress files (.agent-progress.md or equivalent) exist already.
Step-by-step fix
Step 1: Decompose the task into fits-in-one-context chunks
Rewrite the task as 3-5 ordered sub-tasks, each fitting in roughly 60-80K tokens of work:
Old (one task, too big):
"Refactor the auth flow across the app"
New (four sub-tasks):
1. Read current auth module, write a refactor plan to .auth-refactor-plan.md
2. Refactor src/auth/login.ts and src/auth/session.ts (commit)
3. Update consumers in src/api/* to use the new interface (commit)
4. Update tests and run them (commit)
Each step is committable independently and starts a fresh context.
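A quick sanity check is to write down a rough token estimate for each sub-task and flag any that exceed the budget. The sketch below uses hypothetical names and made-up estimates for the four sub-tasks above; the 80K default is the upper end of the range suggested in this step.

```python
from dataclasses import dataclass


@dataclass
class SubTask:
    name: str
    estimated_tokens: int  # your own rough estimate, not a measured value


def oversized(subtasks: list[SubTask], budget: int = 80_000) -> list[str]:
    """Return the names of sub-tasks whose estimate exceeds the budget."""
    return [task.name for task in subtasks if task.estimated_tokens > budget]


# Hypothetical estimates for the four sub-tasks listed above.
plan = [
    SubTask("write refactor plan", 40_000),
    SubTask("refactor login.ts and session.ts", 60_000),
    SubTask("update consumers in src/api", 100_000),
    SubTask("update and run tests", 50_000),
]
print(oversized(plan))  # ['update consumers in src/api']
```

With these invented numbers, the third sub-task would need to be split further before starting.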
Step 2: Use sub-agents for read-heavy investigations
For “go figure out how X works”, spawn a sub-agent via the Task tool. The sub-agent reads 30 files, returns a 500-token summary, and your main context only sees the summary:
Spawn sub-agent:
"Read all files in src/auth/ and src/api/auth* and produce a
markdown report covering: current public interfaces, callers,
session storage, error handling patterns. Return the report only."
The main session keeps a clean context; the sub-agent’s reads happen in its own window.
Step 3: Replace whole-file reads with targeted reads
Instead of Read src/api/users.ts (500 lines, 4K tokens), use:
Grep "session" src/api/users.ts → 12 matches, 200 tokens
Read src/api/users.ts:120-150 → 30 lines, 250 tokens
For most edits you need a small window of the file, not the whole file.
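The same grep-then-window pattern can be expressed in a few lines of Python, for instance in a helper script. This is a sketch with hypothetical function names; it is not how Claude Code's Grep and Read tools are implemented.

```python
import re
from pathlib import Path


def grep_lines(path: str, pattern: str) -> list[tuple[int, str]]:
    """Return (1-based line number, line) for every line matching pattern."""
    regex = re.compile(pattern)
    lines = Path(path).read_text(encoding="utf-8", errors="ignore").splitlines()
    return [(n, line) for n, line in enumerate(lines, start=1) if regex.search(line)]


def read_range(path: str, start: int, end: int) -> list[str]:
    """Return lines start..end, 1-based and inclusive on both ends."""
    lines = Path(path).read_text(encoding="utf-8", errors="ignore").splitlines()
    return lines[start - 1:end]
```

The line numbers returned by grep_lines tell you which window to pass to read_range, so only the relevant slice enters context.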
Step 4: Slim down CLAUDE.md to load-bearing content
CLAUDE.md should be under 500 lines (~2K tokens). Move stale or rarely-relevant content to scoped files that only get loaded when needed:
CLAUDE.md → keep: architecture decisions, conventions, must-not-violate rules
apps/web/AGENTS.md → web-specific details (only loaded when working in apps/web/)
docs/historical-decisions.md → archived rationale (read only when relevant)
Step 5: Save plans and state to disk between steps
Write the plan to a file, commit changes after each step, then start a fresh context for the next step using only the plan file as input:
# Step 1 of the refactor:
echo "## Refactor plan" > .auth-refactor-plan.md
# Claude writes the plan into this file
# Step 2 (new session):
# Open new session, paste: "Read .auth-refactor-plan.md and execute step 2 only"
Each step starts at <10% context instead of 60%.
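If the plan file tracks steps as checkboxes, a fresh session (or a small script) can find the next step without rereading history. The sketch below assumes a Markdown checkbox format for the plan file; that format and the function names are assumptions, not something Claude Code prescribes.

```python
import re
from typing import Optional

# Assumption: the plan file tracks steps as Markdown checkboxes, e.g.
#   - [x] Write refactor plan
#   - [ ] Refactor src/auth/login.ts and src/auth/session.ts
STEP = re.compile(r"^- \[([ x])\] (.+)$")


def next_pending(plan: str) -> Optional[str]:
    """Return the first unchecked step, or None when every step is done."""
    for line in plan.splitlines():
        match = STEP.match(line)
        if match and match.group(1) == " ":
            return match.group(2)
    return None


def mark_done(plan: str, step: str) -> str:
    """Check off the first unchecked line whose text equals `step`."""
    lines = plan.splitlines()
    for i, line in enumerate(lines):
        if line == f"- [ ] {step}":
            lines[i] = f"- [x] {step}"
            break
    return "\n".join(lines) + ("\n" if plan.endswith("\n") else "")
```

Reading .auth-refactor-plan.md, calling next_pending on its text, and writing back the result of mark_done after each commit keeps the file as the single source of truth between sessions.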
Step 6: Restart at clean boundaries
When the context indicator crosses 60%, finish the current sub-task and start a new session — do not push through to 95% and lose work to a forced summarization.
Wrap up signal: at 60% used, finish current step, commit, restart
Emergency signal: at 80% used, save what you have and restart
Restarting at a clean boundary is always cheaper than recovering from a bad summarization.
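The two thresholds amount to a simple decision rule. The sketch below encodes it with a hypothetical function that takes the used fraction of context (0.0 to 1.0) as read from the indicator; how you obtain that number depends on your Claude Code version.

```python
def restart_signal(used_fraction: float) -> str:
    """Map context usage (0.0 to 1.0) to the action suggested in this step."""
    if used_fraction >= 0.80:
        return "emergency: save what you have and restart"
    if used_fraction >= 0.60:
        return "wrap up: finish current step, commit, restart"
    return "continue"
```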
Verify
- The current task fits the context budget — usage stays under 70% throughout execution.
- No auto-summarization happens mid-task.
- Each sub-task can be re-run independently from the saved plan if a context reset occurs.
- The final result is consistent with the original plan, not a drift caused by lossy memory.
- Commits land in logical increments matching the sub-task structure.
Long-term prevention
- Treat 60% context use as a wrap-up trigger, not a “keep going” signal.
- Keep CLAUDE.md under 500 lines; promote rarely-used context to scoped files.
- Default to Grep + targeted Read instead of full-file Read.
- Pre-decompose any task that touches >10 files into sub-tasks before starting.
- Use sub-agents for any investigation that requires reading more than 5 files.
- Save plans and progress to disk in every task >20 minutes so context resets do not lose work.
- After each multi-file refactor, audit which files were truly needed in context vs which were read defensively.
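Three of the rules above carry numeric thresholds and can be turned into a pre-task checklist. This is a sketch only: the function name and the advice strings are hypothetical, while the thresholds are the ones from the list.

```python
def preflight(files_touched: int, files_to_read: int, expected_minutes: int) -> list[str]:
    """Return the prevention rules that a planned task triggers."""
    advice = []
    if files_touched > 10:
        advice.append("decompose into sub-tasks before starting")
    if files_to_read > 5:
        advice.append("use a sub-agent for the investigation")
    if expected_minutes > 20:
        advice.append("save the plan and progress to disk")
    return advice
```

An empty list means the task is small enough to run directly in one session.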
Common pitfalls
- Pasting a 2000-line file into the chat “for reference” — that one paste eats 8K tokens of headroom.
- Re-Reading the same file three times in one session because the agent forgot it already had it.
- Running Bash commands that produce massive stdout (find /, tree, cat large-log.txt); pipe through head or save to a file instead.
- Ignoring context-usage indicators until they cross 90%; by then summarization is imminent and the plan you depend on may be in the eviction queue.
- Splitting a task into “sub-tasks” that are still huge (3 sub-tasks each at 100K) — sub-tasks must actually fit.
- Treating sub-agents as expensive — they are cheap relative to a forced summary that ruins the plan.
FAQ
Q: What is the actual context limit in Claude Code? A: The underlying model has a 200K-token context window (Opus 4.7 / Sonnet 4.6). Claude Code reserves some of that for system messages and tool definitions, so effective working budget is around 180K. Auto-summarization typically kicks in around 90% of the working budget.
Q: Does summarization always lose information? A: Yes — by design. Summaries compress, and compression is lossy. Some specifics will always be dropped. The fix is to put load-bearing specifics in CLAUDE.md or a progress file so they survive summarization.
Q: How do I check current context usage? A: Claude Code shows it in the status bar / footer depending on version. If yours does not, watch for “compacting context” or “summarizing earlier conversation” indicators as a proxy.
Q: Should I use Opus or Sonnet for large tasks? A: Both have 200K windows. Opus reasons better on complex multi-step plans; Sonnet is faster and cheaper for execution. For very large refactors, plan with Opus and execute with Sonnet sub-agents.
Q: Can I increase the context window? A: No — the model defines the window. Decomposition is the only way to handle work that exceeds it.