Claude Code Token Budget Too Large for One Task

One task burns through the context window, auto-compacts mid-refactor, and loses your plan. Decompose into smaller steps, push reads into sub-agents, and save state to disk.

Published: May 23, 2026 Updated: Jun 15, 2026 Author: AI Productivity Guide Team

Fastest fix: the task is too big to keep its whole working set in one window, so Claude Code auto-compacts (summarizes earlier history) and loses your plan. Split the work into 3-5 ordered sub-tasks, push read-heavy investigation into a sub-agent (the Task tool, which runs in its own context window), save the plan to a file, and start a fresh session for each step. Watch /context and wrap up before compaction fires.

You ask Claude Code to “refactor the auth flow across the app”. It reads every file in src/auth/ plus src/api/, dumps three large config files into context, then halfway through the actual refactor it auto-compacts — replacing your earlier turns with a summary and losing the plan you just spent 40K tokens building. The next message proposes something that contradicts step 3 of the original plan.

The root problem is not the size of the window. As of June 2026, Claude Code defaults to Opus 4.7 with a 1M-token context window (generally available since March 13, 2026, at standard per-token rates — the old 2x long-context premium above 200K was removed). The problem is that a single task whose total working set keeps growing will still eventually cross the auto-compaction threshold (~83.5% of the window by default), and compaction is lossy. The fix is to decompose the task so each step’s working set stays well under that line, with persistent state on disk between steps.

Which bucket are you in?

Run /context first — it prints a breakdown of what is filling the window (messages, system prompt, MCP tools, skills, files, and the autocompact buffer reservation). That tells you which row below applies.

Symptom	Likely cause	Section
Long Read/Grep burst, context past 30-40% before any Edit	Task spans too many files	Cause 1
Brand-new session already shows meaningful context used	Oversized `CLAUDE.md` / `AGENTS.md`	Cause 2
One Grep added tens of thousands of tokens	Search too broad	Cause 3
Context climbs after every failed tool call	Long error traces	Cause 4
Execution started with a large slice already used	Oversized plan-mode artifact	Cause 5
Multi-hour session, “Compacting conversation” appeared	Should have restarted	Cause 6

Common causes

1. Task spans too many files for one context window

Refactoring 30 files at once means reading 30 files, holding their content while writing changes, then writing back. At ~3-5K tokens per file, you are at 90-150K before any reasoning happens. A 1M window absorbs that, but every added file you do not actually need brings the auto-compaction line closer, and once it fires the model’s working memory of your plan is summarized away.

How to spot it: The session opens with a long Read / Grep sequence; /context shows files dominating usage before any Edit is made.
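
The per-file arithmetic above can be checked against a real file list before the session starts. The following is a minimal sketch, assuming a rough 4-characters-per-token heuristic (an assumption for illustration, not Claude's actual tokenizer); the function names are hypothetical.

```python
from pathlib import Path

# Assumed rough heuristic: about 4 characters per token.
# This is NOT Claude's tokenizer; treat results as order-of-magnitude only.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Rough token estimate derived from character count."""
    return len(text) // CHARS_PER_TOKEN


def estimate_read_cost(paths: list[Path]) -> int:
    """Sum the estimated tokens for reading every listed file in full."""
    total = 0
    for path in paths:
        text = path.read_text(encoding="utf-8", errors="replace")
        total += estimate_tokens(text)
    return total
```

For example, `estimate_read_cost(sorted(Path("src/auth").rglob("*.ts")))` gives a ballpark for reading that directory in full. If the number is already a large share of the per-step budget, that is a hint to decompose first.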

2. CLAUDE.md or AGENTS.md is huge

A 5000-line CLAUDE.md that gets loaded into every session burns 15-25K tokens before the first user message, and that cost recurs on every fresh session. The bigger problem is signal-to-noise: a bloated memory file buries the rules that actually matter.

How to spot it: A brand-new session already shows a non-trivial slice used in /context. Check wc -l CLAUDE.md and any nested AGENTS.md files.
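
The wc -l check can be extended to every memory file in a repo at once. This is a small sketch under stated assumptions: the 500-line limit is the guideline this article uses for CLAUDE.md, and `audit_memory_files` is a hypothetical helper name.

```python
from pathlib import Path

# File names treated as memory files in this sketch.
MEMORY_FILE_NAMES = {"CLAUDE.md", "AGENTS.md"}


def audit_memory_files(root: Path, max_lines: int = 500) -> list[tuple[str, int, bool]]:
    """Return (relative path, line count, over_limit) for each memory file under root."""
    results = []
    for path in sorted(root.rglob("*.md")):
        if path.name not in MEMORY_FILE_NAMES:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        line_count = len(text.splitlines())
        relative = path.relative_to(root).as_posix()
        results.append((relative, line_count, line_count > max_lines))
    return results
```

Note that `rglob` also walks dependency folders such as node_modules, so on a large repo it may be worth pointing `root` at specific subdirectories.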

3. Search results are too broad

A Grep for "auth" across a large repo returns thousands of matches, and the model loads all of them into context to “be thorough”. Most of those matches are irrelevant.

How to spot it: After a Grep, context jumps by 20K+ tokens. Look at the Grep output — if it has hundreds of lines, it was too broad.
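
One way to narrow a search before it reaches the context window is to count matches per directory first and then search only where the hits cluster. The sketch below is illustrative, assuming plain substring matching and a hypothetical helper name `count_matches_by_dir`.

```python
from collections import Counter
from pathlib import Path


def count_matches_by_dir(root: Path, needle: str, pattern: str = "*.ts") -> Counter:
    """Count lines containing needle, grouped by top-level directory under root."""
    counts: Counter = Counter()
    for path in root.rglob(pattern):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        hits = sum(1 for line in text.splitlines() if needle in line)
        if hits:
            parts = path.relative_to(root).parts
            # Files directly under root are grouped under ".".
            top_level = parts[0] if len(parts) > 1 else "."
            counts[top_level] += hits
    return counts
```

Calling `count_matches_by_dir(Path("."), "auth").most_common(5)` returns only a handful of numbers, which is enough to decide which directory deserves the real search.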

4. Long tool error traces eat the budget

A failing Bash command that prints 200 lines of stack trace per attempt, retried 4 times, puts 800 lines of trace into context. With long trace lines that can cost 30-40K tokens by itself, all of it noise.

How to spot it: The session has multiple failed tool calls and the context indicator climbs after each failure.
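
A common mitigation is to keep only the head and tail of noisy output, since the first error and the final frames usually carry the signal. This is a minimal sketch of that idea; `truncate_output` and its defaults are hypothetical, not part of Claude Code.

```python
def truncate_output(text: str, head: int = 20, tail: int = 20) -> str:
    """Keep the first `head` and last `tail` lines, replacing the middle with a marker."""
    lines = text.splitlines()
    if len(lines) <= head + tail:
        return text  # Short enough already; return unchanged.
    omitted = len(lines) - head - tail
    marker = f"... [{omitted} lines omitted] ..."
    # Slice from an explicit index so tail=0 keeps no trailing lines.
    kept = lines[:head] + [marker] + lines[len(lines) - tail:]
    return "\n".join(kept)
```

The same effect is available in the shell by piping a command through head or tail. The point of the sketch is only that a 200-line trace can shrink to about 40 lines without losing the part worth reading.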

5. Plan mode produced an oversized plan

A “comprehensive plan” that lists every file and every step inflates to 8-12K tokens. Combined with the supporting reads done to validate the plan, a meaningful slice is gone before code generation starts.

How to spot it: Plan mode artifact is over 200 lines; the session started executing with significant context already used.

6. Long-running session that should have been restarted

A 4-hour session that never restarted accumulates plan + decisions + tool calls + errors + retries. Even with no single huge action, the cumulative load fills the window and triggers compaction.

How to spot it: Session has been running for hours; the status line shows “Compacting conversation” or /context reports usage near the compaction threshold.

Before you start

Run /context to see current usage and the breakdown before starting a new task.
Estimate the task scope: how many files, how many tools, how much output. If any number feels large, decompose first.
Identify which files genuinely need to be read in full vs which can be sampled or grepped.

Information to collect

Current CLAUDE.md and AGENTS.md sizes (wc -l).
Approximate session length and recent actions taken.
The exact task description and any constraints already given.
A rough count of files that touch the task area.
Whether sub-agents have been used yet in the session.
Whether progress files (.agent-progress.md or equivalent) exist already.

Step-by-step fix

Step 1: Decompose the task into fits-in-one-context chunks

Rewrite the task as 3-5 ordered sub-tasks, each keeping its working set comfortably under the compaction line — aim for roughly 150-250K tokens of work per step on the 1M window, so a step never approaches the ~83.5% threshold:

Old (one task, too big):
"Refactor the auth flow across the app"

New (four sub-tasks):
1. Read current auth module, write a refactor plan to .auth-refactor-plan.md
2. Refactor src/auth/login.ts and src/auth/session.ts (commit)
3. Update consumers in src/api/* to use the new interface (commit)
4. Update tests and run them (commit)

Each step is committable independently and starts a fresh context.
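
The budget arithmetic behind this step can be written down as a quick check. The sketch below takes the figures from this article (1M window, ~83.5% default threshold, 250K as the upper end of the per-step target); the step estimates passed in are whatever rough numbers you have, and the function names are hypothetical.

```python
WINDOW_TOKENS = 1_000_000   # context window size cited in this article
AUTOCOMPACT_PCT = 83.5      # default auto-compaction threshold cited in this article
PER_STEP_CAP = 250_000      # upper end of the 150-250K per-step target


def compaction_threshold(window: int = WINDOW_TOKENS, pct: float = AUTOCOMPACT_PCT) -> int:
    """Token count at which auto-compaction would fire."""
    return int(window * pct / 100)


def oversized_steps(estimates: dict[str, int], cap: int = PER_STEP_CAP) -> list[str]:
    """Return the names of sub-tasks whose estimated working set exceeds the cap."""
    return [name for name, tokens in estimates.items() if tokens > cap]
```

With the defaults, the threshold works out to 835,000 tokens, so a 250K step leaves a wide margin. Any step that `oversized_steps` returns is a candidate for splitting again.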

Step 2: Use sub-agents for read-heavy investigations

For “go figure out how X works”, spawn a sub-agent via the Task tool. The sub-agent reads 30 files, returns a 500-token summary, and your main context only sees the summary:

Spawn sub-agent:
"Read all files in src/auth/ and src/api/auth* and produce a
markdown report covering: current public interfaces, callers,
session storage, error handling patterns. Return the report only."

Main session keeps a clean context; sub-agent’s reads happen in its own window.

Step 3: Replace whole-file reads with targeted reads

Instead of Read src/api/users.ts (500 lines, 4K tokens), use:

Grep "session" src/api/users.ts  → 12 matches, 200 tokens
Read src/api/users.ts:120-150    → 30 lines, 250 tokens

For most edits you need a small window of the file, not the whole file.
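
The grep-then-read-a-range pattern is simple enough to sketch outside the agent as well. The helpers below are hypothetical illustrations of the idea, not the Grep or Read tools themselves; they use plain substring matching and 1-based, inclusive line numbers.

```python
from pathlib import Path


def grep_lines(path: Path, needle: str) -> list[tuple[int, str]]:
    """Return (1-based line number, line text) for each line containing needle."""
    lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
    return [(number, line) for number, line in enumerate(lines, start=1) if needle in line]


def read_range(path: Path, start: int, end: int) -> list[str]:
    """Return lines start..end of the file (1-based, inclusive)."""
    lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
    return lines[start - 1:end]
```

The match list tells you which line numbers matter, and the range read pulls in only those neighborhoods instead of the whole file.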

Step 4: Slim down CLAUDE.md to load-bearing content

CLAUDE.md should be under 500 lines (~2K tokens). Move stale or rarely-relevant content to scoped files that only get loaded when needed:

CLAUDE.md → keep: architecture decisions, conventions, must-not-violate rules
apps/web/AGENTS.md → web-specific details (only loaded when working in apps/web/)
docs/historical-decisions.md → archived rationale (read only when relevant)

Step 5: Save plans and state to disk between steps

Write the plan to a file, commit changes after each step, then start a fresh context for the next step using only the plan file as input:

# Step 1 of the refactor:
echo "## Refactor plan" > .auth-refactor-plan.md
# Claude writes the plan into this file

# Step 2 (new session):
# Run /clear (or start a fresh session), then paste:
# "Read .auth-refactor-plan.md and execute step 2 only"

Use /clear to reset the session to an empty context (this is different from /compact, which summarizes and keeps going). Each step then starts near-empty instead of carrying the previous step’s reads.
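
A plan file is easier to resume from when each step has an explicit done/pending marker. The following sketch assumes a simple checkbox format of my own choosing (not a Claude Code convention), with hypothetical helper names.

```python
from pathlib import Path
from typing import Optional

PENDING = "- [ ] "
DONE = "- [x] "


def write_plan(path: Path, title: str, steps: list[str]) -> None:
    """Write a plan file with one unchecked, numbered checkbox per step."""
    lines = [f"## {title}", ""]
    lines += [f"{PENDING}{number}. {step}" for number, step in enumerate(steps, start=1)]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")


def mark_done(path: Path, step_number: int) -> None:
    """Check off the given step number, leaving every other line untouched."""
    prefix = f"{PENDING}{step_number}. "
    lines = path.read_text(encoding="utf-8").splitlines()
    updated = [DONE + line[len(PENDING):] if line.startswith(prefix) else line for line in lines]
    path.write_text("\n".join(updated) + "\n", encoding="utf-8")


def next_pending(path: Path) -> Optional[str]:
    """Return the first unchecked step, or None when the plan is complete."""
    for line in path.read_text(encoding="utf-8").splitlines():
        if line.startswith(PENDING):
            return line[len(PENDING):]
    return None
```

After a /clear, the prompt for the fresh session can then be as short as "Read .auth-refactor-plan.md and execute the first unchecked step only".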

Step 6: Restart at clean boundaries instead of letting it auto-compact

Auto-compaction fires by default around 83.5% of the window and summarizes earlier turns to keep going — which is exactly the lossy event that costs you the plan. Get ahead of it: when /context shows you well into the window, finish the current sub-task, commit, and run /clear or start a new session.

Wrap-up signal:  finish current step, commit, /clear before compaction is close
Manual control:  /compact to summarize on purpose at a clean point (you choose when)
Tuning the line: set CLAUDE_AUTOCOMPACT_PCT_OVERRIDE (1-100) to move the trigger

Restarting at a clean boundary, where the plan is already committed and written to disk, is always cheaper than recovering from a summary that dropped step 3.
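
The wrap-up rule can be expressed as a threshold with a safety margin. The sketch below is an illustration under assumptions: it reads CLAUDE_AUTOCOMPACT_PCT_OVERRIDE as a percentage from 1 to 100, as described above, falls back to 83.5 when the value is missing or invalid (a choice of this sketch, not documented Claude Code behavior), and uses a hypothetical 15-point margin.

```python
import os
from typing import Mapping, Optional

DEFAULT_AUTOCOMPACT_PCT = 83.5  # default threshold cited in this article


def autocompact_pct(env: Optional[Mapping[str, str]] = None) -> float:
    """Resolve the compaction threshold percentage, honoring the override if valid."""
    env = os.environ if env is None else env
    raw = env.get("CLAUDE_AUTOCOMPACT_PCT_OVERRIDE")
    if raw is None:
        return DEFAULT_AUTOCOMPACT_PCT
    try:
        value = float(raw)
    except ValueError:
        return DEFAULT_AUTOCOMPACT_PCT  # sketch-level fallback for bad input
    return value if 1 <= value <= 100 else DEFAULT_AUTOCOMPACT_PCT


def should_wrap_up(
    used_tokens: int,
    window_tokens: int = 1_000_000,
    margin_pct: float = 15.0,  # hypothetical safety margin
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """True once usage is within margin_pct points of the compaction threshold."""
    used_pct = 100 * used_tokens / window_tokens
    return used_pct >= autocompact_pct(env) - margin_pct
```

Feed it the usage figure that /context reports. A True result means finish the current sub-task, commit, and /clear rather than starting another one.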

How to confirm it’s fixed

Run /context mid-step: usage stays well below the compaction threshold throughout execution.
No “Compacting conversation” message appears mid-task.
Each sub-task can be re-run independently from the saved plan if you /clear or restart.
The final result is consistent with the original plan, not a drift caused by lossy compaction.
Commits land in logical increments matching the sub-task structure.

Long-term prevention

Treat an approaching compaction threshold as a wrap-up trigger, not a “keep going” signal; commit and /clear.
Keep CLAUDE.md under 500 lines; promote rarely-used context to scoped files.
Default to Grep + targeted Read instead of full-file Read.
Pre-decompose any task that touches >10 files into sub-tasks before starting.
Use sub-agents for any investigation that requires reading more than 5 files — they run in their own context window and return only a summary.
Save plans and progress to disk in every task >20 minutes so a context reset or compaction does not lose work.
After each multi-file refactor, audit which files were truly needed in context vs which were read defensively.
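
The two file-count rules of thumb in this list can be turned into a pre-flight check. This is a toy sketch using the article's own cutoffs (more than 10 files touched, more than 5 files read); `recommend_approach` is a hypothetical name.

```python
def recommend_approach(files_to_edit: int, files_to_read: int) -> list[str]:
    """Apply the article's file-count rules of thumb to a planned task."""
    advice = []
    if files_to_edit > 10:
        advice.append("decompose into sub-tasks before starting")
    if files_to_read > 5:
        advice.append("delegate the investigation to a sub-agent")
    if not advice:
        advice.append("a single session is reasonable")
    return advice
```

The cutoffs are guidelines rather than hard limits, so treat the output as a prompt to think about scope, not a verdict.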

Common pitfalls

Pasting a 2000-line file into the chat “for reference” — that one paste eats 8K tokens of headroom.
Re-Reading the same file three times in one session because the agent forgot it already had it.
Running Bash commands that produce massive stdout (find /, tree, cat large-log.txt). Pipe through head or save to a file instead.
Ignoring context-usage indicators until usage is nearly at the auto-compaction threshold (~83.5% by default). By then summarization is imminent and the plan you depend on is about to be summarized away.
Splitting a task into “sub-tasks” that are still huge (for example, 3 sub-tasks that each blow past the 150-250K per-step target). Sub-tasks must actually fit.
Treating sub-agents as expensive — they are cheap relative to a forced summary that ruins the plan.

FAQ

Q: What is the actual context limit in Claude Code? A: As of June 2026, Claude Code defaults to Opus 4.7 with a 1M-token context window (generally available since March 13, 2026; Sonnet 4.6 also supports 1M). Claude Code reserves a buffer (roughly 33K tokens) for the autocompact reservation, system prompt, and tool definitions, so usable working space is a bit less than the headline number. Run /context to see the exact split.

Q: At what point does it auto-compact? A: By default around 83.5% of the window. You can move that line with the CLAUDE_AUTOCOMPACT_PCT_OVERRIDE environment variable (a value from 1-100), or compact on your own terms with /compact.

Q: Does compaction always lose information? A: Yes — by design. Compaction summarizes earlier turns, and summarization is lossy. Some specifics will always be dropped. The fix is to put load-bearing specifics in CLAUDE.md or a progress file on disk so they survive a compaction or a /clear.

Q: How do I check current context usage? A: Run /context for a detailed breakdown, or read the status line / footer. Watch for a “Compacting conversation” indicator as a sign you crossed the threshold.

Q: If the window is 1M, why decompose at all? A: Because a single long task keeps accumulating reads, errors, and retries until it crosses the compaction line anyway — and compaction is what loses your plan. Decomposition keeps each step’s working set small, makes every step independently re-runnable from disk, and produces cleaner commits. A bigger window raises the ceiling; it does not make discipline optional.

Q: Should I use Opus or Sonnet for large tasks? A: Both support 1M windows. Opus 4.7 reasons better on complex multi-step plans; Sonnet 4.6 is faster and cheaper for execution. For very large refactors, plan with Opus and execute with Sonnet sub-agents.

Tags: #Claude Code #agent #Troubleshooting