How big is the Claude Code context window?

As of June 2026 it is 1,000,000 tokens on Opus 4.7 and Sonnet 4.6 (up from the old 200K). The usable budget is a bit lower once you subtract the system prompt, tools, history, and the ~33K-token reserve.

Why does my long answer stop even though `/context` shows plenty of room?

You hit the per-reply output cap, which is separate from the window. Sonnet 4.6 emits up to 64K output tokens per turn and Opus 4.7 up to 128K. Split the answer into sections to get past it.

Why did my conversation compact without my asking?

Claude Code auto-compacts when window usage reaches 95% by default. Lower the trigger with `CLAUDE_AUTOCOMPACT_PCT_OVERRIDE`, run `/compact` yourself earlier, or disable it with `claude config set -g autoCompactEnabled false`.

How do I just continue a truncated answer?

Type `continue from the last sentence`. The partial text is still in context, so Claude Code picks up where it stopped without re-spending tokens on a restart.

Does writing to a file count against context?

The `Write` tool call uses some tokens, but the file content does not stay in the window — that is the whole point of routing long output to disk.

Should I always disable unused MCP servers?

For long sessions, yes — each connected server inflates the system block. For a quick one-off task it rarely matters.

Troubleshooting

Claude Code Output Truncated by Context Window

A long Claude Code reply cuts off mid-sentence. Fastest fix: tell it to continue from the last line. Real cause is usually auto-compact, the per-reply output cap, or one huge tool result eating the budget.

Published: May 24, 2026 Updated: Jun 15, 2026 Author: AI Productivity Guide Team 🌐 查看中文版本

You ask Claude Code for a thorough summary, a long plan, or a multi-section report. It starts writing, gets halfway, and then just stops mid-sentence — no error, no continuation prompt, no obvious cause. Or the next turn arrives with a “Context low · run /compact” note (auto-compaction) and the original detail is gone.

Fastest fix (works ~80% of the time): type continue from the last sentence and press enter. The partial answer is still in context, so Claude Code resumes instead of restarting. If it keeps cutting off at the same spot, the real cause is one of the buckets below — most often a single giant tool result that crowded out the reply.

Why this happens at all: Claude Code runs inside one fixed context window (1,000,000 tokens on Opus 4.7 and Sonnet 4.6 as of June 2026). Tool results, file reads, prior turns, and the system prompt all share that window with your current reply. Two separate limits can cut a reply short — the per-reply output cap (a ceiling on one answer, set by the model) and the context window (the total budget, which triggers auto-compaction near the top). Truncation almost always traces to one of these.

Which bucket are you in

Symptom	Likely cause	Jump to
Reply stops at a round size every time (~64K Sonnet, ~128K Opus)	Per-reply output cap	Cause 1
A huge `Bash`/`Read` ran right before the cutoff	One tool result ate the window	Cause 2
You saw a `/compact` or “Context low” banner	Auto-compaction	Cause 3
`/context` shows 40K+ tokens before your first prompt	Bloated system block (MCP/skills)	Cause 4
Session is long and full of old code dumps	Verbose history	Cause 5
Reply just ends, looks complete-ish	Model finished (`end_turn`)	Cause 6

Common causes

Ordered by hit rate, highest first.

1. Per-reply output cap hit on the assistant reply

Separate from the context window, each single reply has a maximum output length set by the model. As of June 2026, Claude Code can emit up to 64K output tokens on Sonnet 4.6 and up to 128K on Opus 4.7 per turn. If your prompt invites a 200K-token answer, the reply terminates at that ceiling regardless of how much context window is free.

How to judge: The reply is very long and stops at a consistent size every time you retry the same prompt. Run /context after the cutoff — if the total context is nowhere near full but the reply still stopped, you hit the per-reply cap, not the window.

2. A single tool result consumed most of the window

A Bash call that dumps 200k characters or a Read on a huge file leaves no room for the assistant to think and reply.

How to judge: Look at the most recent tool_result before the truncation. If it is enormous, that is the issue.

3. Auto-compaction kicked in mid-thought

When window usage crosses the auto-compact threshold (95% by default as of June 2026), Claude Code summarizes older turns into a compressed recap and continues from that. The fresh turn after compaction may lose detail it was relying on, so a long answer can read as “truncated” when it was actually rebuilt from a summary. Claude Code also reserves a buffer near the top of the window (roughly 33K tokens / about 10%), so the usable budget is a bit below the headline 1M.

How to judge: You saw a “Context low · run /compact” banner or a checkpoint line, or the transcript JSON contains a summary record right before the truncated turn.

4. The system prompt + skills + tools grew without notice

Every plugin, skill, and MCP server adds tokens to the system block before you type a word. A modest prompt can now start with 40K+ tokens of overhead, leaving less for the reply and pushing you toward the compaction threshold faster.

How to judge: Run /context (or start with claude --debug) and read the system / tools / MCP token breakdown. Surprisingly high numbers there explain a lot.

5. Prior assistant turns were verbose and stayed in context

If earlier turns dumped huge code blocks, those stay in the context window. A few of them quickly add up to your full budget.

How to judge: Scroll up. If the conversation is full of multi-thousand-line code dumps, the budget is gone.

6. The model genuinely had nothing more to say

Sometimes truncation looks like a cap but is actually the model stopping at a logical end. The “mid-sentence” feeling can be a missing period the model never produced.

How to judge: Open the assistant record in the transcript JSONL and read its stop_reason. end_turn means the model chose to stop; max_tokens means it was capped at the per-reply limit.

Before you start

Note the approximate length of the truncated reply (lines or characters).
Have access to recent transcript files under ~/.claude/projects/.
Decide whether you need the full output once, or repeatable long outputs.
Be ready to split the prompt into smaller asks if needed.

Information to collect

Claude Code version: claude --version.
The exact prompt that produced the truncation.
A /context snapshot (system, tools, MCP, messages breakdown) right after the cutoff.
The last few tool results before the truncation point.
Any summary / compaction records in the transcript JSON.
The active model (/model), since the per-reply output cap differs between Sonnet 4.6 and Opus 4.7.

Step-by-step fix

Step 1: Read the context budget with /context

Inside the session, run:

/context

This shows what is currently loaded — system prompt, tools, MCP servers, and message history — as a fraction of the window. If total usage is near 95% you are about to auto-compact; if it is low but a long reply still stopped, you hit the per-reply output cap instead. To inspect the raw transcript, the JSONL session files live under ~/.claude/projects/:

ls -lt ~/.claude/projects/*/*.jsonl | head -3

Each line is one record; a compaction shows up as a summary record right before the rebuilt turn.

Step 2: Shrink the previous tool output

If a Bash or Read result before the truncation is huge, redo the call with narrower scope:

# Instead of dumping the whole file:
sed -n '100,200p' big-file.log

Or write to disk and have the model read a summary instead.

Step 3: Ask the model to continue from where it stopped

A simple “continue from the last sentence” usually works, since the prior context still has the partial answer. Avoid asking for a full restart; that doubles token use.

Step 4: Split the task into chapters

For long structured output, ask one section at a time:

Write section 1 only: Architecture overview. Stop after that section.

Then ask for section 2 in the next turn. Each turn fits comfortably under the cap.

Step 5: Compact strategically before the long reply

If you have a long history of unrelated tool calls, run /compact to summarize them yourself before asking for the big answer, freeing tokens for the upcoming reply. Detail may be lost, so review the recap. If surprise auto-compaction is the real problem, you can raise the trigger or turn it off:

# Push the auto-compact trigger lower so it fires earlier (default 95):
export CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=80

# Or disable auto-compaction entirely (writes to ~/.claude.json):
claude config set -g autoCompactEnabled false

You can also toggle “Auto-compact” inside /config. With it off you will hit a hard “Context low” stop instead of a silent rebuild, which is often easier to reason about.

Step 6: Disable unused MCP servers or skills

Each MCP server and skill adds system prompt tokens, which /context will show you. Disable ones you do not need this session:

claude --strict-mcp-config --mcp-config '{}'   # start with zero MCP servers loaded

--strict-mcp-config ignores your config files and loads only what you pass via --mcp-config; an empty object means none. Or remove servers from .mcp.json / settings temporarily. Fewer connected servers means a smaller system block and more room for the reply.

Step 7: For repeatable long outputs, write to a file

Have the agent generate the long output into a markdown file via Write, and reply only with the path and a summary:

Write the full report to /tmp/report.md and reply with the path
plus a 5-bullet summary.

This avoids the in-chat token cap entirely.

How to confirm it’s fixed

A repeat of the same prompt with the fixes produces a complete reply that ends on a real sentence.
/context after the reply shows usage comfortably below the 95% auto-compact line.
No surprise “Context low · run /compact” banner appears on the next few turns.
For the file-based approach, the target file on disk is complete even when the chat reply is short — open it and check the last line.

Long-term prevention

Keep tool outputs small; prefer head, tail, a sed range, or grep over full dumps.
For long results, write to disk and reference by path instead of pasting into chat.
Watch /context and run /compact yourself once a session passes ~70% of the window, rather than letting auto-compact fire mid-answer at 95%.
Audit active skills and MCP servers; disable what you do not use day to day so the system block stays small.
For repeatable reports, use a prompt template that always asks for one chapter at a time.

Common pitfalls

Asking the model to “regenerate everything from scratch” after truncation; you double the token use and may re-trigger the same limit.
Reading entire log files instead of grep-ing the relevant lines.
Letting auto-compaction silently lose important earlier decisions; read the recap it produces.
Leaving five MCP servers connected when you use one; check the cost with /context.
Treating every short reply as truncation when the model actually finished its turn.

FAQ

How big is the Claude Code context window? As of June 2026 it is 1,000,000 tokens on Opus 4.7 and Sonnet 4.6 (up from the old 200K). The usable budget is a bit lower once you subtract the system prompt, tools, history, and the ~33K-token reserve.
Why does my long answer stop even though /context shows plenty of room? You hit the per-reply output cap, which is separate from the window. Sonnet 4.6 emits up to 64K output tokens per turn and Opus 4.7 up to 128K. Split the answer into sections to get past it.
Why did my conversation compact without my asking? Claude Code auto-compacts when window usage reaches 95% by default. Lower the trigger with CLAUDE_AUTOCOMPACT_PCT_OVERRIDE, run /compact yourself earlier, or disable it with claude config set -g autoCompactEnabled false.
How do I just continue a truncated answer? Type continue from the last sentence. The partial text is still in context, so Claude Code picks up where it stopped without re-spending tokens on a restart.
Does writing to a file count against context? The Write tool call uses some tokens, but the file content does not stay in the window — that is the whole point of routing long output to disk.
Should I always disable unused MCP servers? For long sessions, yes — each connected server inflates the system block. For a quick one-off task it rarely matters.

Tags: #Claude Code #Troubleshooting #agent #Debug