Claude Code Output Truncated by Context Window

A long agent reply gets cut off mid-sentence under context pressure. Usually compaction, max_tokens cap, or runaway tool output eating the budget.

You ask Claude Code for a thorough summary, a long plan, or a multi-section report. It starts writing, gets halfway, and then just stops mid-sentence — no error, no continuation prompt, no obvious cause. Or the next turn arrives with a “context compacted” note and the original detail is gone. Claude Code operates inside a fixed-size context window. Tool calls, file reads, prior assistant turns, and the system prompt all share that window with your current reply. When the budget runs low, the model can be capped by max_tokens, the conversation can be compacted, or a single tool output can starve the rest of the turn. The fix depends on which pressure point hit you.

Common causes

Ordered by hit rate, highest first.

1. max_tokens cap hit on the assistant reply

Every assistant turn has a hard upper limit on response tokens. If your prompt invites a 20k-token answer but the cap is 8k, the reply terminates exactly at the cap.

How to judge: Count tokens of the truncated reply. If it sits very close to a round number like 8192 or 16384, you hit the cap.

2. A single tool result consumed most of the window

A Bash call that dumps 200k characters or a Read on a huge file leaves no room for the assistant to think and reply.

How to judge: Look at the most recent tool_result before the truncation. If it is enormous, that is the issue.

3. Conversation compaction kicked in mid-thought

When usage gets high, Claude Code may auto-compact older turns into a summary. The fresh assistant turn after compaction may lose detail it was relying on.

How to judge: Check for a “compacted” or “summarized” marker in the transcript right before the truncated turn.

4. The system prompt + skills + tools grew without notice

Every plugin, skill, and MCP server adds tokens to the system block. A modest user prompt now starts with 40k tokens of overhead, leaving little for the reply.

How to judge: Run claude --debug and look at the reported system token count. Surprisingly high numbers explain a lot.

5. Prior assistant turns were verbose and stayed in context

If earlier turns dumped huge code blocks, those stay in the context window. A few of them quickly add up to your full budget.

How to judge: Scroll up. If the conversation is full of multi-thousand-line code dumps, the budget is gone.

6. The model genuinely had nothing more to say

Sometimes truncation looks like a cap but is actually the model stopping at a logical end. The “mid-sentence” feeling can be a missing period the model never produced.

How to judge: Compare the stop reason in the API response. stop_reason: end_turn means the model chose to stop; max_tokens means it was capped.

Before you start

  • Note the approximate length of the truncated reply (lines or characters).
  • Have access to recent transcript files under ~/.claude/projects/.
  • Decide whether you need the full output once, or repeatable long outputs.
  • Be ready to split the prompt into smaller asks if needed.

Information to collect

  • Claude Code version: claude --version.
  • The exact prompt that produced the truncation.
  • Token usage reported at the end of the turn (if visible).
  • The last few tool results before the truncation point.
  • Any compaction markers in the transcript JSON.
  • claude --debug showing system prompt size.

Step-by-step fix

Step 1: Confirm the stop reason

Open the latest session transcript:

ls -lt ~/.claude/projects/*/sessions/ | head -3

Find the truncated turn. Look for stop_reason. max_tokens confirms a cap hit; end_turn means the model stopped voluntarily.

Step 2: Shrink the previous tool output

If a Bash or Read result before the truncation is huge, redo the call with narrower scope:

# Instead of dumping the whole file:
sed -n '100,200p' big-file.log

Or write to disk and have the model read a summary instead.

Step 3: Ask the model to continue from where it stopped

A simple “continue from the last sentence” usually works, since the prior context still has the partial answer. Avoid asking for a full restart; that doubles token use.

Step 4: Split the task into chapters

For long structured output, ask one section at a time:

Write section 1 only: Architecture overview. Stop after that section.

Then ask for section 2 in the next turn. Each turn fits comfortably under the cap.

Step 5: Compact strategically before the long reply

If you have a long history of unrelated tool calls, run /compact to summarize them, freeing tokens for the upcoming reply. Be aware that detail may be lost.

Step 6: Disable unused MCP servers or skills

Each MCP server and skill adds system prompt tokens. Disable ones you do not need in this session:

claude --no-mcp

Or remove them from settings.json temporarily.

Step 7: For repeatable long outputs, write to a file

Have the agent generate the long output into a markdown file via Write, and reply only with the path and a summary:

Write the full report to /tmp/report.md and reply with the path
plus a 5-bullet summary.

This avoids the in-chat token cap entirely.

Verify

  • A repeat of the same prompt with the fixes produces a complete reply.
  • The stop reason changes from max_tokens to end_turn in the transcript.
  • Subsequent turns no longer trigger surprise compaction.
  • File-based outputs are complete on disk even when the chat reply is short.

Long-term prevention

  • Keep tool outputs small; prefer head, tail, sed range, or grep over full dumps.
  • For long results, write to disk and reference by path.
  • Use /compact proactively when a session grows past 50% of the window.
  • Audit your active skills and MCP servers; disable what you do not use day to day.
  • For repeatable reports, build a structured prompt that always splits into chapters.

Common pitfalls

  • Asking the model to “regenerate everything from scratch” after truncation; you double the token use and re-trigger the cap.
  • Reading entire log files instead of grepping the relevant lines.
  • Letting compaction silently lose important earlier decisions; review compaction summaries.
  • Leaving five MCP servers connected when you only use one; the system prompt is huge.
  • Treating every short reply as truncation when the model actually ended its turn.

FAQ

  • How big is the Claude Code context window? Depends on the model. Sonnet and Opus support 200k tokens, but the practical budget for replies is much smaller after system prompt, tools, and history.
  • What does max_tokens mean exactly? The upper cap on the assistant’s reply tokens. Independent of overall context size.
  • Can I increase max_tokens? Sometimes via settings or model choice, but the safer fix is splitting the prompt.
  • Why did my conversation compact without asking? Claude Code auto-compacts when usage approaches the limit. You can also run /compact manually.
  • Does writing to a file count against context? The Write call itself uses some tokens, but the file content does not — that is its main advantage.
  • Should I always disable unused MCPs? Yes for long sessions, no for quick ones. Each connected server costs tokens.

Tags: #Claude Code #Troubleshooting #agent #Debug