Codex Bails Out When the Patch Gets Too Large

Codex hits a context or token cap mid-task and stops with a partial diff. How to scope tasks more tightly, split work across PRs, or move to a larger-context model.

You asked Codex to “refactor the auth module to use the new session API.” Forty minutes later it stops mid-task: the PR has eight files changed, but six more files it touched still carry leftover // TODO: migrate this comments. Or the transcript ends with “Context window exceeded” or “Maximum tool calls reached.” Codex did real work, but it ran out of budget partway through, and now your branch is half-migrated.

This is not a Codex bug. It is what happens when a task spans more code than fits in the model’s context plus its tool-call budget. The cure is scoping discipline before you start, plus a few practical levers: model choice, multi-PR plans, and surgical file lists.

Common causes

1. Task touches more files than the context can hold

You asked for a refactor across the auth module: 30 files, 8k lines. Even with a 200k-token context, Codex burns budget reading each file and producing edits. Halfway through, it cannot fit the next file’s content alongside the running plan.

How to spot it: Transcript ends with context_length_exceeded or “Maximum context reached,” or it just trails off after N files, leaving the remaining files untouched.
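A rough estimate shows why 30 files and 8k lines strain even a 200k window. The sketch below is back-of-envelope arithmetic, not a model of how Codex actually spends tokens: it assumes about 10 tokens per line of code, that each file is read once and re-read once to verify, and that patches re-emit roughly 30% of the code. All three numbers are assumptions; tokenizers and harnesses vary.

```python
def estimate_tokens(total_lines: int,
                    tokens_per_line: float = 10.0,
                    reads_per_file: int = 2,
                    patch_fraction: float = 0.3) -> int:
    """Rough token cost of a refactor.

    Assumptions (illustrative, not measured):
    - tokens_per_line: average tokens per source line
    - reads_per_file: each file is read once and re-read once to verify
    - patch_fraction: share of the code that appears again in emitted patches
    """
    per_pass = total_lines * tokens_per_line
    return round(per_pass * reads_per_file + per_pass * patch_fraction)


if __name__ == "__main__":
    context_window = 200_000
    needed = estimate_tokens(8_000)
    print(f"estimated tokens: {needed:,} of {context_window:,}")
```

Under these assumptions the estimate comes to 184,000 tokens, most of a 200k window before the prompt, the running plan, and any shell output are counted.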

2. Tool-call budget exhausted

The harness running Codex typically caps tool calls per task (often somewhere in the 50–200 range, depending on configuration). Long refactors burn calls fast: read_file, apply_patch, read_file again to verify, run_shell to type-check. The agent hits the cap before the work is done.

How to spot it: Transcript ends with “tool calls exhausted” or the agent stops emitting actions despite obvious unfinished work.
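The same kind of arithmetic applies to the tool-call cap. This sketch assumes the four-call loop per file described above (read_file, apply_patch, a verifying read_file, run_shell) plus a hypothetical 10 calls of up-front discovery; real agents retry, search, and re-read, so treat the result as a best case.

```python
def files_within_budget(total_files: int,
                        max_tool_calls: int,
                        calls_per_file: int = 4,
                        overhead_calls: int = 10) -> int:
    """How many files an agent can finish before hitting its tool-call cap.

    Assumes (for illustration) a fixed per-file loop of read_file, apply_patch,
    read_file to verify, and run_shell to type-check, plus some calls spent on
    discovery (listing directories, searching) before editing starts.
    """
    usable = max(0, max_tool_calls - overhead_calls)
    return min(total_files, usable // calls_per_file)


if __name__ == "__main__":
    done = files_within_budget(total_files=30, max_tool_calls=100)
    print(f"finishes {done} of 30 files")
```

With a 100-call cap, the sketch finishes 22 of the 30 files, which leaves exactly the kind of partially migrated branch described at the top.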

3. One file in the patch is enormous

A 5,000-line generated file (a lockfile, an SVG, a vendored library) ends up in the diff. Reading and rewriting it can eat most of the budget in a single operation.

How to spot it: git diff --stat shows one file with thousands of changed lines. Or the agent appears to “freeze” on a single file.
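To find the offender before (or after) a run, git diff --numstat gives per-file added and deleted line counts in a tab-separated form that is easy to parse. A minimal sketch; the 1,000-line threshold and the function names are illustrative choices.

```python
import subprocess


def oversized_files(numstat: str, threshold: int = 1000) -> list[tuple[str, int]]:
    """Parse `git diff --numstat` output and return files whose changed-line
    count (added + deleted) is at or above threshold, largest first.

    Binary files appear as "-<TAB>-<TAB>path" in numstat output and are skipped.
    """
    hits = []
    for line in numstat.splitlines():
        parts = line.split("\t")
        if len(parts) != 3 or parts[0] == "-":
            continue
        added, deleted, path = int(parts[0]), int(parts[1]), parts[2]
        changed = added + deleted
        if changed >= threshold:
            hits.append((path, changed))
    return sorted(hits, key=lambda item: item[1], reverse=True)


def current_numstat(base: str = "HEAD") -> str:
    """Return `git diff --numstat <base>` for the repository in the cwd."""
    return subprocess.run(
        ["git", "diff", "--numstat", base],
        capture_output=True, text=True, check=True,
    ).stdout
```

From the repository root, oversized_files(current_numstat()) lists candidates; pass a base branch, such as current_numstat("main"), to compare against it instead of HEAD.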

4. Codex re-reads files it already has in context

The model loses track of what it has already loaded and calls read_file on the same path three times. Each read costs tokens. By the end, half the context is duplicate file content.

How to spot it: Search the transcript for repeated read_file calls on the same path. This is common in older agent harnesses that do not memoize reads.
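Counting repeats is easy once the transcript is machine-readable. Transcript formats differ between harnesses, so this sketch assumes a hypothetical JSON Lines log with one tool call per line, shaped like {"name": "read_file", "arguments": {"path": ...}}; rename the fields to match what your harness actually writes.

```python
import json
from collections import Counter


def repeated_reads(jsonl: str) -> dict[str, int]:
    """Count read_file calls per path and return paths read more than once.

    Assumes a hypothetical JSON Lines format, one event per line, shaped like
    {"name": "read_file", "arguments": {"path": "src/auth/session.ts"}}.
    """
    counts: Counter[str] = Counter()
    for line in jsonl.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines such as plain log output
        if not isinstance(event, dict) or event.get("name") != "read_file":
            continue
        args = event.get("arguments") or {}
        if isinstance(args, str):  # some logs store arguments as a JSON string
            try:
                args = json.loads(args)
            except json.JSONDecodeError:
                continue
        path = args.get("path") if isinstance(args, dict) else None
        if path:
            counts[path] += 1
    return {path: n for path, n in counts.items() if n > 1}
```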

5. Task is open-ended (“refactor everything”)

You wrote “clean up the auth module.” Codex interpreted that as “rewrite 12 files,” when you would have been happy with “rename User.uid to User.id across the auth module.” Open-ended scope expands until it busts the budget.

How to spot it: Compare your task description to the actual files Codex touched. If it changed unrelated things (“while I was here, I also…”), the scope was unbounded.
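The comparison can be made mechanical: list the changed files with git diff --name-only and flag anything outside the globs you meant to touch. A minimal sketch; the scope patterns are whatever you put in the prompt, and the function name is hypothetical.

```python
from fnmatch import fnmatchcase


def out_of_scope(touched: list[str], scope: list[str]) -> list[str]:
    """Return touched paths that match none of the scope patterns, sorted.

    fnmatch's "*" also matches "/", so "src/auth/*" covers nested files too.
    Feed it the lines printed by `git diff --name-only <base>`.
    """
    return sorted(
        path for path in touched
        if not any(fnmatchcase(path, pattern) for pattern in scope)
    )
```

For example, with scope ["src/auth/*"], a run that also edited src/billing/invoice.ts and README.md reports exactly those two paths.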

Shortest path to fix

Step 1: Scope the task to one verb + one module

Bad: “Refactor the auth module.” Good: “In src/auth/*.ts only, rename User.uid to User.id. Update call sites. Do not touch tests beyond the mechanical rename.”

A good prompt has:

  • One verb: rename, extract, replace, delete, add
  • One scope: a path glob or a list of files
  • A non-goal: “do not change X, Y, Z”

# Good Codex task template

GOAL: <one sentence, one verb>
SCOPE: <explicit file list or glob>
NON-GOALS: <what NOT to touch>
ACCEPTANCE: <tests pass, plus 1 specific check>
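If you generate prompts from a script, the template can be enforced rather than just recommended. This sketch rejects goals that do not open with one of the verbs listed above and refuses empty scope or non-goal lists; the verb list and the checks are illustrative, not anything Codex itself requires.

```python
ALLOWED_VERBS = ("rename", "extract", "replace", "delete", "add")


def build_task(goal: str, scope: list[str], non_goals: list[str], acceptance: str) -> str:
    """Render the GOAL/SCOPE/NON-GOALS/ACCEPTANCE template, refusing vague goals."""
    first_word = goal.split()[0].lower() if goal.strip() else ""
    if first_word not in ALLOWED_VERBS:
        raise ValueError(f"goal should start with one of {ALLOWED_VERBS}: {goal!r}")
    if not scope:
        raise ValueError("scope must list at least one file or glob")
    if not non_goals:
        raise ValueError("state at least one non-goal")
    return "\n".join([
        f"GOAL: {goal}",
        f"SCOPE: {', '.join(scope)}",
        f"NON-GOALS: {'; '.join(non_goals)}",
        f"ACCEPTANCE: {acceptance}",
    ])
```

build_task("Rename User.uid to User.id", ["src/auth/*.ts"], ["tests beyond the mechanical rename"], "npm test passes") renders the four-line block, while a goal of "Clean up the auth module" raises a ValueError.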

Step 2: Break large changes into a multi-PR plan

For anything spanning >10 files, write a plan and have Codex execute one step per PR:

# Auth migration plan

PR 1: Add new `Session` API alongside old `auth.cookie` (no callers move yet)
PR 2: Migrate `src/auth/login.ts` and `src/auth/logout.ts`
PR 3: Migrate `src/auth/middleware/` (3 files)
PR 4: Migrate `src/pages/api/` callers (10 files, mechanical)
PR 5: Delete old `auth.cookie`, update README

Each PR is small enough to fit in one Codex run, and small enough for reviewers to keep up.
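A first draft of such a plan can be generated from the list of affected files. This sketch assumes files in the same directory belong in the same PR and reuses the 10-file threshold from above; it knows nothing about ordering, so steps like "add the new API first" and "delete the old one last" still need a human.

```python
from itertools import islice
from pathlib import PurePosixPath


def plan_batches(files: list[str], max_files: int = 10) -> list[list[str]]:
    """Group files by parent directory, then cut each group into batches of
    at most max_files, so every batch can become its own PR."""
    by_dir: dict[str, list[str]] = {}
    for f in sorted(files):
        by_dir.setdefault(str(PurePosixPath(f).parent), []).append(f)
    batches = []
    for group in by_dir.values():
        it = iter(group)
        # Take max_files at a time until the directory's group is exhausted.
        while chunk := list(islice(it, max_files)):
            batches.append(chunk)
    return batches
```

For 2 files in src/auth/ and 12 in src/pages/api/, it proposes three batches of 2, 10, and 2 files.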

Step 3: Pre-list the exact files in the prompt

Save the agent the discovery step:

Files to edit (and only these):

- src/auth/session.ts
- src/auth/cookie.ts
- src/auth/middleware/withAuth.ts
- tests/auth/session.test.ts

If you need to change a file not in this list, stop and ask.

This both shrinks the working set and prevents scope creep.
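When the scope starts life as a glob, expanding it into the explicit list up front is cheap. A sketch using pathlib; the function name is hypothetical and the surrounding wording mirrors the example prompt above.

```python
from pathlib import Path


def file_list_prompt(root: str, patterns: list[str]) -> str:
    """Expand glob patterns under root into the explicit
    'Files to edit (and only these)' block, sorted and de-duplicated."""
    base = Path(root)
    paths = sorted({
        p.relative_to(base).as_posix()
        for pattern in patterns
        for p in base.glob(pattern)
        if p.is_file()
    })
    lines = ["Files to edit (and only these):", ""]
    lines += [f"- {p}" for p in paths]
    lines += ["", "If you need to change a file not in this list, stop and ask."]
    return "\n".join(lines)
```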

Step 4: Exclude lockfiles and generated files from the agent’s view

In AGENTS.md:

## Files to skip when planning

Do not read these in full unless explicitly asked:

- package-lock.json, pnpm-lock.yaml, yarn.lock
- *.svg larger than 200 lines
- src/generated/**
- public/**
- *.min.js, *.min.css

If a lockfile needs updating, run the matching package manager (`npm install`, `pnpm install`, or `yarn install`) instead of editing it by hand.

These are the worst budget hogs, and the agent almost never benefits from reading them line by line.
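AGENTS.md is guidance the agent reads, not an enforced filter. If you also build file lists with scripts (for instance, before pasting them into a prompt), the same rules can be applied mechanically. A sketch, assuming repo-relative paths with forward slashes and a known line count per file; the constant names are hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Mirrors the AGENTS.md skip list above; adjust to your repository.
SKIP_NAMES = ("package-lock.json", "pnpm-lock.yaml", "yarn.lock", "*.min.js", "*.min.css")
# fnmatch's "*" also matches "/", so these cover nested paths like the ** globs.
SKIP_PATHS = ("src/generated/*", "public/*")
SVG_LINE_LIMIT = 200


def should_skip(path: str, line_count: int) -> bool:
    """True if a file matches the skip list and should not be read in full."""
    name = PurePosixPath(path).name
    if any(fnmatchcase(name, pattern) for pattern in SKIP_NAMES):
        return True
    if any(fnmatchcase(path, pattern) for pattern in SKIP_PATHS):
        return True
    return name.endswith(".svg") and line_count > SVG_LINE_LIMIT
```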

Step 5: Move large tasks to a larger-context model

Codex CLI and similar harnesses let you pick the underlying model. For tasks that genuinely span many files, switch to the largest-context model your harness supports:

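# Codex CLI example. Illustrative only: flags and model IDs vary by version
# (check `codex --help`); --max-tool-calls stands in for your harness's budget setting.
codex --model gpt-5.5 --max-tool-calls 200 "execute the plan in PLAN.md"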

If your harness allows raising the tool-call cap, do so, but only after scoping the task. A bigger budget on an unscoped task just produces a bigger mess.

Step 6: Have Codex emit a resume note when it stops

In AGENTS.md:

If you are close to running out of context or tool calls, write
`.codex/resume.md` before you stop, with:

- Which files are done
- Which files are partially edited (and what state they are in)
- Which files are untouched
- What the next single action should be

Commit `.codex/resume.md` in the same PR.

The next run can resume from the note instead of re-discovering state.
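The AGENTS.md instruction above leaves the note's layout open. If you want a script to pick up where the agent stopped, it helps to ask for a fixed layout. This sketch assumes a hypothetical one (## Done, ## Partial, ## Untouched, ## Next action headings with - bullets) and turns the note into a follow-up prompt that covers only unfinished files.

```python
import re

# Hypothetical layout for .codex/resume.md; request it explicitly in AGENTS.md
# if you want notes a script can parse:
#   ## Done / ## Partial / ## Untouched / ## Next action
#   each followed by "- " bullet lines.
HEADING = re.compile(r"^##\s+(.+?)\s*$")


def parse_resume(text: str) -> dict[str, list[str]]:
    """Split a resume note into {lowercased section name: [bullet items]}."""
    sections: dict[str, list[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = HEADING.match(line)
        if match:
            current = match.group(1).lower()
            sections[current] = []
        elif current is not None and line.startswith("- "):
            sections[current].append(line[2:].strip())
    return sections


def next_prompt(text: str) -> str:
    """Build a follow-up task listing only partial and untouched files."""
    sections = parse_resume(text)
    remaining = sections.get("partial", []) + sections.get("untouched", [])
    steps = sections.get("next action", [])
    lines = ["Resume the task described in .codex/resume.md.", "",
             "Files to edit (and only these):", ""]
    lines += [f"- {f}" for f in remaining]
    if steps:
        lines += ["", f"Start with: {steps[0]}"]
    return "\n".join(lines)
```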

Prevention

  • Scope every task to one verb + one module + an explicit file list
  • Split anything spanning >10 files into a multi-PR plan
  • Tell AGENTS.md which paths to skip (lockfiles, generated, public assets)
  • For unavoidable large work, use the largest-context model your harness supports
  • Have Codex write .codex/resume.md if it stops short, so you can resume cleanly
  • After every agent run, diff the changes against the file list; if it touched extras, tighten the prompt
