Codex Stops Mid-Refactor on a Large Diff: How to Finish the Job

Codex hits a context limit mid-task and leaves a half-applied patch. Scope tasks to one verb, split into multi-PR plans, tune auto-compaction, and resume cleanly with config that works as of June 2026.

Published: May 24, 2026 Updated: Jun 15, 2026 Author: AI Productivity Guide Team

You asked Codex to “refactor the auth module to use the new session API.” Forty minutes later it stops mid-task: the PR has eight files changed, but six more it touched still carry leftover // TODO: migrate this comments. Or the transcript ends with context_length_exceeded, or it pauses to compact and never recovers. Codex did real work, but it ran out of context partway through and your branch is now half-migrated.

Fastest fix: stop trying to do the whole refactor in one run. Re-scope to one verb on an explicit file list (Step 1), and if it is genuinely large, split it into a multi-PR plan (Step 2) so each run fits inside the context window. That single change resolves the large majority of these stalls. The rest of this guide covers the config tuning and resume workflow for the cases where scoping alone is not enough.

This is not a Codex bug. It is what happens when a task spans more code than fits in the model’s context window. As of June 2026, Codex CLI does not enforce a hard per-task “tool-call cap” — instead it auto-compacts the conversation when token usage crosses a threshold, and a refactor that is too big simply keeps blowing past what compaction can preserve. The cure is scoping discipline up front, plus a few harness-level levers (model choice, multi-PR plans, surgical file lists, and compaction tuning).

Which bucket are you in?

Match the last lines of your Codex transcript (or the TUI status bar) to the right row, then jump to the fix it points to.

| Symptom in transcript / TUI | Likely cause | Go to |
| --- | --- | --- |
| `context_length_exceeded` or “Context window exceeded” | Task touches more code than the window holds | Steps 1-3 |
| “Compacting context…” then stalls or errors | Auto-compaction can’t preserve enough; session too long | Steps 1-2, 5 |
| Stops after N files, rest untouched, no error | Ran low on budget; trailed off | Steps 1-3 |
| One file shows thousands of changed lines in `git diff --stat` | A generated/lockfile/vendored blob ate the budget | Step 4 |
| Same `read_file` path appears 3+ times in transcript | Re-reading files already in context | Steps 3-4 |
| It changed unrelated files (“while I was here, I also…”) | Scope was open-ended | Steps 1, 6 |
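If you triage these stalls often, the lookup can be scripted. The sketch below is a hypothetical helper, not part of Codex: `suggest_steps` and the exact marker strings are assumptions based on the table above, and it only covers the rows that leave a recognizable string in the transcript.

```python
from typing import Optional

# (marker, fix) pairs taken from the table above. Compaction is checked first
# because a failed compaction can also log a context error afterwards.
SYMPTOMS = [
    ("compacting context", "Steps 1-2, 5"),
    ("context_length_exceeded", "Steps 1-3"),
    ("context window exceeded", "Steps 1-3"),
    ("while i was here", "Steps 1, 6"),
]


def suggest_steps(transcript_tail: str) -> Optional[str]:
    """Return the fix for the first matching symptom, or None if nothing matches."""
    lowered = transcript_tail.lower()
    for marker, steps in SYMPTOMS:
        if marker in lowered:
            return steps
    return None
```

A `None` result means you are in one of the rows without an error string (trailing off, an oversized file, repeated reads), which need the `git diff --stat` and transcript checks described below.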

Common causes

1. The task touches more files than the context can hold

You asked for a refactor across the auth module: 30 files, 8k lines. Even with a large context window, Codex burns budget reading each file and producing edits. Partway through, the next file’s content no longer fits alongside the running plan, so auto-compaction kicks in — and if compaction can’t keep the thread coherent, the run stalls.

How to spot it: transcript ends with context_length_exceeded or Maximum context reached, or it trails off after N files with the rest untouched.

2. Context fills faster than compaction can recover it

Long refactors generate a lot of history: read_file, apply_patch, read_file again to verify, run_shell to type-check. When usage crosses model_auto_compact_token_limit, Codex summarizes the conversation and rebuilds a shorter history. On very long sessions this can fail — compacting GPT-5.5 sessions is a known weak spot as of mid-2026 (/compact can report success in the UI yet still error with context_length_exceeded).

How to spot it: the status line shows “Compacting context…” and then the run either errors or loses the thread (it forgets earlier decisions and starts re-doing work).

3. One file in the patch is enormous

A 5,000-line generated file (lockfile, SVG, vendored library) is included in the diff. Reading and rewriting it consumes most of the budget for a single operation.

How to spot it: git diff --stat shows one file with thousands of changed lines, or the agent appears to “freeze” on a single file.
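To make that check repeatable, you can parse the machine-readable form of the same data. This is a minimal sketch that assumes you pass it the output of `git diff --numstat`; the function name and the 1,000-line threshold are illustrative choices, not anything Codex defines.

```python
def oversized_files(numstat: str, max_lines: int = 1000) -> list[str]:
    """Return paths whose added + deleted line count exceeds max_lines.

    Expects `git diff --numstat` output: one "<added>\t<deleted>\t<path>"
    row per file. Binary files report "-" for both counts and are skipped.
    """
    flagged = []
    for row in numstat.splitlines():
        parts = row.split("\t", 2)
        if len(parts) != 3:
            continue  # not a numstat row
        added, deleted, path = parts
        if not (added.isdigit() and deleted.isdigit()):
            continue  # binary file or malformed row
        if int(added) + int(deleted) > max_lines:
            flagged.append(path)
    return flagged
```

Anything this returns is a candidate for the skip list in Step 4.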

4. Codex re-reads files it already has in context

The model loses track of what it already loaded and calls read_file on the same path several times. Each read costs tokens. By the end, a big slice of the window is duplicate file content. (Note: after a compaction, Codex automatically re-reads up to 5 recently edited files — useful, but on a sprawling task it can mean repeatedly re-loading the same large files.)

How to spot it: search the transcript for repeat read_file calls on the same path.
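A small counter makes the search less tedious on a long transcript. Treat this as a sketch: it assumes reads appear in the saved transcript as `read_file <path>` or `read_file("<path>")`, which may not match how your Codex version logs tool calls, so adjust the pattern to what you actually see.

```python
import re
from collections import Counter

# Assumed log shape: the tool name followed by a bare or quoted path.
READ_CALL = re.compile(r"""read_file[\s("']+([^\s"')]+)""")


def repeated_reads(transcript: str, min_count: int = 3) -> dict[str, int]:
    """Map each path read at least min_count times to its read count."""
    counts = Counter(READ_CALL.findall(transcript))
    return {path: n for path, n in counts.items() if n >= min_count}
```

The default of three matches the "3+ times" symptom in the triage table.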

5. The task is open-ended (“refactor everything”)

You wrote “clean up the auth module.” Codex read that as “rewrite 12 files,” when you would have been happy with “rename User.uid to User.id across the auth module.” Open-ended scope expands until it busts the budget. Empirical data on agentic pull requests bears this out: rejected PRs touch roughly 10% more files and carry about 17% more changed lines than accepted ones (MSR 2026 study of ~33,000 agentic PRs). Smaller is not just easier on the agent — it merges more often.

How to spot it: compare your task description to the files Codex actually touched. If it changed unrelated things, the scope was unbounded.

Shortest path to fix

Step 1: Scope the task to one verb + one module

Bad: “Refactor the auth module.”
Good: “In src/auth/*.ts only, rename User.uid to User.id. Update call sites. Do not touch tests beyond the mechanical rename.”

A good prompt has:

One verb: rename, extract, replace, delete, add
One scope: a path glob or an explicit file list
A non-goal: “do not change X, Y, Z”

# Good Codex task template

GOAL: <one sentence, one verb>
SCOPE: <explicit file list or glob>
NON-GOALS: <what NOT to touch>
ACCEPTANCE: <tests pass, plus 1 specific check>
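If several people write Codex prompts on your team, it can help to generate the template rather than retype it. The following is a rough sketch: `build_task_prompt` is a hypothetical helper, and the one-verb check is a crude whole-word match against the five verbs listed above, so it will miss synonyms.

```python
import re

# The five verbs listed above; matching on whole words is a crude heuristic.
ALLOWED_VERBS = {"rename", "extract", "replace", "delete", "add"}


def build_task_prompt(goal: str, scope: list[str], non_goals: list[str], acceptance: str) -> str:
    """Render the GOAL/SCOPE/NON-GOALS/ACCEPTANCE template.

    Raises ValueError unless the goal mentions exactly one allowed verb.
    """
    verbs = {w for w in re.findall(r"[a-z]+", goal.lower()) if w in ALLOWED_VERBS}
    if len(verbs) != 1:
        raise ValueError(f"goal should contain exactly one verb from {sorted(ALLOWED_VERBS)}")
    return "\n".join([
        f"GOAL: {goal}",
        f"SCOPE: {', '.join(scope)}",
        f"NON-GOALS: {', '.join(non_goals)}",
        f"ACCEPTANCE: {acceptance}",
    ])
```

A goal such as "Refactor the auth module" is rejected, which is the point: it forces the one-verb rewrite before the run starts.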

Step 2: Break large changes into a multi-PR plan

For anything spanning more than ~10 files (or roughly 300 changed lines — a useful one-PR ceiling), write a plan and have Codex execute one step per PR:

# Auth migration plan

PR 1: Add new `Session` API alongside old `auth.cookie` (no callers move yet)
PR 2: Migrate `src/auth/login.ts` and `src/auth/logout.ts`
PR 3: Migrate `src/auth/middleware/` (3 files)
PR 4: Migrate `src/pages/api/` callers (10 files, mechanical)
PR 5: Delete old `auth.cookie`, update README

Each PR is small enough to fit in one Codex run, and reviewers can keep up. If you use the Codex app, each task gets its own git worktree, so you can run independent PR steps in parallel without branch conflicts.
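For the purely mechanical steps (like the 10-file caller migration), the batching itself is simple to sketch. This assumes the ~10-file ceiling above and a hypothetical `plan_batches` helper; it only groups files, so ordering steps by dependency, as the plan above does, is still up to you.

```python
def plan_batches(files: list[str], max_files: int = 10) -> list[list[str]]:
    """Split a file list into PR-sized batches of at most max_files each."""
    ordered = sorted(files)  # sorting keeps files from the same directory adjacent
    return [ordered[i:i + max_files] for i in range(0, len(ordered), max_files)]
```

Each batch can then become the explicit file list for one run, as described in Step 3.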

Step 3: Pre-list the exact files in the prompt

Save the agent the discovery step (which itself costs reads and tokens):

Files to edit (and only these):

- src/auth/session.ts
- src/auth/cookie.ts
- src/auth/middleware/withAuth.ts
- tests/auth/session.test.ts

If you need to change a file not in this list, stop and ask.

This both shrinks the working set and prevents scope creep.

Step 4: Keep lockfiles and generated files out of the agent’s view

In AGENTS.md at the repo root:

## Files to skip when planning

Do not read these in full unless explicitly asked:

- package-lock.json, pnpm-lock.yaml, yarn.lock
- *.svg larger than 200 lines
- src/generated/**
- public/**
- *.min.js, *.min.css

If a lockfile needs updating, run the package manager's install command (`npm install`, `pnpm install`, or `yarn install`) instead of editing it.

These are the worst budget hogs, and the agent never benefits from reading them line by line. You can reinforce this with a .codexignore file (same glob syntax as .gitignore) so the indexer skips them outright.
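Before building a file list for a prompt, you can filter it against the same skip patterns. The sketch below approximates the matching with Python's `fnmatch`, which is looser than real `.gitignore` semantics (its `*` also crosses `/`), and it leaves out the "SVG larger than 200 lines" rule because that needs file contents. `should_skip` is a hypothetical name.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Patterns copied from the AGENTS.md skip list above.
SKIP_PATTERNS = [
    "package-lock.json", "pnpm-lock.yaml", "yarn.lock",
    "src/generated/**", "public/**", "*.min.js", "*.min.css",
]


def should_skip(path: str, patterns: list[str] = SKIP_PATTERNS) -> bool:
    """True if the full path or its file name matches any skip pattern."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(path, p) or fnmatchcase(name, p) for p in patterns)
```

Matching on the file name as well as the full path catches lockfiles nested inside workspace packages.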

Step 5: Tune the model and the auto-compaction threshold

Codex CLI lets you pick the underlying model and control when it compacts. For tasks that genuinely span many files, run on the strongest model and give compaction more headroom.

# Run a non-interactive task on the recommended model
codex exec --model gpt-5.5 "execute the plan in PLAN.md"

As of June 2026, gpt-5.5 is the recommended default for complex coding in Codex; gpt-5.4 and the faster gpt-5.4-mini (good for subagents) are also available. Pick the model with --model/-m, or set it in config.toml.

There is no --max-tool-calls flag — Codex relies on auto-compaction, not a hard call cap, so don’t waste time looking for one. Instead, tune compaction in ~/.codex/config.toml (or per-project .codex/config.toml):

# ~/.codex/config.toml
model = "gpt-5.5"

# Let Codex see the full window before it compacts.
model_context_window = 400000

# Raise the threshold that triggers auto-compaction
# (Codex clamps this to ~90% of the context window).
model_auto_compact_token_limit = 350000

model_auto_compact_token_limit is the token count at which Codex auto-summarizes history; raising it (within the clamp) lets a run go further before it has to shed context. Setting a correct model_context_window matters too — if it’s detected too low, Codex compacts earlier than it needs to. A bigger budget on an unscoped task just produces a bigger mess, so do Step 1 first.
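The clamp is easy to get wrong by hand, so here is the arithmetic as a sketch. It takes the roughly-90% figure from the config comment above at face value; the exact clamp Codex applies may differ, and `effective_compact_limit` is an illustrative name, not a Codex API.

```python
def effective_compact_limit(context_window: int, requested_limit: int) -> int:
    """Return the threshold that would apply after the ~90% clamp."""
    ceiling = context_window * 9 // 10  # about 90% of the window
    return min(requested_limit, ceiling)
```

With the values from the config above, a 400,000-token window gives a ceiling of 360,000, so the requested 350,000 is used as written, while a request of 390,000 would be cut back to 360,000.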

Step 6: Have Codex emit a resume note when it stops

In AGENTS.md:

If you run low on context, before stopping write a file
`.codex/resume.md` with:

- Which files are done
- Which files are partially edited (and what state they are in)
- Which files are untouched
- What the next single action should be

Commit `.codex/resume.md` in the same PR.

The next run can resume from the note instead of re-discovering state — and you can paste the note into a fresh session, which sidesteps the unreliable long-session /compact path entirely.
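If you want the note to have the same layout every time, one option is to pin the format in a small script and have the instruction in AGENTS.md point at it. This is only a sketch: the section headings and the `render_resume_note` helper are assumptions, and nothing in Codex requires this shape.

```python
def render_resume_note(done, partial, untouched, next_action):
    """Build the text for .codex/resume.md.

    partial maps each half-edited path to a short description of its state.
    """
    def bullets(items):
        return [f"- {item}" for item in items] or ["- (none)"]

    lines = [
        "# Resume note",
        "",
        "## Done",
        *bullets(done),
        "",
        "## Partially edited",
        *bullets(f"{path}: {state}" for path, state in partial.items()),
        "",
        "## Untouched",
        *bullets(untouched),
        "",
        "## Next action",
        next_action,
    ]
    return "\n".join(lines) + "\n"
```

The four sections mirror the four bullets in the AGENTS.md instruction, so a fresh session gets the same information in a predictable order.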

How to confirm it’s fixed

Run completes without a context error. The transcript ends with the agent’s summary and a passing check, not context_length_exceeded.
The diff matches your file list. git diff --stat shows only the files you scoped — no surprise extras. If you see a generated file or lockfile in the stat, tighten AGENTS.md/.codexignore.
No leftover migration markers. git grep -n "TODO: migrate" (or your project’s marker) returns nothing in the touched paths.
Acceptance check passes. Run the specific check from your ACCEPTANCE line (a test, a type-check, a grep that the old symbol is gone), not just “tests pass.”

If all four hold, the task actually finished — not just stopped.
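Two of those checks (the file-list diff and the leftover markers) are mechanical enough to script. A minimal sketch, assuming you feed it the output of `git diff --name-only` and of the `git grep -n` command above; `check_run` is a hypothetical helper and does not replace running your acceptance check.

```python
def check_run(name_only: str, allowed: set[str], grep_output: str) -> list[str]:
    """Return a list of problems; an empty list means both checks passed.

    name_only:   output of `git diff --name-only`
    allowed:     the file list you gave Codex in the prompt
    grep_output: output of `git grep -n "TODO: migrate"` on the touched paths
    """
    touched = {line.strip() for line in name_only.splitlines() if line.strip()}
    problems = []
    extras = sorted(touched - allowed)
    if extras:
        problems.append("out-of-scope files: " + ", ".join(extras))
    hits = [line for line in grep_output.splitlines() if line.strip()]
    if hits:
        problems.append(f"{len(hits)} leftover migration marker(s)")
    return problems
```

Run it after every Codex task; a non-empty result tells you whether to tighten the prompt (extra files) or resume the run (leftover markers).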

FAQ

Is there a way to make Codex never run out of context?
No. Every model has a fixed window, and auto-compaction is lossy: it summarizes older history, so detail is dropped. The reliable path is to keep each task small enough that it never approaches the limit, rather than relying on compaction to rescue an oversized task.

Does switching to gpt-5.5 alone fix a stalled refactor?
Often not. A larger or stronger model buys headroom, but an open-ended task expands to fill it. Scope the task (Steps 1-3) first; change the model second.

Codex says “Context compacted” but then errors with context_length_exceeded. Why?
Compacting very long GPT-5.5 sessions is a known weak spot as of mid-2026: the summarization payload itself can exceed the window. Don’t fight it. Start a fresh session and hand it your .codex/resume.md (Step 6) instead of trying to compact a giant thread.

What’s the right PR size for an agent?
Keep each PR to one concern and roughly 300 changed lines or fewer. The agentic-PR data is blunt: bigger PRs (more files, more lines) get rejected more often. If a change needs more, it needs a multi-PR plan (Step 2).

Where do I put repo-wide rules vs. per-task instructions?
Long-lived rules (skip lists, conventions, the resume-note instruction) go in AGENTS.md. Per-task scope (the file list, the one verb, non-goals) goes in the prompt. Don’t bury task scope in AGENTS.md, because it bloats every run’s context.

Prevention

Scope every task to one verb + one module + an explicit file list.
Split anything spanning more than ~10 files (or ~300 lines) into a multi-PR plan.
Tell AGENTS.md / .codexignore which paths to skip (lockfiles, generated, public assets).
For unavoidable large work, run gpt-5.5 and raise model_auto_compact_token_limit after scoping.
Have Codex write .codex/resume.md if it stops short, so you can resume cleanly.
After every run, diff against the file list — if it touched extras, tighten the prompt.

