Codex Cannot Finish a Patch Cleanly

Q: How do I tell context decay from a thinking-budget cutoff?

Run `/status`. If the context window is near full, it's decay (Step 4 — split the task). If context is fine but the final reasoning was unusually short and the last ~20% is missing, it's a turn budget (also Step 4 — but a targeted completion prompt in Step 5 often finishes it in one more turn).

Q: `apply_patch` failed with an error — is that the same problem?

No. Strings like `Failed to apply patch`, `command failed; retry without sandbox?`, or `No such file or directory` after approval are mechanical tool failures: the edit never landed at all. These cluster around a sandbox regression in recent Codex CLI builds (reported across roughly 0.115.0–0.134.0, with a clear break at 0.118.0 that 0.117.0 didn't have), where the bubblewrap sandbox blocks the write and Codex loops on the retry prompt. Fix by checking `codex --version` and upgrading to the latest, or as a workaround switch the session to `--full-auto` / `danger-full-access` so the write isn't sandboxed — not by tightening your prompt.

Codex says "done" but leaves broken imports, half-converted types, and untouched call sites. Bind "done" to a verifier in AGENTS.md, then check the exit code yourself.

Published: May 17, 2026 Updated: Jun 15, 2026 Author: AI Productivity Guide Team 🌐 查看中文版本

Codex returns a patch, announces “I have completed the changes,” and you find: half the call sites still use the old signature, two imports point at a function that no longer exists, the types compile but one test now hangs forever. The patch isn’t done — Codex declared it done.

Fastest fix: add a DONE means block to your prompt (or, better, to AGENTS.md) that lists the exact verifier commands Codex must run and paste output for, then re-run the same commands yourself and trust the exit code, not the chat. The template is in Step 1 below.

This is a completion-criterion failure, not an intelligence failure. Codex stops when it judges the work complete; if you never defined “complete” in measurable terms, you get whichever stopping point its judgment lands on. (Note: this is different from apply_patch tool errors like Failed to apply patch or No such file or directory after approval — those are mechanical failures where the edit never lands. If you see those strings, the patch isn’t half-done, it’s not applied at all, and the fix is upgrading Codex CLI or re-running the edit, not tightening the prompt.)

As of June 2026, Codex CLI defaults to GPT-5.5 (switch with /model; GPT-5.4 and the lighter Codex-Spark are also in the picker). GPT-5.5 is more thorough than earlier models, but it still stops on its definition of done unless you give it one.

Which bucket are you in?

Run one grep and one verifier before you re-prompt — the symptom tells you the cause.

Symptom you observe	Most likely cause	Go to
Prompt never said “done”/“complete”/a verifier name	Task didn’t define done	Step 1
Early files patched, later files (file 6+) missed	Context decay mid-patch	Step 4
Codex “passed” but your local typecheck exits non-zero	Output truncation hid the failure	Step 3, Step 6
Final reasoning suspiciously short; last ~20% missing	Hit a thinking/turn budget	Step 4
`grep` of the old symbol still returns matches	Codex skipped an ambiguous call site silently	Step 2
One bullet done well, others 30%	Prompt bundled multiple concerns	Step 4

Common causes

Ordered by hit rate, highest first.

1. The task didn’t define “done”

“Update getUserById to return null on missing user” — done means what? Compiles? Passes existing tests? Adds a new test? All callers updated? Codex picks the shallowest interpretation by default.

How to spot it: Re-read your prompt. If the word “done”, “complete”, or a measurable verifier (typecheck / tests / lint) doesn’t appear, Codex picked its own definition.

2. Codex ran out of effective context mid-patch

On a multi-file change, Codex held the first 3 files clearly in mind, but by file 6 it was working from a fading mental model of file 1. The signature change is correct in service.ts but missed in controller.ts:142 because Codex no longer remembered what service.ts looked like. Run /status to see how much of the context window the session has burned — when it is near full, this is your cause.

How to spot it: Count files touched + files that should have been touched. If “should have” is greater than “actually” and the missing files come from grep results listed early in the session, context decay is the cause.

3. Tool (test/typecheck) failed silently, Codex assumed success

pnpm typecheck 2>&1 | tail -10 returned 10 lines that look clean, but the actual error was 200 lines up. Codex saw the tail, assumed clean, and stopped.

How to spot it: Re-run the same verifier locally with full output. If the local exit code is non-zero but Codex thought it passed, output truncation hid the failure.

4. Codex hit a thinking/turn budget and short-circuited

Codex limits how much work it does per turn. As the budget shrinks, it prematurely concludes “this is close enough” and ships.

How to spot it: The chat shows Codex’s final reasoning is suspiciously brief versus earlier turns. Or the patch covers 80% of the diff with the last 20% missing — that “we ran out of time” shape.

5. Codex hedged on an ambiguous call site

Patch should rename getUser to getUserById everywhere. In tests/fixtures/old-data.ts, the file imports getUser as a string from a JSON snapshot. Codex didn’t know whether to touch it, skipped it, and didn’t flag the skip.

How to spot it: Grep for the old name after the patch lands. If there are non-zero matches and the patch didn’t mention them, Codex skipped silently.

6. The task fundamentally bundled multiple concerns

“Refactor auth and add password reset and update the docs” — three tasks in one prompt. Codex finishes the first, partly does the second, forgets the third, says “done.” The 80/20 falls within each subtask, not across them.

How to spot it: Compare what was done to the bullet list in your prompt. If one bullet got full attention and others got 30%, the task should have been three runs.

Shortest path to fix

Ordered by ROI. Step 1 alone fixes the majority of incomplete-patch cases.

Step 1: Bind “done” to a verifier in the prompt

Use this template:

Task: [one-sentence goal]

DONE means ALL of the following pass with zero NEW errors:
1. `pnpm typecheck` — report exit code + last 20 lines
2. `pnpm test -- --reporter=verbose` — report pass/fail count
3. `pnpm lint --max-warnings 0` — report exit code

Before saying "done", paste the output of each command.
If any fails, DO NOT say done — fix and re-run.

The “paste the output” clause forces Codex to actually run the commands rather than hallucinate that they passed.

Make it durable — put the verifiers in AGENTS.md. Per OpenAI’s AGENTS.md guide, AGENTS.md is where Codex reads project instructions, and the recommended pattern is exactly this kind of rule (“Always run npm test after modifying JavaScript files”). Adding the block once means you don’t retype it every prompt:

## Definition of done
A change is only complete when all of these pass with zero new errors.
Run each, then paste exit code + last 20 lines before claiming done:
- `pnpm typecheck`
- `pnpm test -- --reporter=verbose`
- `pnpm lint --max-warnings 0`
Never say "done" while any verifier is non-zero.

Codex reads AGENTS.md from ~/.codex/AGENTS.md (global) plus every AGENTS.md from the git root down to your current directory, with closer files overriding. Files over project_doc_max_bytes (32 KiB by default) are truncated and empty files are skipped, so keep the block short. Confirm Codex actually loaded it with /status.

Step 2: Demand a file-coverage report before completion

For multi-file changes, add:

Before saying done:
1. List every file that needed to change for this task.
2. For each file, paste a one-line diff summary.
3. Confirm no callers of the changed function were missed:
   grep -rn "oldFunctionName" --include="*.ts" --include="*.tsx" .
   If results > 0, you missed call sites — fix them.

Step 3: Verify externally, don’t trust the report

After Codex returns, run the same verifier yourself:

pnpm typecheck && pnpm test && pnpm lint
echo "Exit code: $?"

If the exit code is non-zero, the patch isn’t done regardless of what Codex said. You can also run /diff inside Codex to see the full git diff (including untracked files) and /review to have a dedicated reviewer read the working tree and flag missed call sites without changing any code. Then paste the actual error back to Codex with:

Verifier failed. Output:
[paste]

You said done but typecheck reports the above. Fix and re-run.
Do not say done until all three verifiers exit 0.

Step 4: Split tasks at the natural concern boundary

If the task touched three concerns, or /status showed the context window nearly full, run separate sessions:

Session 1: Rename getUser to getUserById everywhere. Stop.
Session 2: Add the missing null check in the new signature. Stop.
Session 3: Add tests covering null returns. Stop.

Each session has a small “done” definition that fits in context. Start fresh sessions with /new so a fading mental model from the previous task doesn’t bleed in.

Step 5: For incomplete patches, ask for a completion diff, not a rewrite

If Codex stops 80% done, don’t re-run the whole task. Pin it to the remaining work:

You completed the patch in service.ts but the following call sites still use the old signature:
- controller.ts:42
- routes/api.ts:118
- tests/fixtures.ts:7

Update only these three locations. Run typecheck after. Report exit code.

Step 6: Increase output budget for the verifier

If Codex misses errors because output was truncated, drop the | tail -50:

# Bad — hides upstream errors
pnpm typecheck 2>&1 | tail -50

# Good — full output, with the failing file highlighted
pnpm typecheck 2>&1 | grep -E "error TS|^[a-z].*\.tsx?:" | head -100

Filter by error pattern instead of an arbitrary tail; Codex sees the actual errors, not the trailing summary.

How to confirm it’s fixed

The patch is genuinely done only when all three are true:

pnpm typecheck && pnpm test && pnpm lint exits 0 in your terminal (not just in Codex’s report).
grep -rn "oldSymbol" --include="*.ts" --include="*.tsx" . returns zero matches you didn’t intend to keep.
/diff (or git diff --stat) lists every file you expected — no fewer.

If any of the three fails, you’re still in one of the buckets above; loop back to that step. CI re-running the same commands on the PR is the final gate.

Prevention

Put a “Definition of done” block listing verifiers and required output in AGENTS.md so every session inherits it.
Always verify externally — the agent’s “done” report is never the source of truth, the exit code is.
Split prompts at concern boundaries; one task to one done criterion to one session.
For multi-file changes, require a grep of the old symbol as proof no call sites were missed.
Watch /status for context pressure; start a fresh session before it fills.
When Codex stops 80% done, pin it to the remaining files explicitly instead of re-running the whole task.
Wire CI as the final gate — even verified Codex patches get re-checked before merge.

FAQ

Why does Codex say “done” when the patch clearly isn’t? Because “done” is its own judgment call unless you defined it. With no measurable criterion, Codex stops at the shallowest interpretation that looks plausible. Give it a verifier list (Step 1) and the judgment becomes a pass/fail check instead.

Does putting verifiers in AGENTS.md actually make Codex run them? It makes Codex far more likely to run them and reduces per-prompt nagging, but it’s guidance, not a hard gate — OpenAI’s docs describe AGENTS.md as instructions, not enforcement. Always re-run the verifiers yourself (Step 3); the exit code is the only gate that can’t be talked around.

How do I tell context decay from a thinking-budget cutoff? Run /status. If the context window is near full, it’s decay (Step 4 — split the task). If context is fine but the final reasoning was unusually short and the last ~20% is missing, it’s a turn budget (also Step 4 — but a targeted completion prompt in Step 5 often finishes it in one more turn).

apply_patch failed with an error — is that the same problem? No. Strings like Failed to apply patch, command failed; retry without sandbox?, or No such file or directory after approval are mechanical tool failures: the edit never landed at all. These cluster around a sandbox regression in recent Codex CLI builds (reported across roughly 0.115.0–0.134.0, with a clear break at 0.118.0 that 0.117.0 didn’t have), where the bubblewrap sandbox blocks the write and Codex loops on the retry prompt. Fix by checking codex --version and upgrading to the latest, or as a workaround switch the session to --full-auto / danger-full-access so the write isn’t sandboxed — not by tightening your prompt.

Codex keeps re-doing the whole task when I just want the rest finished. Stop asking it to “finish the task” and instead name the exact remaining locations (Step 5). A scoped “update only these three call sites” prompt keeps the diff small and fits the change in context.

Tags: #Codex #Coding agent #Troubleshooting #Debug #Incomplete patch