Codex Cannot Finish a Patch Cleanly

Codex stops mid-patch — broken imports, half-converted types, untouched call sites. Bind "done" to a verifier, not a vibe.

Codex returns a patch, announces “I have completed the changes,” and you find: half the call sites still use the old signature, two imports point at a function that no longer exists, the types compile but one test now hangs forever. The patch isn’t done — Codex declared it done.

This is a “completion criterion” failure, not an intelligence failure. Codex stops when it judges the work complete; if you didn’t define complete in measurable terms, you get whichever stopping point its judgment lands on. The fix is to bind “done” to a verifier — typecheck, tests, lint, exit codes — that Codex must report on before claiming completion.

Common causes

Ordered by hit rate, highest first.

1. The task didn’t define “done”

“Update getUserById to return null on missing user” — done means what? Compiles? Passes existing tests? Adds a new test? All callers updated? Codex picks the shallowest interpretation by default.

How to spot it: Re-read your prompt. If the word “done”, “complete”, or a measurable verifier (typecheck / tests / lint) doesn’t appear, Codex picked its own definition.

2. Codex ran out of effective context mid-patch

On a multi-file change, Codex held the first 3 files clearly in mind, but by file 6 it was working from a fading mental model of file 1. The signature change is correct in service.ts but missed in controller.ts:142 because Codex no longer remembered what service.ts looked like.

How to spot it: Count files touched + files that should have been touched. If “should have” > “actually” and the missing files come from grep results listed early in the session, context decay is the cause.

3. Tool (test/typecheck) failed silently, Codex assumed success

pnpm typecheck 2>&1 | tail -10 returned 10 lines that look clean, but the actual error was 200 lines up. Codex saw the tail, assumed clean, and stopped.

How to spot it: Re-run the same verifier locally with full output. If the local exit code is non-zero but Codex thought it passed, output truncation hid the failure.

4. Codex hit a thinking-cost ceiling and short-circuited

Some Codex harnesses cap thinking budget per turn. Codex senses the budget shrinking, prematurely concludes “this is close enough,” and ships.

How to spot it: The chat shows Codex’s final reasoning is suspiciously brief vs. earlier turns. Or the patch covers 80% of the diff with the last 20% missing — that “we ran out of time” shape.

5. Codex hedged on an ambiguous call site

Patch should rename getUser → getUserById everywhere. In tests/fixtures/old-data.ts, the file imports getUser as a string from a JSON snapshot. Codex didn’t know whether to touch it, skipped it, and didn’t flag the skip.

How to spot it: Grep for the old name after the patch lands. If non-zero matches and the patch didn’t mention them, Codex skipped silently.

6. The task fundamentally bundled multiple concerns

“Refactor auth and add password reset and update the docs” — three tasks in one prompt. Codex finishes the first, partly does the second, forgets the third, says “done.” The 80/20 falls within each subtask, not across them.

How to spot it: Compare what was done to the bullet list in your prompt. If one bullet got full attention and others got 30% — task should have been three runs.

Shortest path to fix

Ordered by ROI. Step 1 alone fixes the majority of incomplete-patch cases.

Step 1: Bind “done” to a verifier in the prompt

Use this template:

Task: [one-sentence goal]

DONE means ALL of the following pass with zero NEW errors:
1. `pnpm typecheck` — report exit code + last 20 lines
2. `pnpm test -- --reporter=verbose` — report pass/fail count
3. `pnpm lint --max-warnings 0` — report exit code

Before saying "done", paste the output of each command.
If any fails, DO NOT say done — fix and re-run.

The “paste the output” clause forces Codex to actually run the commands rather than hallucinate that they passed.

Step 2: Demand a file-coverage report before completion

For multi-file changes, add:

Before saying done:
1. List every file that needed to change for this task.
2. For each file, paste a one-line diff summary.
3. Confirm no callers of the changed function were missed:
   grep -rn "oldFunctionName" --include="*.ts" --include="*.tsx" .
   If results > 0, you missed call sites — fix them.

Step 3: Verify externally, don’t trust the report

After Codex returns, run the same verifier yourself:

pnpm typecheck && pnpm test && pnpm lint
echo "Exit code: $?"

If exit code is non-zero, the patch isn’t done regardless of what Codex said. Paste the actual error back to Codex with:

Verifier failed. Output:
[paste]

You said done but typecheck reports the above. Fix and re-run.
Do not say done until all three verifiers exit 0.

Step 4: Split tasks at the natural concern boundary

If the task touched three concerns, run three sessions:

Session 1: Rename getUser → getUserById everywhere. Stop.
Session 2: Add the missing null check in the new signature. Stop.
Session 3: Add tests covering null returns. Stop.

Each session has a small “done” definition that fits in context.

Step 5: For incomplete patches, ask for a completion diff, not a rewrite

If Codex stops 80% done, don’t re-run the whole task. Pin it to the remaining work:

You completed the patch in service.ts but the following call sites still use the old signature:
- controller.ts:42
- routes/api.ts:118
- tests/fixtures.ts:7

Update only these three locations. Run typecheck after. Report exit code.

Step 6: Increase output budget for the verifier

If Codex misses errors because output was truncated, drop the | tail -50:

# Bad — hides upstream errors
pnpm typecheck 2>&1 | tail -50

# Good — full output, with the failing file highlighted
pnpm typecheck 2>&1 | grep -E "error TS|^[a-z].*\.tsx?:" | head -100

Filter by error pattern instead of arbitrary tail; Codex sees the actual errors, not the trailing summary.

Prevention

  • Every Codex prompt ends with a “DONE means” block listing verifiers and required output to paste
  • Always verify externally — the agent’s “done” report is never the source of truth, the exit code is
  • Split prompts at concern boundaries; one task → one done criterion → one session
  • For multi-file changes, require a grep of the old symbol as proof no call sites were missed
  • When Codex stops 80% done, pin it to the remaining files explicitly instead of re-running the whole task
  • Wire CI as the final gate — even verified Codex patches get re-checked before merge

Tags: #Codex #Coding agent #Troubleshooting #Debug #Incomplete patch