Codex returns a patch, announces “I have completed the changes,” and you find that half the call sites still use the old signature, two imports point at a function that no longer exists, and the types compile but one test now hangs forever. The patch isn’t done; Codex only declared it done.
This is a “completion criterion” failure, not an intelligence failure. Codex stops when it judges the work complete; if you didn’t define complete in measurable terms, you get whichever stopping point its judgment lands on. The fix is to bind “done” to a verifier — typecheck, tests, lint, exit codes — that Codex must report on before claiming completion.
Common causes
Ordered by hit rate, highest first.
1. The task didn’t define “done”
“Update getUserById to return null on missing user” — done means what? Compiles? Passes existing tests? Adds a new test? All callers updated? Codex picks the shallowest interpretation by default.
How to spot it: Re-read your prompt. If neither the word “done” or “complete” nor a measurable verifier (typecheck / tests / lint) appears, Codex picked its own definition.
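The re-read can be automated as a rough pre-flight check. The sketch below is only an illustration: `definesDone` and the marker list are hypothetical, and a keyword match cannot tell whether the criterion is actually measurable.

```typescript
// Hypothetical pre-flight check: does the prompt define "done" at all?
// The marker list is an assumption; add the verifier names your project uses.
const DONE_MARKERS = /\b(done|complete|typecheck|tests?|lint|exit code)\b/i;

function definesDone(promptText: string): boolean {
  return DONE_MARKERS.test(promptText);
}

const vaguePrompt = "Update getUserById to return null on missing user";
const boundPrompt = vaguePrompt + ". DONE means `pnpm typecheck` exits 0.";

const vagueResult = definesDone(vaguePrompt); // false: no completion criterion
const boundResult = definesDone(boundPrompt); // true: names a verifier
```

A `false` result is a cue to add a “DONE means” clause before sending the prompt.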
2. Codex ran out of effective context mid-patch
On a multi-file change, Codex held the first 3 files clearly in mind, but by file 6 it was working from a fading mental model of file 1. The signature change is correct in service.ts but missed in controller.ts:142 because Codex no longer remembered what service.ts looked like.
How to spot it: Compare the files the patch touched with the files it should have touched. If files are missing, and the missing ones come from grep results listed early in the session, context decay is the likely cause.
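The comparison is a plain set difference. A minimal sketch, assuming you already have both lists as arrays of paths (`missedFiles` is a hypothetical helper and the file names are illustrative):

```typescript
// Returns files that should have changed (for example, from an early grep)
// but do not appear in the patch.
function missedFiles(expected: string[], touched: string[]): string[] {
  return expected.filter((file) => touched.indexOf(file) === -1);
}

// Illustrative data only: paths from the early grep vs. paths in the diff.
const expectedFiles = ["service.ts", "controller.ts", "routes/api.ts"];
const touchedFiles = ["service.ts"];

const stillToUpdate = missedFiles(expectedFiles, touchedFiles);
// stillToUpdate is ["controller.ts", "routes/api.ts"]
```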
3. Tool (test/typecheck) failed silently, Codex assumed success
pnpm typecheck 2>&1 | tail -10 returned 10 lines that look clean, but the actual error was 200 lines up. Codex saw the tail, assumed clean, and stopped.
How to spot it: Re-run the same verifier locally with full output. If the local exit code is non-zero but Codex thought it passed, output truncation hid the failure.
4. Codex hit a thinking-cost ceiling and short-circuited
Some Codex harnesses cap the thinking budget per turn. As the budget runs low, Codex may conclude “this is close enough” prematurely and ship.
How to spot it: Codex’s final reasoning in the chat is suspiciously brief compared with earlier turns, or the patch covers 80% of the diff with the last 20% missing: the “we ran out of time” shape.
5. Codex hedged on an ambiguous call site
The patch should rename getUser → getUserById everywhere. In tests/fixtures/old-data.ts, getUser appears only as a string read from a JSON snapshot. Codex didn’t know whether to touch it, skipped it, and didn’t flag the skip.
How to spot it: Grep for the old name after the patch lands, matching whole words only (grep -w), because getUser is a substring of getUserById. If there are matches and the patch didn’t mention them, Codex skipped them silently.
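One detail matters when checking for leftovers of this rename: the old name getUser is a substring of the new name getUserById, so a plain substring search reports every renamed call site as a miss. The sketch below scans in-memory file contents with a word-boundary match. `findLeftovers` and the sample sources are hypothetical, and a real check would read files from disk.

```typescript
interface LeftoverHit {
  file: string;
  line: number; // 1-based line number
  text: string;
}

// Escape regex metacharacters so the symbol is matched literally.
function escapeRegExp(raw: string): string {
  return raw.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Finds whole-word occurrences of oldSymbol; "getUserById" does not match "getUser".
function findLeftovers(sources: { [file: string]: string }, oldSymbol: string): LeftoverHit[] {
  const pattern = new RegExp("\\b" + escapeRegExp(oldSymbol) + "\\b");
  const hits: LeftoverHit[] = [];
  Object.keys(sources).forEach((file) => {
    sources[file].split("\n").forEach((text, index) => {
      if (pattern.test(text)) {
        hits.push({ file: file, line: index + 1, text: text.trim() });
      }
    });
  });
  return hits;
}

// Illustrative contents: one renamed definition, one string left behind.
const sampleSources: { [file: string]: string } = {
  "service.ts": "export function getUserById(id: string) {}",
  "tests/fixtures/old-data.ts": 'const snapshot = { "fn": "getUser" };',
};

const leftovers = findLeftovers(sampleSources, "getUser");
// leftovers holds a single hit, in tests/fixtures/old-data.ts at line 1
```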
6. The task fundamentally bundled multiple concerns
“Refactor auth and add password reset and update the docs” is three tasks in one prompt. Codex finishes the first, partly does the second, forgets the third, and says “done.” The shortfall is uneven: one subtask is complete while the others are partial or missing.
How to spot it: Compare what was done to the bullet list in your prompt. If one bullet got full attention and the others got around 30%, the task should have been three runs.
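A crude way to catch bundling before a run is to split the prompt on conjunctions and count the pieces. This is only a heuristic sketch: `splitConcerns` is a hypothetical helper, and it will wrongly split phrases such as “search and replace”.

```typescript
// Splits a prompt on " and " or ";" to estimate how many concerns it bundles.
// Heuristic only: a single concern that contains "and" is split too.
function splitConcerns(promptText: string): string[] {
  return promptText
    .split(/\s+and\s+|\s*;\s*/i)
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
}

const bundledConcerns = splitConcerns("Refactor auth and add password reset and update the docs");
// bundledConcerns is ["Refactor auth", "add password reset", "update the docs"]
```

More than one entry suggests the prompt is a candidate for separate runs.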
Shortest path to fix
Ordered by ROI. Step 1 alone fixes the majority of incomplete-patch cases.
Step 1: Bind “done” to a verifier in the prompt
Use this template:
Task: [one-sentence goal]
DONE means ALL of the following pass with zero NEW errors:
1. `pnpm typecheck`: report exit code + every error line
2. `pnpm test -- --reporter=verbose`: report pass/fail count
3. `pnpm lint --max-warnings 0`: report exit code
Before saying "done", paste the output of each command.
If any fails, DO NOT say done — fix and re-run.
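The “paste the output” clause pushes Codex to actually run the commands rather than hallucinate that they passed. Pasted output can still be wrong or truncated, which is why Step 3 has you re-run the verifiers yourself.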
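If you reuse the template across many prompts, it can be generated from a list of verifiers so the “DONE means” block is never forgotten. A minimal sketch, assuming a hypothetical `Verifier` shape and `buildDonePrompt` helper; the wording mirrors the template above.

```typescript
interface Verifier {
  command: string; // the command Codex must run
  report: string;  // what Codex must report back for that command
}

// Builds the task prompt with a DONE block appended.
function buildDonePrompt(task: string, verifiers: Verifier[]): string {
  const lines: string[] = [
    "Task: " + task,
    "DONE means ALL of the following pass with zero NEW errors:",
  ];
  verifiers.forEach((verifier, index) => {
    lines.push((index + 1) + ". `" + verifier.command + "`: report " + verifier.report);
  });
  lines.push('Before saying "done", paste the output of each command.');
  lines.push("If any fails, DO NOT say done. Fix and re-run.");
  return lines.join("\n");
}

const donePrompt = buildDonePrompt("Update getUserById to return null on missing user", [
  { command: "pnpm typecheck", report: "exit code + every error line" },
  { command: "pnpm lint --max-warnings 0", report: "exit code" },
]);
```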
Step 2: Demand a file-coverage report before completion
For multi-file changes, add:
Before saying done:
1. List every file that needed to change for this task.
2. For each file, paste a one-line diff summary.
3. Confirm no callers of the changed function were missed:
grep -rn "oldFunctionName" --include="*.ts" --include="*.tsx" .
If results > 0, you missed call sites — fix them.
Step 3: Verify externally, don’t trust the report
After Codex returns, run the same verifier yourself:
pnpm typecheck && pnpm test && pnpm lint
echo "Exit code: $?"
If exit code is non-zero, the patch isn’t done regardless of what Codex said. Paste the actual error back to Codex with:
Verifier failed. Output:
[paste]
You said done but typecheck reports the above. Fix and re-run.
Do not say done until all three verifiers exit 0.
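The decision rule here is mechanical: any non-zero exit code means “not done,” whatever the report said. The sketch below turns a set of verifier results into the follow-up message, or into nothing when everything passed. `VerifierResult` and `buildRetryPrompt` are hypothetical names, and collecting the exit codes and output is left to your own script.

```typescript
interface VerifierResult {
  command: string;
  exitCode: number;
  output: string; // full output captured from your own run
}

// Returns a follow-up prompt for the failed verifiers, or null if all exited 0.
function buildRetryPrompt(results: VerifierResult[]): string | null {
  const failed = results.filter((result) => result.exitCode !== 0);
  if (failed.length === 0) {
    return null;
  }
  const sections = failed.map(
    (result) =>
      "Verifier failed: " + result.command + " (exit " + result.exitCode + "). Output:\n" + result.output
  );
  return sections.join("\n\n") + "\n\nFix and re-run. Do not say done until every verifier exits 0.";
}

// Illustrative results: typecheck failed, lint passed.
const retryPrompt = buildRetryPrompt([
  { command: "pnpm typecheck", exitCode: 2, output: "(paste typecheck output here)" },
  { command: "pnpm lint", exitCode: 0, output: "" },
]);
```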
Step 4: Split tasks at the natural concern boundary
If the task touched three concerns, run three sessions:
Session 1: Rename getUser → getUserById everywhere. Stop.
Session 2: Add the missing null check in the new signature. Stop.
Session 3: Add tests covering null returns. Stop.
Each session has a small “done” definition that fits in context.
Step 5: For incomplete patches, ask for a completion diff, not a rewrite
If Codex stops 80% done, don’t re-run the whole task. Pin it to the remaining work:
You completed the patch in service.ts but the following call sites still use the old signature:
- controller.ts:42
- routes/api.ts:118
- tests/fixtures.ts:7
Update only these three locations. Run typecheck after. Report exit code.
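The pinned prompt can be assembled from whatever list of leftover call sites you have, for example the hits from a grep of the old symbol. A minimal sketch, where `CallSite` and `buildCompletionPrompt` are hypothetical names and the sites are the ones from the example above:

```typescript
interface CallSite {
  file: string;
  line: number;
}

// Builds a prompt that pins Codex to the remaining call sites only.
function buildCompletionPrompt(finishedFile: string, remaining: CallSite[]): string {
  const header =
    "You completed the patch in " + finishedFile +
    " but the following call sites still use the old signature:";
  const bullets = remaining.map((site) => "- " + site.file + ":" + site.line);
  const footer = "Update only the locations listed above. Run typecheck after. Report exit code.";
  return [header].concat(bullets, [footer]).join("\n");
}

const completionPrompt = buildCompletionPrompt("service.ts", [
  { file: "controller.ts", line: 42 },
  { file: "routes/api.ts", line: 118 },
  { file: "tests/fixtures.ts", line: 7 },
]);
```

Listing exact locations keeps the follow-up small enough that the earlier context problem is unlikely to recur.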
Step 6: Increase output budget for the verifier
If Codex misses errors because output was truncated, drop the | tail -50:
# Bad — hides upstream errors
pnpm typecheck 2>&1 | tail -50
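# Better: filter for error lines, so failures anywhere in the output are visible
pnpm typecheck 2>&1 | grep -E "error TS|^[a-z].*\.tsx?:" | head -100
Filter by error pattern instead of an arbitrary tail, so Codex sees the actual errors rather than the trailing summary. Have it report the exit code of pnpm typecheck itself as well: by default a pipeline’s exit status comes from its last command, so the filtered version can exit 0 even when typecheck failed.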
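The difference between the two approaches is easy to reproduce on captured output. The sketch below builds a made-up typecheck log with one diagnostic followed by 200 unremarkable lines, then compares a tail with an error filter. The helper names and the log contents are illustrative assumptions, not real tool output.

```typescript
// Keeps the last `count` lines, like `tail -n`.
function lastLines(output: string, count: number): string[] {
  const lines = output.split("\n");
  return lines.slice(Math.max(0, lines.length - count));
}

// Keeps up to `limit` lines that look like TypeScript diagnostics.
function errorLines(output: string, limit: number): string[] {
  const diagnostic = /error TS\d+/;
  return output.split("\n").filter((line) => diagnostic.test(line)).slice(0, limit);
}

// Made-up log: one error at the top, then 200 lines of noise.
const fillerLines: string[] = [];
for (let i = 0; i < 200; i++) {
  fillerLines.push("checked module " + i);
}
const sampleOutput = ["src/controller.ts(142,5): error TS2554: Expected 1 arguments, but got 2."]
  .concat(fillerLines)
  .join("\n");

const tailView = lastLines(sampleOutput, 10);     // 10 noise lines, error not visible
const filteredView = errorLines(sampleOutput, 100); // the single diagnostic line
```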
Prevention
- Every Codex prompt ends with a “DONE means” block listing verifiers and required output to paste
- Always verify externally — the agent’s “done” report is never the source of truth, the exit code is
- Split prompts at concern boundaries; one task → one done criterion → one session
- For multi-file changes, require a `grep` of the old symbol as proof no call sites were missed
- When Codex stops 80% done, pin it to the remaining files explicitly instead of re-running the whole task
- Wire CI as the final gate — even verified Codex patches get re-checked before merge