Codex Agent is running through a 12-step refactor. Around step 5 the agent stops responding. No error banner, no failed assertion, no traceback — just silence and a message like “task complete” that clearly is not true. You re-prompt and it picks up from a stale point, sometimes redoing work already done. This is almost never a “bug” in Codex — it is the agent hitting an invisible boundary: a turn limit, a sandbox idle timeout, an internal stop condition matching prematurely, or a context window overflow that quietly truncated the plan.
Common causes
Ordered by likelihood for typical mid-task halts.
1. Turn budget exhausted
Codex Agent runs with a hard cap on tool-call turns per task (commonly 25-50). Long refactors burn turns on reads, edits, lint, retests. When the budget is hit the agent prints a summary and stops, even if the plan is half done.
How to spot it: Count tool calls in the transcript. If the total lands on or just under a round number (25, 50, 100), you most likely hit the cap.
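The count can be automated. The sketch below is illustrative only: it assumes the transcript is exported as JSON Lines with tool calls marked by a "type" field equal to "tool_call", which is a hypothetical schema, so adapt the field names to whatever your logs actually contain.

```python
import json

# Candidate caps taken from the round numbers mentioned above.
COMMON_CAPS = (25, 50, 100)


def count_tool_calls(lines):
    """Count entries whose "type" field is "tool_call" (assumed schema)."""
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON noise in the transcript
        if isinstance(event, dict) and event.get("type") == "tool_call":
            count += 1
    return count


def nearest_cap(turns, caps=COMMON_CAPS, tolerance=2):
    """Return the cap within `tolerance` turns of `turns`, or None."""
    for cap in caps:
        if abs(cap - turns) <= tolerance:
            return cap
    return None
```

If nearest_cap returns a value, a turn-budget stop is the first thing to rule out; None points toward the other causes.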
2. Sandbox idle timeout
If a build, test, or install command takes longer than the sandbox idle timeout (often 60-120s with no stdout), the process gets killed, the agent treats it as “done”, and moves on to closing out.
How to spot it: The last tool call is a long-running shell command and no stdout appeared before the stop. Re-running the same command locally takes more than 60s.
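To check the command locally, it helps to measure the longest stretch without output rather than only the total runtime. This is a minimal sketch using the standard library; the function name is made up for illustration, and it only sees output the child process has actually flushed, which is also all an idle timer would see.

```python
import subprocess
import time


def longest_silence(argv):
    """Run argv; return (exit_code, longest gap in seconds between output lines)."""
    last = time.monotonic()
    longest = 0.0
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    for _ in proc.stdout:  # blocks until the next line arrives
        now = time.monotonic()
        longest = max(longest, now - last)
        last = now
    code = proc.wait()
    # Account for the quiet period between the last line and process exit.
    longest = max(longest, time.monotonic() - last)
    return code, longest
```

A gap longer than your sandbox idle timeout makes cause 2 the prime suspect.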
3. Premature stop-sequence match
The agent looks for explicit “done” signals — phrases like “task complete”, “all tests pass”, “no further action needed”. If a tool’s stdout happens to contain that phrase mid-plan, the agent thinks it is done.
How to spot it: Look at the last tool output. A test runner that prints “all tests pass” partway through, or a script that echoes “done”, can short-circuit the loop.
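A quick way to audit a tool's output for such phrases is to scan it line by line. The sketch below assumes nothing about how the agent itself matches stop signals; the phrase list simply reuses the examples above, and the function name is hypothetical.

```python
import re

# Phrases taken from the examples above; extend for your own tooling.
STOP_PHRASES = ("task complete", "all tests pass", "no further action needed", "done")


def find_stop_phrases(stdout, phrases=STOP_PHRASES):
    """Return (line_number, phrase) pairs for each stop-like phrase found."""
    hits = []
    for number, line in enumerate(stdout.splitlines(), start=1):
        lowered = line.lower()
        for phrase in phrases:
            # Word boundaries keep "done" from matching inside "abandoned".
            if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
                hits.append((number, phrase))
    return hits
```

Any hit that appears before the final step of the plan is a candidate for the premature match.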
4. Context window overflow silently truncates the plan
Codex’s plan list lives in the system / assistant context. As file reads accumulate, older turns get auto-summarized or dropped. The remaining steps fall off, the agent forgets them, and stops once the visible plan is empty.
How to spot it: Re-prompt with “what step are you on?” — if the answer references a step earlier than where it actually stopped, the plan was truncated.
5. Hidden permission prompt waiting offscreen
A sandbox-write or network-out tool call may produce an interactive permission request. In headless modes the prompt is silently denied; the agent records a “tool failed” and gives up the broader task.
How to spot it: Check the agent log for “permission denied”, “requires approval”, or “non-interactive” near the stop point.
6. Upstream rate limit / 429 retry exhausted
A 429 from the underlying model API triggers internal retries. After N retries the agent gives up quietly: the only user-facing sign is a truncated assistant turn.
How to spot it: Look for “429”, “rate_limit_exceeded”, or “retrying in Ns” in the agent telemetry / log file.
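The log checks for causes 5 and 6 can be run in one pass. This sketch assumes a plain-text log with one entry per line; the marker strings come from the notes above, while the function and dictionary names are illustrative.

```python
import re

# Marker strings taken from the "How to spot it" notes for causes 5 and 6.
CAUSE_PATTERNS = {
    "permission": re.compile(
        r"permission denied|requires approval|non-interactive", re.IGNORECASE
    ),
    "rate_limit": re.compile(
        r"\b429\b|rate_limit_exceeded|retrying in \d+s", re.IGNORECASE
    ),
}


def classify_log(lines):
    """Map each cause to the 1-based line numbers where a marker appears."""
    found = {cause: [] for cause in CAUSE_PATTERNS}
    for number, line in enumerate(lines, start=1):
        for cause, pattern in CAUSE_PATTERNS.items():
            if pattern.search(line):
                found[cause].append(number)
    return found
```

Matches clustered just before the stop point are the ones that matter; a stray 429 early in a run that later recovered is usually harmless.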
Before you start
- Note whether the stop happens at the same step every time or at random points; deterministic = code path, random = capacity / network.
- Save the full transcript before re-prompting — once you re-prompt, the prior tool-call history may get summarized away.
- If you have a budget setting (--max-turns, OPENAI_AGENT_MAX_TURNS), record its current value.
Information to collect
- Exact last tool call before the stop (read / write / shell / search).
- Approximate turn count: count tool calls in the transcript.
- The model in use (gpt-5.5, gpt-5.4, etc.) and whether the session is using a long-context variant.
- Any stdout text near “done”, “complete”, “passed” that could be mistaken for a stop signal.
- Sandbox runtime + idle timeout values from your config.
Step-by-step fix
Ordered by ROI: cheapest checks first.
Step 1: Re-prompt with “continue the plan” and a step pointer
Most reliable rescue:
You stopped at step 5 of the plan. Continue from step 6.
Do not redo steps 1-5. Print the remaining plan first, then execute.
If the agent immediately resumes correctly, the root cause was a stop-sequence match or truncated plan, not a hard limit.
Step 2: Raise the turn budget
In the CLI / API (flag names vary between versions, so confirm against your CLI's help output):
codex agent run --max-turns 100 task.md
Or in environment:
export OPENAI_AGENT_MAX_TURNS=150
Long refactors realistically need 60-120 turns. Setting the cap to 30 because “it usually fits” is the most common preventable cause.
Step 3: Break the task into checkpointed sub-tasks
Even with raised limits, one mega-prompt is fragile. Split:
Task 1: Refactor src/auth/* to async/await. Stop and report.
Task 2: Update src/auth/*.test.ts to match. Stop and report.
Task 3: Run pnpm test --filter auth. Report failures.
Each sub-task gets its own fresh turn budget and context window. A stop in task 2 does not lose task 1’s progress.
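One way to make the checkpointing concrete is a small driver that records finished sub-tasks in a file and skips them on the next run. This is a sketch under stated assumptions: run_task is a hypothetical callable that launches one agent run and returns True on success, and the checkpoint file layout is invented for illustration.

```python
import json
from pathlib import Path


def run_checkpointed(tasks, run_task, checkpoint_path):
    """Run tasks in order, skipping any already recorded as done.

    Returns the name of the first failing task, or None if all finished.
    `run_task` is a placeholder for your own agent invocation.
    """
    path = Path(checkpoint_path)
    done = set(json.loads(path.read_text())) if path.exists() else set()
    for name in tasks:
        if name in done:
            continue  # finished in an earlier run
        if not run_task(name):
            return name  # stop here; earlier progress is already saved
        done.add(name)
        path.write_text(json.dumps(sorted(done)))
    return None
```

Because the checkpoint is written after every successful sub-task, re-running the driver after a stop resumes at the failed sub-task instead of the beginning.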
Step 4: Add explicit “do not stop until” assertions
In the system / task prompt:
Do not emit a "task complete" message until ALL of:
- All TypeScript errors resolved (pnpm tsc --noEmit returns 0)
- All tests in src/auth/ pass
- The plan list has zero remaining items
If any tool's stdout contains "done" or "complete", ignore it as a stop signal.
This neutralizes premature stop-sequence matches.
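The same conditions can also be enforced outside the prompt, by a wrapper that refuses to accept a completion claim until the checks pass. A minimal sketch, assuming each check is a command whose exit code 0 means success; the function name and the shape of the checks mapping are illustrative.

```python
import subprocess


def completion_allowed(checks, remaining_plan_items):
    """Return (ok, failures); ok only if every check exits 0 and the plan is empty.

    `checks` maps a label to an argv list, for example
    {"types": ["pnpm", "tsc", "--noEmit"]}.
    """
    failures = []
    for label, argv in checks.items():
        result = subprocess.run(argv, capture_output=True, text=True)
        if result.returncode != 0:
            failures.append(label)
    if remaining_plan_items:
        failures.append("plan not empty")
    return (not failures, failures)
```

Exit codes are harder to spoof than stdout text, so this gate is unaffected by a script that happens to echo “done”.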
Step 5: Extend sandbox timeouts for long-running commands
If the stop is at a build / test / install:
codex agent run --shell-timeout 600 task.md
Or wrap the slow command:
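# Run the install in the background and log to a file, so $! is pnpm itself
pnpm install > install.log 2>&1 &
PID=$!
# Print a keepalive line every 20s while the install is still running
while kill -0 "$PID" 2>/dev/null; do echo "still installing..."; sleep 20; done
# wait returns pnpm's own exit status (a tee pipeline would mask it)
wait "$PID"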
The keepalive echo lines reset the idle timer.
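If the slow command is launched from a script rather than typed into the shell, the same keepalive idea can be written with the standard library. This is a sketch, not a documented Codex feature: the function name, default interval, and heartbeat message are all assumptions.

```python
import subprocess


def run_with_keepalive(argv, log_path, interval=20.0, message="still running..."):
    """Run argv with its output sent to log_path; print a heartbeat while it runs.

    Returns the command's exit code.
    """
    with open(log_path, "wb") as log:
        proc = subprocess.Popen(argv, stdout=log, stderr=subprocess.STDOUT)
        while True:
            try:
                # wait() returns as soon as the process exits.
                return proc.wait(timeout=interval)
            except subprocess.TimeoutExpired:
                print(message, flush=True)  # resets the idle timer
```

Keep the interval comfortably below the idle timeout so a single delayed heartbeat does not trigger the kill.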
Step 6: Pipe long output to a file, summarize inline
Large stdout floods (10k+ lines of test output) accelerate context truncation. Redirect and read summaries:
pnpm test > test.log 2>&1
tail -50 test.log
grep -E "FAIL|✗" test.log | head -20
The agent reads 70 lines instead of 10,000. The plan stays in window.
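The tail-plus-grep summary can be produced in a single pass over the log. The sketch below mirrors the shell commands above (last 50 lines, first 20 failure lines); the function name and defaults are illustrative.

```python
import re
from collections import deque


def summarize_log(path, tail_lines=50, max_failures=20, pattern=r"FAIL|✗"):
    """Return (failure_lines, tail) without loading the whole file into memory."""
    failure_re = re.compile(pattern)
    failures = []
    tail = deque(maxlen=tail_lines)  # keeps only the most recent lines
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            line = line.rstrip("\n")
            tail.append(line)
            if len(failures) < max_failures and failure_re.search(line):
                failures.append(line)
    return failures, list(tail)
```

The result is bounded at 70 lines no matter how large the log grows, which keeps the cost in context predictable.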
Verify
- Re-run the same task end-to-end and confirm it now completes without manual re-prompt.
- Check the transcript: turn count should be below your new cap with headroom.
- Run a deliberately longer task (e.g. add a second refactor) and confirm it still finishes — proves the cap fix was not just a coincidence.
Long-term prevention
- Default --max-turns to 100 for any non-trivial agent task; the cost difference is negligible compared to a wasted half-finished run.
- Always split refactors into checkpointed sub-tasks of ≤10 steps each.
- Pipe verbose tool output to files; have the agent read tails / greps instead of full logs.
- Add a “do not stop until” assertion block to every agent task template.
- For builds/tests longer than 60s, set shell timeout explicitly and add keepalive echoes.
- Keep an agent.log of every run so you can grep for 429, permission denied, and max_turns post-mortem.
Common pitfalls
- Re-prompting “continue” without specifying the step number — the agent often restarts from step 1, redoing finished work.
- Raising --max-turns to 9999 and assuming it solves everything: context window overflow still bites at ~150-200 turns regardless of the cap.
- Ignoring stdout that says “done” inside package.json script output: it WILL be matched as a stop signal in some configs.
- Running a 30-minute build inside the agent loop instead of starting it in a separate worktree and polling.
- Forgetting that pnpm install with no terminal output for 90s gets killed even though it is making real progress.
FAQ
Q: Codex stops, I re-prompt “continue”, and it does the wrong step. Why?
The plan list was truncated out of context. Re-prompt with the exact step text: “Continue from: Refactor src/auth/login.ts to async/await.”
Q: I set max-turns to 200 but it still stops at turn 30.
Check the actual env var being read by your CLI. Some CLIs read CODEX_MAX_TURNS, others OPENAI_AGENT_MAX_TURNS. Run with --verbose and confirm the cap that was applied.
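Before launching, it can help to print which of the candidate variables are actually set in the environment the CLI will inherit. A minimal sketch; the two variable names are the ones mentioned above, and whether your CLI reads either of them is still something to confirm.

```python
import os

# Variable names mentioned above; your CLI may read a different one.
CANDIDATE_VARS = ("CODEX_MAX_TURNS", "OPENAI_AGENT_MAX_TURNS")


def configured_turn_caps(environ=None, names=CANDIDATE_VARS):
    """Return the subset of candidate variables that are set, with their values."""
    environ = os.environ if environ is None else environ
    return {name: environ[name] for name in names if name in environ}
```

An empty result means the cap is coming from a flag or a built-in default, not from the environment.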
Q: Is there a way to detect “stopped early” automatically?
Yes — wrap the agent invocation. After exit, parse the transcript for "task complete" AND check that the plan list shows zero remaining. If “complete” without empty plan, re-prompt programmatically.
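The check described in this answer could look like the sketch below. It assumes the plan is printed as Markdown-style checkboxes ("- [ ] step" for open items, "- [x] step" for finished ones), which is a hypothetical format; pass in only the final plan snapshot plus the closing message, since older snapshots would report stale open items.

```python
import re

# Assumed plan format: "- [ ] step" is open, "- [x] step" is finished.
OPEN_ITEM = re.compile(r"^\s*[-*]\s*\[ \]\s*(.+)$", re.MULTILINE)


def stopped_early(transcript):
    """Return the open plan items if the agent claimed completion too soon."""
    claimed_done = "task complete" in transcript.lower()
    remaining = [item.strip() for item in OPEN_ITEM.findall(transcript)]
    return remaining if claimed_done and remaining else []


def resume_prompt(remaining):
    """Build a re-prompt that names the exact next step."""
    return (
        f"Continue from: {remaining[0]}. "
        "Do not redo completed steps. Print the remaining plan first, then execute."
    )
```

A wrapper can call stopped_early after each run and, when the list is non-empty, feed resume_prompt back to the agent instead of waiting for a human to notice.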
Q: Does using a long-context model variant fix this?
It helps with cause #4 (truncation), not causes #1 / #2 / #3 / #5 / #6. Long context helps, but it is not sufficient on its own.