Codex Agent Stops Mid-Task Without Error

Codex halts halfway through a multi-step task with no visible failure — usually a context window, sandbox timeout, or stop-condition issue. Diagnose by checking the last tool call and turn budget.

Codex Agent is running through a 12-step refactor. Around step 5 the agent stops responding. No error banner, no failed assertion, no traceback: just silence, or a message like “task complete” that is clearly not true. You re-prompt and it picks up from a stale point, sometimes redoing work already done. This is almost never a “bug” in Codex. The agent has hit an invisible boundary: a turn limit, a sandbox idle timeout, an internal stop condition matching prematurely, or a context window overflow that quietly truncated the plan.

Common causes

Ordered by likelihood for typical mid-task halts.

1. Turn budget exhausted

Codex Agent runs with a hard cap on tool-call turns per task (commonly 25-50). Long refactors burn turns on reads, edits, lint, retests. When the budget is hit the agent prints a summary and stops, even if the plan is half done.

How to spot it: Count tool calls in the transcript. If the count is at or just under a round number (25, 50, 100), you most likely hit the cap.
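The counting step can be scripted. The sketch below assumes the transcript has been exported as JSON Lines in which tool invocations carry "type": "tool_call"; that schema is hypothetical, so adapt the field names to whatever your transcript actually contains. The function names and the tolerance of 2 are also illustrative choices.

```python
import json

# Caps mentioned above; adjust to whatever your setup actually uses.
SUSPECT_CAPS = (25, 50, 100)


def count_tool_calls(jsonl_text: str) -> int:
    """Count records whose "type" is "tool_call" (assumed transcript schema)."""
    count = 0
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON noise rather than fail
        if isinstance(record, dict) and record.get("type") == "tool_call":
            count += 1
    return count


def nearest_cap(count: int, caps=SUSPECT_CAPS, tolerance: int = 2):
    """Return the first cap within `tolerance` of count, or None if none is close."""
    for cap in caps:
        if abs(count - cap) <= tolerance:
            return cap
    return None
```

A count of 24 or 25 returns 25 from nearest_cap, which points at the turn budget; a count such as 37 returns None and suggests looking at the other causes first.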

2. Sandbox idle timeout

If a build, test, or install command runs longer than the sandbox idle timeout (often 60-120s with no stdout), the process is killed. The agent treats the command as finished and moves on to closing out.

How to spot it: The last tool call is a long-running shell command and no stdout appeared before the stop. Re-running the same command locally takes more than 60s.
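To check this locally, it helps to measure the longest stretch with no output rather than the total runtime, since the idle timer only cares about silence. The following is a minimal sketch; the function names are illustrative and not part of any Codex tooling.

```python
import subprocess
import time


def longest_gap(timestamps, start, end):
    """Longest interval (seconds) with no output between start and end."""
    points = [start, *sorted(timestamps), end]
    return max(later - earlier for earlier, later in zip(points, points[1:]))


def measure_silence(cmd):
    """Run cmd and return (exit_code, longest silent gap in seconds)."""
    start = time.monotonic()
    stamps = []
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    for _line in proc.stdout:  # record one timestamp per line of output
        stamps.append(time.monotonic())
    code = proc.wait()
    return code, longest_gap(stamps, start, time.monotonic())
```

For example, measure_silence(["pnpm", "install"]) returns the exit code and the longest quiet period. If that gap exceeds your sandbox idle timeout, the command is a likely culprit. Note that many programs buffer output when writing to a pipe, so the measured gaps reflect when output reaches the pipe, not when the program produced it.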

3. Premature stop-sequence match

The agent looks for explicit “done” signals — phrases like “task complete”, “all tests pass”, “no further action needed”. If a tool’s stdout happens to contain that phrase mid-plan, the agent thinks it is done.

How to spot it: Look at the last tool output. A test runner that prints “all tests pass” partway through, or a script that echoes “done”, can short-circuit the loop.
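A quick way to inspect the last tool output is to scan it for stop-like phrases. This sketch uses the phrases quoted above plus a bare "done"; the exact phrases your agent reacts to are not documented here, so treat the list as an assumption to adjust.

```python
import re

# Phrases taken from the description above, plus a bare "done"; extend as needed.
STOP_PHRASES = ("task complete", "all tests pass", "no further action needed", "done")

# Whole-word, case-insensitive match so "abandoned" does not count as "done".
_PATTERN = re.compile(
    "|".join(r"\b" + re.escape(phrase) + r"\b" for phrase in STOP_PHRASES),
    re.IGNORECASE,
)


def find_stop_phrases(output: str):
    """Return (line_number, phrase) pairs for every stop-like phrase found."""
    hits = []
    for number, line in enumerate(output.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            hits.append((number, match.group(0).lower()))
    return hits
```

Any hit that appears before the final step of the plan is a candidate for a premature match.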

4. Context window overflow silently truncates the plan

Codex’s plan list lives in the system / assistant context. As file reads accumulate, older turns get auto-summarized or dropped. The remaining steps fall off, the agent forgets them, and stops once the visible plan is empty.

How to spot it: Re-prompt with “what step are you on?” — if the answer references a step earlier than where it actually stopped, the plan was truncated.

5. Hidden permission prompt waiting offscreen

A sandbox-write or network-out tool call may trigger an interactive permission request. In headless modes the prompt is silently denied; the agent records a tool failure and abandons the broader task.

How to spot it: Check the agent log for permission denied, requires approval, or non-interactive near the stop point.

6. Upstream rate limit / 429 retry exhausted

A 429 from the underlying model API triggers internal retries. After N retries the agent gives up quietly: the only user-facing sign is a truncated assistant turn.

How to spot it: Look for 429, rate_limit_exceeded, or retrying in Ns in the agent telemetry / log file.
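The log checks for causes 5 and 6 can be combined into one pass. The sketch below searches the tail of a log for the marker strings listed above; the log format itself is an assumption, and a plain substring match on "429" can produce false positives (port numbers, IDs), so read the matched lines before drawing conclusions.

```python
# Marker strings from causes 5 and 6 above, grouped by the cause they hint at.
MARKERS = {
    "permission": ("permission denied", "requires approval", "non-interactive"),
    "rate_limit": ("429", "rate_limit_exceeded", "retrying in"),
}


def scan_log(log_text: str, last_n: int = 200):
    """Map each cause to the matching lines among the last `last_n` log lines."""
    tail = log_text.splitlines()[-last_n:]
    found = {cause: [] for cause in MARKERS}
    for line in tail:
        lowered = line.lower()
        for cause, needles in MARKERS.items():
            if any(needle in lowered for needle in needles):
                found[cause].append(line)
    return found
```

Only the tail is scanned because the markers matter near the stop point; raise last_n if your log is verbose.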

Before you start

  • Note whether the stop happens at the same step every time or at random points. A deterministic stop points to a code path (a stop phrase, a timeout on one specific command); a random stop points to capacity or network limits.
  • Save the full transcript before re-prompting — once you re-prompt, the prior tool-call history may get summarized away.
  • If you have a budget setting (--max-turns, OPENAI_AGENT_MAX_TURNS), record its current value.

Information to collect

  • Exact last tool call before the stop (read / write / shell / search).
  • Approximate turn count: count tool calls in the transcript.
  • The model in use (gpt-5.5, gpt-5.4, etc.) and whether the session is using a long-context variant.
  • Any stdout text near “done”, “complete”, “passed” that could be mistaken for a stop signal.
  • Sandbox runtime + idle timeout values from your config.

Step-by-step fix

Ordered by ROI: cheapest checks first.

Step 1: Re-prompt with “continue the plan” and a step pointer

Most reliable rescue:

You stopped at step 5 of the plan. Continue from step 6.
Do not redo steps 1-5. Print the remaining plan first, then execute.

If the agent immediately resumes correctly, the root cause was a stop-sequence match or truncated plan, not a hard limit.
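If you re-prompt often, generating the message from your own copy of the plan avoids typos in the step pointer. This is a small sketch; the function name is illustrative, and it restates the remaining steps in the prompt so the agent does not depend on a plan that may have been truncated.

```python
def build_resume_prompt(plan, last_done: int) -> str:
    """Build a re-prompt naming the next step and listing what remains.

    plan is the ordered list of step descriptions; last_done is 1-based.
    """
    if not 1 <= last_done < len(plan):
        raise ValueError("last_done must be >= 1 and leave at least one step")
    remaining = "\n".join(
        f"{number}. {text}"
        for number, text in enumerate(plan[last_done:], start=last_done + 1)
    )
    return (
        f"You stopped at step {last_done} of the plan. "
        f"Continue from step {last_done + 1}.\n"
        f"Do not redo steps 1-{last_done}. Remaining plan:\n"
        f"{remaining}"
    )
```

Keeping the plan outside the agent (in a file or in your task template) is what makes this possible.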

Step 2: Raise the turn budget

In CLI / API:

codex agent run --max-turns 100 task.md

Or in environment:

export OPENAI_AGENT_MAX_TURNS=150

Long refactors realistically need 60-120 turns. Setting the cap to 30 because “it usually fits” is the most common preventable cause.
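The 60-120 range follows from simple arithmetic if you assume each plan step costs roughly 5-10 tool calls (read, edit, lint, retest). That per-step figure is an assumption for illustration, not a measured value. A minimal estimator:

```python
import math


def estimate_turn_budget(
    plan_steps: int, calls_per_step: float, headroom: float = 0.25
) -> int:
    """Estimated tool-call turns for a plan, padded by a fractional headroom."""
    if plan_steps < 0 or calls_per_step < 0 or headroom < 0:
        raise ValueError("inputs must be non-negative")
    return math.ceil(plan_steps * calls_per_step * (1 + headroom))
```

For the 12-step refactor in the example, 5 calls per step with no headroom gives 60 and 10 calls per step gives 120; 6 calls per step with the default 25% headroom gives 90.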

Step 3: Break the task into checkpointed sub-tasks

Even with raised limits, one mega-prompt is fragile. Split:

Task 1: Refactor src/auth/* to async/await. Stop and report.
Task 2: Update src/auth/*.test.ts to match. Stop and report.
Task 3: Run pnpm test --filter auth. Report failures.

Each sub-task gets its own fresh turn budget and context window. A stop in task 2 does not lose task 1’s progress.
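One way to make the checkpointing concrete is a small driver that records finished sub-tasks in a file and skips them on the next run. This is a sketch only: run_task is a caller-supplied function (for example, one that invokes the agent on a single sub-task and returns True on success), and the checkpoint file format is a hypothetical choice.

```python
import json
from pathlib import Path


def run_checkpointed(tasks, run_task, checkpoint: Path):
    """Run tasks in order, skipping ones already recorded in the checkpoint file.

    run_task(task) -> bool is supplied by the caller.
    Returns the list of tasks completed so far.
    """
    done = json.loads(checkpoint.read_text()) if checkpoint.exists() else []
    for task in tasks:
        if task in done:
            continue  # finished in an earlier run
        if not run_task(task):
            break  # stop here; earlier progress is already saved
        done.append(task)
        checkpoint.write_text(json.dumps(done))
    return done
```

Because the checkpoint is written after every successful sub-task, a stop in task 2 leaves task 1 recorded, and the next invocation resumes at task 2.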

Step 4: Add explicit “do not stop until” assertions

In the system / task prompt:

Do not emit a "task complete" message until ALL of:
- All TypeScript errors resolved (pnpm tsc --noEmit exits with code 0)
- All tests in src/auth/ pass
- The plan list has zero remaining items

If any tool's stdout contains "done" or "complete", ignore it as a stop signal.

This neutralizes premature stop-sequence matches.
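The same conditions can also be verified from outside the agent, so you do not have to trust its own "complete" message. The sketch below runs each check as a subprocess and reports the ones that exit non-zero; the helper name and the labels are illustrative.

```python
import subprocess


def failed_checks(checks):
    """Run each (label, argv) check; return labels whose exit code is non-zero."""
    failures = []
    for label, argv in checks:
        result = subprocess.run(argv, capture_output=True, text=True)
        if result.returncode != 0:
            failures.append(label)
    return failures
```

With the commands used in this guide, the call would look like failed_checks([("typecheck", ["pnpm", "tsc", "--noEmit"]), ("auth tests", ["pnpm", "test", "--filter", "auth"])]). An empty list means every check passed. A command that is not installed raises FileNotFoundError instead of being reported as a failure.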

Step 5: Extend sandbox timeouts for long-running commands

If the stop is at a build / test / install:

codex agent run --shell-timeout 600 task.md

Or wrap the slow command:

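# Run the install in the background; pipefail (bash) keeps pnpm's exit code despite the tee.
( set -o pipefail; pnpm install 2>&1 | tee install.log ) &
PID=$!
# Print a heartbeat every 20s so the sandbox keeps seeing stdout.
while kill -0 "$PID" 2>/dev/null; do echo "still installing..."; sleep 20; done
wait "$PID"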

The keepalive echo lines reset the idle timer.

Step 6: Pipe long output to a file, summarize inline

Large stdout floods (10k+ lines of test output) accelerate context truncation. Redirect and read summaries:

pnpm test > test.log 2>&1
tail -50 test.log
grep -E "FAIL|✗" test.log | head -20

The agent reads 70 lines instead of 10,000. The plan stays in window.

Verify

  • Re-run the same task end-to-end and confirm it now completes without manual re-prompt.
  • Check the transcript: turn count should be below your new cap with headroom.
  • Run a deliberately longer task (e.g. add a second refactor) and confirm it still finishes — proves the cap fix was not just a coincidence.

Long-term prevention

  • Default --max-turns to 100 for any non-trivial agent task; the cost difference is negligible compared to a wasted half-finished run.
  • Always split refactors into checkpointed sub-tasks of ≤10 steps each.
  • Pipe verbose tool output to files; have the agent read tails / greps instead of full logs.
  • Add a “do not stop until” assertion block to every agent task template.
  • For builds/tests longer than 60s, set shell timeout explicitly and add keepalive echoes.
  • Keep an agent.log of every run so you can grep for 429, permission denied, and max_turns in a post-mortem.

Common pitfalls

  • Re-prompting “continue” without specifying the step number — the agent often restarts from step 1, redoing finished work.
  • Raising --max-turns to 9999 and assuming it solves everything — context window overflow still bites at ~150-200 turns regardless of the cap.
  • Ignoring stdout that says “done” inside package.json script output — it WILL be matched as a stop signal in some configs.
  • Running a 30-minute build inside the agent loop instead of starting it in a separate worktree and polling.
  • Forgetting that pnpm install with no terminal output for 90s gets killed even though it is making real progress.

FAQ

Q: Codex stops, I re-prompt “continue”, and it does the wrong step. Why?

The plan list was truncated out of context. Re-prompt with the exact step text: “Continue from: Refactor src/auth/login.ts to async/await.”

Q: I set max-turns to 200 but it still stops at turn 30.

Check the actual env var being read by your CLI. Some CLIs read CODEX_MAX_TURNS, others OPENAI_AGENT_MAX_TURNS. Run with --verbose and confirm the cap that was applied.
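Before digging into the CLI, it can help to list which of the candidate variables are actually set in the environment the agent runs in. A minimal sketch using the two names mentioned above (which one your CLI reads is not guaranteed):

```python
import os

# Variable names mentioned above; which one your CLI reads is not guaranteed.
CANDIDATE_VARS = ("CODEX_MAX_TURNS", "OPENAI_AGENT_MAX_TURNS")


def turn_cap_settings(env=None):
    """Return {name: value} for each candidate variable that is set."""
    env = os.environ if env is None else env
    return {name: env[name] for name in CANDIDATE_VARS if name in env}
```

An empty result means neither variable reached the process, which usually means the export happened in a different shell or was overridden by a config file.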

Q: Is there a way to detect “stopped early” automatically?

Yes. Wrap the agent invocation: after exit, parse the transcript for "task complete" and check that the plan list shows zero remaining items. If the transcript says “complete” but the plan is not empty, re-prompt programmatically.
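The check itself is a two-condition test. In this sketch, how you obtain the remaining plan items (from a plan file, a checklist, or the transcript) is left to the caller, and the function name is illustrative.

```python
def stopped_early(transcript: str, remaining_steps) -> bool:
    """True when the transcript claims completion but plan items remain."""
    claims_done = "task complete" in transcript.lower()
    return claims_done and len(remaining_steps) > 0
```

This only catches the "false completion" case. A run that ends without any completion message, such as a killed process or an exhausted retry loop, returns False here and needs a separate check on the exit status or log.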

Q: Does using a long-context model variant fix this?

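It helps with cause #4 (truncation), not causes #1 / #2 / #3 / #5 / #6. A long-context variant delays truncation but does nothing about turn caps, timeouts, stop phrases, permissions, or rate limits, so it is not sufficient on its own.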
