Codex Stops Mid-Task With No Error: Causes and Fixes

Codex halts halfway through a multi-step task with no error — usually context compaction, a sandbox/approval denial, or a premature stop signal. Diagnose by the last tool call, then resume with codex resume --last.

Published: May 23, 2026 Updated: Jun 15, 2026 Author: AI Productivity Guide Team

Codex is running through a 12-step refactor. Around step 5 the agent stops. No error banner, no failed assertion, no traceback: just silence, or an “I’ve completed the task” summary that clearly is not true. You re-prompt and it picks up from a stale point, sometimes redoing work already done.

Fastest fix: resume the exact session instead of starting over. In the same directory run codex resume --last (interactive) or codex exec resume --last "Continue from step 6. Do not redo steps 1-5." (non-interactive). Resuming restores the original transcript, plan history, and approvals, so Codex keeps its prior context instead of guessing.

This is almost never a “bug” in Codex. The agent hit an invisible boundary: context-window compaction silently dropped the back half of the plan, a sandbox or approval denial killed a step in non-interactive mode, the loop matched a premature stop signal, or an upstream rate limit exhausted its retries. There is no per-task “turn cap” flag in the Codex CLI to raise; reaching for one is the most common wrong fix. Diagnose by the last tool call before the stop.

Versions move fast. Figures below are current as of June 2026 (Codex CLI on GPT-5.5, default since 23 Apr 2026; GPT-5.4 as fallback). Run codex --version and check the Codex changelog if your behavior differs.

Which bucket are you in?

Symptom at the stop point	Most likely cause	Jump to
Long session, many file reads, plan “forgotten”	Context compaction dropped the plan	Step 1, Step 5
Last call was a write/install/network command in `codex exec`	Sandbox or approval denial (read-only default)	Step 3
Tool stdout contained “done” / “all tests pass” mid-run	Premature stop-signal match	Step 4
Last call was a long build/test with no output for 1-2 min	Command killed / timed out, treated as finished	Step 2
Log shows `429` / `rate_limit_exceeded`	Upstream rate-limit retry exhausted	Step 6
Stop at the same step every run	Deterministic — a specific command/approval; not capacity	Step 3

Common causes

Ordered by likelihood for typical mid-task halts.

1. Context-window compaction silently truncates the plan

This is the number-one cause on long tasks. Codex keeps the plan list and prior turns in the model context. When the session approaches the limit, Codex compacts — it calls a server-side endpoint that summarizes older turns into a compressed blob and discards the raw history. If the remaining plan steps were in the dropped region, the agent forgets them and stops once the visible plan is empty.

Two things make this worse as of June 2026: the Codex surface caps GPT-5.5 at 400K tokens (a throughput/cost decision, even though the GPT-5.5 API itself is 1M), and the effective working window before auto-compaction fires is lower still — sessions commonly report an effective window around 258K tokens, so you hit the compaction threshold well before the raw model limit suggests. On top of that, compaction under GPT-5.5 has been unreliable: community reports through mid-2026 describe /compact and remote-compaction operations timing out or dropping context instead of summarizing it (see Codex issues #19842 and #18829).

How to spot it: It was a long session (lots of file reads / big tool outputs). Re-prompt “what step are you on?” — if the answer references an earlier step than where it actually stopped, the plan was compacted away. Watch for a compaction/summary notice in the TUI just before the stop.

2. A slow command was killed and treated as “done”

If a build, test, or install runs for a long time with no output, it can be killed (or hit a wrapper/CI timeout). The agent then treats the dead step as complete and starts wrapping up.

How to spot it: The last tool call is a long-running shell command and no stdout appeared before the stop. Re-running the same command in a normal terminal takes more than a minute and prints nothing until the end.
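To check this cause outside the agent, one option is to time the suspect command yourself. The sketch below is a minimal example; the helper name `runs_longer_than` and the 60-second threshold in the usage comment are illustrative assumptions, not Codex settings.

```shell
# Hypothetical helper (not part of Codex): succeeds if the command takes
# longer than LIMIT seconds of wall-clock time.
# usage: runs_longer_than LIMIT command [args...]
runs_longer_than() {
  limit=$1
  shift
  start=$(date +%s)
  # Output is discarded; only the duration matters here.
  "$@" >/dev/null 2>&1
  end=$(date +%s)
  [ $((end - start)) -gt "$limit" ]
}

# Example: flag a build as slow if it exceeds one minute.
# if runs_longer_than 60 pnpm build; then echo "slow: move it out of the agent loop"; fi
```

If the helper reports a slow command, Step 2 is the likely fix.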

3. Sandbox or approval denial in non-interactive mode

codex exec runs in a read-only sandbox by default. A write, install, or network command therefore needs approval — and in non-interactive mode there is no human to approve it, so the step is effectively denied. Codex records a failed tool call and abandons the broader task. In workspace-write, network access is still off by default and triggers an approval that exec cannot satisfy.

How to spot it: The stop is deterministic (same step every run), and that step writes files, installs packages, or reaches the network. Check the log for approval, requires approval, sandbox, read-only, or network near the stop.

4. Premature stop-signal match

The loop looks for “done” signals. If a tool’s stdout happens to contain “task complete”, “all tests pass”, or “no further action needed” mid-plan, the agent can conclude it is finished.

How to spot it: Look at the last tool output. A test runner that prints “all tests pass” for one suite, or a package.json script that echoes “done”, can short-circuit the loop while real work remains.

5. Upstream rate limit / 429 retry exhausted

A 429 from the model API triggers internal retries. After the retry budget is spent, the run ends with a truncated assistant turn that can look like a clean finish.

How to spot it: Look for 429, rate_limit_exceeded, or Retry-After / retrying in the run output. More common on free/low ChatGPT tiers and during provider incidents — cross-check the OpenAI status page.

Before you start

Note whether the stop is deterministic (same step every run = a specific command/approval/code path) or random (capacity, compaction, network).
Do not start a new session to “try again.” Rescue the run by resuming it (Step 1). A fresh codex exec "..." loses the plan and approvals.
Capture the last few tool calls. In automation, run with --json so you have a machine-readable event log to grep afterward.

Information to collect

The exact last tool call before the stop (read / write / shell / search).
Whether you were in interactive codex or non-interactive codex exec, and the --sandbox / --ask-for-approval values in effect.
The model (gpt-5.5, gpt-5.5-codex, gpt-5.4) and model_reasoning_effort. Record the CLI version from codex --version as well.
Any stdout near “done”, “complete”, “passed” that could be mistaken for a stop signal.
Any 429, approval, sandbox, or compaction/summary notice near the stop.
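Once you have a saved log, a first pass over it can be automated. The sketch below greps for the keywords listed above and prints a likely bucket. The function name, the bucket labels, and the assumption that these strings appear verbatim in your log are all illustrative, so treat the result as a hint rather than a diagnosis.

```shell
# Hypothetical triage helper: prints the most likely bucket for a saved run log.
# The exact wording of Codex log lines is an assumption.
# usage: classify_stop run.log
classify_stop() {
  log=$1
  if grep -Ewq '429|rate_limit_exceeded|Retry-After' "$log"; then
    echo "rate-limit"
  elif grep -Eiq 'requires approval|sandbox|read-only' "$log"; then
    echo "sandbox-or-approval"
  elif grep -iq 'compact' "$log"; then
    echo "compaction"
  elif tail -n 20 "$log" | grep -Eiq 'all tests pass|task complete|no further action needed'; then
    # Stop phrases only count when they appear near the end of the log.
    echo "stop-signal"
  else
    echo "unknown"
  fi
}
```

The checks run in order, so a log that contains both a 429 and a sandbox message is reported as a rate limit first; read the log yourself before acting on the label.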

Step-by-step fix

Ordered by ROI: cheapest checks first.

Step 1: Resume the exact session, don’t restart

The single most reliable rescue. From the same working directory:

# interactive: opens your most recent session with full plan + approvals
codex resume --last

# non-interactive: continue and steer in one shot
codex exec resume --last "You stopped at step 5. Continue from step 6. \
Do not redo steps 1-5. Print the remaining plan first, then execute."

codex resume without --last opens a picker of recent sessions; pass a SESSION_ID to target a specific run, and --all to include sessions from other directories. Resuming keeps the original transcript and plan history, so the agent does not re-derive (and re-do) finished work. If it resumes correctly, the root cause was a stop-signal match or a recoverable compaction — not a hard failure.

If the plan itself was compacted out, paste the exact step text rather than a bare step number:

Continue from this exact step: "Refactor src/auth/login.ts to async/await."

Step 2: Stop blocking the loop on slow commands

Don’t run a multi-minute build/test/install inside the agent loop. Kick it off, capture output to a file, and have the agent poll a tail instead of staring at a silent process:

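# Redirect to a file instead of piping through tee, so `wait` returns pnpm's exit status, not tee's
pnpm install > install.log 2>&1 &
PID=$!
# Keepalive: print a progress line every 20 s while the install is still running
while kill -0 "$PID" 2>/dev/null; do echo "still installing..."; sleep 20; done
# Propagate the install's exit status to the caller
wait "$PID"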

The keepalive echo lines give the loop visible progress so a quiet step is not mistaken for a hung or finished one. For genuinely long builds, run them in a separate git worktree and have Codex poll the result, rather than holding the session open for 30 minutes.
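For the "run it elsewhere and poll the result" variant, one possible shape is a pair of helpers: one starts the command detached and records its exit status in a file, the other reports that status. This is a sketch under stated assumptions; the helper names and the `.exit` file convention are hypothetical, not Codex features.

```shell
# Hypothetical helpers (names and the ".exit" convention are illustrative).
# run_detached LOG command [args...]
#   Starts the command in the background, sends all output to LOG, and
#   writes the command's exit status to LOG.exit when it finishes.
run_detached() {
  log=$1
  shift
  rm -f "$log.exit"
  (
    status=0
    "$@" >"$log" 2>&1 || status=$?
    # Write to a temp name first so a poller never sees a half-written file.
    echo "$status" >"$log.exit.tmp"
    mv "$log.exit.tmp" "$log.exit"
  ) &
}

# poll_status LOG
#   Prints "running" until LOG.exit exists, then prints the exit status.
poll_status() {
  if [ -f "$1.exit" ]; then
    cat "$1.exit"
  else
    echo "running"
  fi
}
```

The agent can then run `poll_status build.log` and `tail -n 20 build.log` as short, fast commands instead of holding one silent process open.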

Step 3: Give `exec` the sandbox and approvals the task actually needs

If the stop is deterministic on a write/install/network step, the read-only default is the cause. Grant exactly what the task needs and no more:

# allow workspace edits + routine local commands (network still off by default)
codex exec --sandbox workspace-write "Run the migration and update tests"

# non-interactive runs can't answer prompts — set the policy explicitly
codex exec --sandbox workspace-write --ask-for-approval never "..."

--ask-for-approval takes untrusted | on-request | never; in unattended runs, never (paired with a scoped sandbox) prevents a silent block on an approval the agent can never get. Avoid --dangerously-bypass-approvals-and-sandbox (alias --yolo) outside a throwaway container. The old --full-auto flag is deprecated — use --sandbox workspace-write instead. If the task needs the network, set it in config rather than disabling the sandbox:

# ~/.codex/config.toml
[sandbox_workspace_write]
network_access = true

Heads-up: on macOS the Seatbelt sandbox silently ignores this key, so a single-run override is more reliable there — codex exec --config sandbox_workspace_write.network_access=true "...". On Linux (Landlock) the config.toml setting is honored.
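If the same script runs on both platforms, the choice between the two approaches can be made at run time. The following is a minimal sketch that relies on the platform behavior described above; `network_args` is a hypothetical helper name, and the OS name is passed in as a parameter so the function is easy to test.

```shell
# Hypothetical helper: prints the extra `codex exec` arguments for enabling
# network access. Pass "$(uname -s)" as the argument.
network_args() {
  case "$1" in
    Darwin)
      # macOS: use the per-run override rather than relying on config.toml.
      echo "--config sandbox_workspace_write.network_access=true"
      ;;
    *)
      # Elsewhere the config.toml setting is expected to be honored.
      echo ""
      ;;
  esac
}

# Example (word splitting of the unquoted substitution is intentional):
# codex exec --sandbox workspace-write $(network_args "$(uname -s)") "Run the migration"
```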

Step 4: Add an explicit “do not stop until” block

Neutralize premature stop-signal matches by stating the real exit condition in the prompt or AGENTS.md:

Do not report the task complete until ALL of:
- pnpm tsc --noEmit exits with status 0
- All tests in src/auth/ pass
- The plan list has zero remaining items

If any tool's stdout contains "done" or "complete", ignore it as a stop signal.
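Exit conditions like these can also be checked mechanically, so that completion is judged by exit codes rather than by text in stdout. The sketch below is one way to do that; `all_checks_pass` is a hypothetical helper, and the commands in the usage comment are examples to replace with your project's own checks.

```shell
# Hypothetical gate: runs each check (a shell command string) and fails
# if any of them exits non-zero.
# usage: all_checks_pass "cmd one" "cmd two" ...
all_checks_pass() {
  failed=0
  for check in "$@"; do
    if sh -c "$check" >/dev/null 2>&1; then
      echo "PASS: $check"
    else
      echo "FAIL: $check"
      failed=1
    fi
  done
  [ "$failed" -eq 0 ]
}

# Example:
# all_checks_pass "pnpm tsc --noEmit" "pnpm test" || echo "not finished yet"
```

Every check runs even after an earlier one fails, so a single pass shows the full list of unmet conditions.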

Step 5: Keep the plan out of the compaction blast radius

Two complementary moves slow how quickly you approach the compaction threshold (which sits well below the 400K cap) and keep the plan from being summarized away:

Pipe verbose output to a file, read only the tail. A 10k-line test log accelerates compaction. Redirect and skim:

pnpm test > test.log 2>&1
tail -50 test.log
grep -E "FAIL|✗" test.log | head -20

The agent reads ~70 lines instead of 10,000, so the plan stays in window.

Split the work into checkpointed sub-tasks. Even on a clean session, one mega-prompt is fragile. Each sub-task gets a fresh context budget, and a stop in task 2 does not lose task 1:

Task 1: Refactor src/auth/* to async/await. Stop and report.
Task 2: Update src/auth/*.test.ts to match. Stop and report.
Task 3: Run pnpm test --filter auth. Report failures.

If you want to keep one session, compact deliberately between sub-tasks with the /compact command in the TUI, rather than letting auto-compaction fire at an arbitrary point mid-plan. You can also make auto-compaction fire earlier (while there is still room to summarize cleanly) by lowering its threshold in config — set model_auto_compact_token_limit to roughly 60% of the effective window. Note the ceiling: values above 90% of the context window are silently ignored, so you cannot raise this to “never compact.”

# ~/.codex/config.toml
model_auto_compact_token_limit = 155000
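The 155000 above is 60% of the roughly 258K effective window, rounded up. If your effective window differs, the same arithmetic applies; `compact_limit` below is a hypothetical helper name for it.

```shell
# Computes PERCENT of an effective window, in tokens, using integer arithmetic.
# usage: compact_limit WINDOW_TOKENS PERCENT
compact_limit() {
  echo $(( $1 * $2 / 100 ))
}

compact_limit 258000 60   # prints 154800
```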

Step 6: Rule out rate limits and version bugs

If the run output shows 429 / rate_limit_exceeded, you are throttled, not stuck — wait out the Retry-After, check the OpenAI status page, and retry with codex resume --last. Heavy unattended loops on a free/low ChatGPT tier hit this fastest.
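For unattended loops, waiting and retrying can be wrapped in a small backoff helper. This is a sketch only: `retry_with_backoff` is a hypothetical name, the delays are arbitrary, it does not read the Retry-After value, and it only helps if the wrapped command exits non-zero when throttled, which is an assumption to verify for your setup.

```shell
# Hypothetical wrapper: retries a command up to MAX times, doubling the
# delay between attempts.
# usage: retry_with_backoff MAX FIRST_DELAY_SECONDS command [args...]
retry_with_backoff() {
  max=$1
  delay=$2
  shift 2
  attempt=1
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    echo "attempt $attempt failed; sleeping ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}

# Example:
# retry_with_backoff 4 30 codex exec resume --last "Continue the remaining plan."
```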

Also confirm the model. The Codex default is GPT-5.5 with GPT-5.4 as fallback; a known mid-2026 bug resets the model to gpt-5.4 after /clear even when config.toml pins gpt-5.5. Pin it explicitly so a fallback doesn’t change behavior under you:

# ~/.codex/config.toml
model = "gpt-5.5-codex"
model_reasoning_effort = "high"

How to confirm it’s fixed

Re-run the same task end-to-end and confirm it completes without a manual resume.
For the compaction case: run a deliberately longer task (add a second refactor) and confirm it still finishes — that proves you addressed context pressure, not luck.
For the sandbox case: confirm the previously-failing write/install/network step now runs (visible in the transcript) instead of silently dropping.
Keep a --json event log of the run so you can grep 429, approval, or a compaction notice if it recurs.

Long-term prevention

Resume, don’t restart: make codex resume --last / codex exec resume --last your default reaction to any interrupted run.
Split non-trivial refactors into checkpointed sub-tasks of about 10 steps each; compact deliberately between them.
Pipe verbose tool output to files; have the agent read tails / greps instead of full logs.
Add a “do not stop until” block to every agent task template (or AGENTS.md).
For codex exec, set --sandbox and --ask-for-approval explicitly so an unattended run never blocks on an approval it can’t get.
Pin the model in config.toml so a silent fallback doesn’t change behavior.
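Some of these habits can be guarded by a preflight check before an unattended run. The sketch below only confirms that a key is assigned somewhere in a config file; `config_has_key` is a hypothetical helper, and it is a plain text match rather than a TOML parser.

```shell
# Hypothetical preflight check: succeeds if FILE contains a line assigning KEY.
# usage: config_has_key FILE KEY
config_has_key() {
  grep -Eq "^[[:space:]]*$2[[:space:]]*=" "$1"
}

# Example:
# config_has_key "$HOME/.codex/config.toml" model || echo "model is not pinned"
# config_has_key "$HOME/.codex/config.toml" model_auto_compact_token_limit \
#   || echo "auto-compaction threshold is not set"
```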

Common pitfalls

Starting a fresh codex exec "..." to “retry” — it loses the plan and approvals; resume instead.
Hunting for a --max-turns flag — the Codex CLI has no per-task turn cap to raise; the real limit is context/compaction.
Re-prompting “continue” without the step text — after compaction the agent often restarts from step 1, redoing finished work.
Running codex exec against a write-heavy task on the read-only default and assuming Codex “gave up.”
Ignoring stdout that says “done” inside a package.json script — some configs match it as a stop signal.
Running a 30-minute build inside the agent loop instead of a separate worktree with polling.

FAQ

Q: Codex stops, I resume “continue”, and it does the wrong step. Why?

The plan list was compacted out of context, so “continue” has nothing to anchor to. Resume with the exact step text: codex exec resume --last "Continue from: Refactor src/auth/login.ts to async/await."

Q: How do I raise the turn limit so it doesn’t stop early?

There isn’t one to raise: the Codex CLI doesn’t expose a per-task turn cap (--max-turns and OPENAI_AGENT_MAX_TURNS are not real options). What stops long runs is context compaction, which fires below the 400K Codex cap on GPT-5.5. Reduce context pressure (Step 5) and resume, instead of chasing a cap.

Q: My codex exec task dies at the same step every time.

That step almost certainly needs write/install/network access that the read-only exec default denies. Run it with --sandbox workspace-write and an explicit --ask-for-approval, or enable network_access in config.toml (Step 3).

Q: Is there a way to detect “stopped early” automatically?

Yes. Run with --json, then after exit parse the events: if a “complete” message appears but the plan list still has open items, programmatically codex exec resume --last with a “continue the remaining plan” prompt.
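A minimal sketch of that check follows. The event shape is an assumption to verify against your own --json output: the code assumes one JSON event per line, a line containing `turn.completed` when a turn finishes, and plan snapshots on lines containing `todo_list` with per-item `"completed": true/false` flags. The function name `stopped_early` is hypothetical.

```shell
# Sketch only; the event and field names matched below are assumptions.
# usage: stopped_early events.jsonl   (succeeds if the run looks unfinished)
stopped_early() {
  events=$1
  # No completion event at all: the run never claimed to finish.
  grep -q 'turn\.completed' "$events" || return 1
  # Only the most recent plan snapshot matters.
  last_plan=$(grep 'todo_list' "$events" | tail -n 1)
  [ -n "$last_plan" ] || return 1
  # Unfinished if any item in that snapshot is still open.
  printf '%s\n' "$last_plan" | grep -Eq '"completed"[[:space:]]*:[[:space:]]*false'
}

# Example wiring:
# codex exec --json "..." > events.jsonl
# if stopped_early events.jsonl; then
#   codex exec resume --last "Continue the remaining plan."
# fi
```

Matching JSON with grep is fragile; if a JSON parser is available in your environment, parsing each event properly is the safer design.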

Q: Does a long-context model fix this?

It helps only with the compaction cause, and only partly. Codex caps GPT-5.5 at 400K on its surface regardless of the 1M API window, the effective working window before auto-compaction is lower still (sessions report around 258K as of June 2026), and compaction reliability — not raw size — has been the real bottleneck mid-2026. A bigger window does nothing for sandbox denials, stop-signal matches, or rate limits.

Q: I set network_access = true in config.toml but codex exec still can’t reach the network.

On macOS the Seatbelt sandbox silently ignores that key. Use a per-run override instead: codex exec --config sandbox_workspace_write.network_access=true "...". On Linux (Landlock) the config.toml setting is read correctly. Either way you also need a sandbox that permits it (--sandbox workspace-write), not the read-only default.

Tags: #Codex #agent #Troubleshooting #context-window #timeout