Codex Fails to Run or Interpret Build Results

Codex skips the build, misreads it, or trusts a truncated tail. Fix it with exit-code checks, machine-readable verifiers, and a Stop hook gate — not prose summaries.

Published: May 17, 2026 Updated: Jun 15, 2026 Author: AI Productivity Guide Team

Codex says “build succeeded ✓” — you pull the branch, run pnpm build, it explodes. Or Codex runs tests, summarizes “all tests passing,” but vitest actually reported 3 failures buried 400 lines up in the output. Either way, your trust in the report is now zero and you re-verify every claim by hand.

Fastest fix: stop letting Codex read raw build logs. Make every verifier print a short, machine-readable result, bind the verdict to the exit code, and (if you’re on a recent Codex CLI) add a Stop hook that re-runs your tests and blocks the turn from ending when they fail. The rest of this guide explains why each one matters.

This is an output-parsing problem, not a model-capability problem. The Codex CLI caps each tool call’s output before it reaches the model — currently a hardcoded 10 KiB or 256 lines (whichever hits first), with the head and tail kept and the middle dropped and replaced by a truncation marker (openai/codex #5913, still hardcoded as of June 2026). A failing pnpm build in a monorepo dumps far more than that. The errors usually live in the middle; the surviving tail says “Done building in 12s.” Codex reads the tail and calls it a win.

Which bucket are you in?

Symptom you see	Most likely cause	Jump to
Local re-run shows errors Codex never mentioned	Output truncated past the 10 KiB / 256-line cap	Cause 1, Step 1
Codex names a vague failure but can’t quote the line	It re-summarized and lost `file:line`	Cause 2, Step 2
Garbled/missing error text, weird characters	ANSI color codes mangled the stream	Cause 3, Step 3
Same error count keeps reappearing	Only the first (root-cause) error was fixed	Cause 4, Step 4
Exit 0 but artifacts missing/stale	Subprocess failure didn’t propagate	Cause 5, Prevention
No tool-call entry for the “verification” it claims	Codex never ran the command	Cause 6, Step 6

Common causes

Ordered by hit rate, highest first.

1. Tool output exceeded the harness cap and was truncated

The Codex CLI truncates each command’s output at 10 KiB or 256 lines before the model sees it, keeping the head and tail and dropping the middle. A failing pnpm build in a monorepo can dump 200 KB or more. The middle (the actual error) is gone; the tail (“Done building in 12s”) survives, so Codex concludes success. This cap is hardcoded and not yet configurable via config.toml as of June 2026.

How to spot it: Re-run the same command yourself with no truncation. If your local output has obvious errors that aren’t in what Codex saw, output truncation is the cause.
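To preview what would survive the cap, you can approximate the truncation locally. This is a minimal sketch, not the CLI's actual logic: `preview_truncated` is a hypothetical helper, and the even head/tail split and the marker text are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: approximate head+tail truncation of a captured log.
# Assumption: the line budget (default 256) is split evenly between head and tail.
preview_truncated() {
  local file="$1" budget="${2:-256}"
  local total half
  total=$(awk 'END { print NR }' "$file")
  if [ "$total" -le "$budget" ]; then
    cat "$file"            # short enough: nothing would be dropped
    return 0
  fi
  half=$((budget / 2))
  head -n "$half" "$file"
  echo "[... $((total - 2 * half)) lines omitted ...]"
  tail -n "$half" "$file"
}
```

For example, capture the build with `pnpm build > /tmp/build.log 2>&1` and compare `grep -c error /tmp/build.log` against `preview_truncated /tmp/build.log | grep -c error`. A large gap suggests the errors sit in the dropped middle.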

2. Codex summarized output and lost the specific error

Even with full output, Codex sometimes hits its own context limit and re-summarizes. The summary says “compilation failed in user-service” but drops the file:line and exact message. Codex’s next turn works from the summary, not the real error.

How to spot it: Ask Codex to paste the verbatim failing line. If it can’t, it lost the detail in a summary.

3. Colored output / ANSI escape codes confused the parser

vitest, tsc, and eslint emit ANSI color codes when they detect a TTY, or when color is forced (a --color flag, or FORCE_COLOR for tools that honor it). Inside a captured stream those \x1b[31m sequences become noise that can swallow the surrounding text, so error lines get mangled.

How to spot it: Run with --no-color or FORCE_COLOR=0. If Codex now sees the same errors that mystified it before, colors were the issue.
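If a log was already captured with colors, you can strip the escape sequences after the fact and compare. A minimal sketch, assuming the noise consists of standard ANSI CSI sequences; `strip_ansi` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: remove ANSI CSI sequences (colors, cursor moves) from stdin.
strip_ansi() {
  local esc
  esc=$(printf '\033')     # the literal ESC byte
  sed -E "s/${esc}\[[0-9;?]*[A-Za-z]//g"
}
```

Usage: `strip_ansi < /tmp/build.log | grep "error TS"`. If errors appear here that were unreadable in the raw capture, color codes were the problem.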

4. Codex took the first error and ignored cascading ones

tsc returns 47 errors; the first is “cannot find module foo,” caused by a missing install. Codex patches that one error where it was reported and never checks whether the other 46 were downstream of the same root cause. The next iteration still shows 46 errors, which looks like a regression, and Codex ping-pongs.

How to spot it: Roughly the same error count keeps coming back after each “fix.” Codex is handling errors one at a time instead of grouping them by root cause.
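One way to check whether errors share a root cause is to group them by error code before fixing anything. A minimal sketch, assuming plain `tsc --pretty false` output on stdin; `group_ts_errors` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count tsc errors per error code, most frequent first.
# Input: tsc --pretty false output on stdin.
group_ts_errors() {
  grep -oE 'error TS[0-9]+' | sort | uniq -c | sort -rn
}
```

Usage: `pnpm tsc --noEmit --pretty false 2>&1 | group_ts_errors`. A large cluster under a single code is a hint that one underlying problem is producing most of the list.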

5. Sub-process failures didn’t propagate

npm run build shells out to webpack, which shells out to a worker, which fails. The outer command exits 0 because the child’s non-zero exit status was discarded somewhere in the chain, often by a pipe without pipefail, and its stderr was swallowed along the way. Codex sees green.

How to spot it: Outer command exits 0 but artifacts are missing or stale. Look in the script chain for shell pipes without set -o pipefail.
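The effect is easy to reproduce in isolation. The sketch below uses a stand-in `failing_build` function (hypothetical, in place of a real build step) piped into `cat`, once without and once with pipefail.

```bash
#!/usr/bin/env bash
# Stand-in for a build step that fails (hypothetical).
failing_build() { echo "worker crashed" >&2; return 1; }

# Without pipefail, the pipeline reports the last stage's status (cat: 0).
without_pipefail() {
  ( set +o pipefail; failing_build 2>&1 | cat >/dev/null )
}

# With pipefail, the pipeline fails if any stage fails.
with_pipefail() {
  ( set -o pipefail; failing_build 2>&1 | cat >/dev/null )
}
```

Calling `without_pipefail; echo $?` prints 0 even though the build step failed, while `with_pipefail; echo $?` prints 1.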

6. Codex didn’t run the command at all

The agent generated code, said “verified with pnpm typecheck,” but in the transcript there’s no tool-use entry for that command. It hallucinated the verification.

How to spot it: Search the session transcript for the actual tool call. If it’s absent, Codex skipped verification entirely.
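If you have the session saved to a file, the search can be scripted. A minimal sketch under assumptions: `was_run` is a hypothetical helper, the transcript path is whatever your setup writes, and the command string is assumed to appear verbatim in that file.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the transcript contains the command string.
# Assumption: the transcript is a text or JSONL file in which the command appears
# verbatim; adjust the match if your export escapes or splits it.
was_run() {
  local transcript="$1" cmd="$2"
  grep -F -q -- "$cmd" "$transcript"
}
```

Usage: `was_run path/to/transcript "pnpm typecheck" || echo "no tool call found"`.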

Shortest path to fix

Ordered by ROI. Step 1 alone fixes most “Codex misread output” cases; Step 7 (the Stop hook) is the one that makes a green report impossible to fake.

Step 1: Use commands that produce short, structured output

Replace pnpm build with verifiers that print less:

# Bad: 500 lines of noise, real error in the middle (which gets truncated)
pnpm build

# Good: only errors, sorted, deduplicated
pnpm tsc --noEmit --pretty false 2>&1 | grep "error TS" | sort -u
echo "Exit: ${PIPESTATUS[0]}"

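(${PIPESTATUS[0]} is a bash array entry holding the exit code of the first command in the pipeline, here pnpm tsc. $? after a pipe gives you the last command, sort, which is the wrong one. Run the echo in the same shell invocation as the pipeline, or the value is lost.)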
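Because bash overwrites PIPESTATUS as soon as the next command runs, it is safest to copy the whole array right after the pipeline. A minimal sketch, with `(exit 2)` standing in for a failing tsc run; `demo_pipeline_status` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Snapshot every stage's exit status immediately after the pipeline.
demo_pipeline_status() {
  local -a statuses
  (exit 2) | grep "error TS" | sort -u   # stage 1 fails, grep matches nothing, sort succeeds
  statuses=("${PIPESTATUS[@]}")          # copy before any other command overwrites it
  echo "stages: ${statuses[*]}"
}
```

Calling `demo_pipeline_status` prints `stages: 2 1 0`: the stand-in's failure, grep finding no match, and sort succeeding. Only the first value tells you whether the verifier passed.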

For tests, use Vitest’s JSON reporter and extract real fields with jq:

# Real Vitest JSON keys: numPassedTests / numFailedTests / success / testResults[].assertionResults[]
pnpm vitest run --reporter=json 2>/dev/null \
  | jq '{passed: .numPassedTests, failed: .numFailedTests, ok: .success,
         failures: [.testResults[].assertionResults[]
                    | select(.status=="failed") | .fullName]}'

For lint:

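# On ESLint 9+, the compact formatter is no longer built in; install eslint-formatter-compact first
pnpm eslint . --format=compact --max-warnings 0 2>&1 | tail -30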

Each command produces well under 256 lines on a clean repo and usually stays under the cap even when the build is broken, so the errors reach Codex intact.

Step 2: Bind verification to exit code, not prose

In the prompt:

Run the verifier and report:
1. The exit code (use `echo "Exit: $?"` immediately after, or `${PIPESTATUS[0]}` after a pipe).
2. The number of errors / failing tests.
3. The first 3 errors verbatim (file:line + message).

NEVER say "build passed" without printing exit code 0.
NEVER summarize "looks good" — paste the exact output.

Exit code is the source of truth; prose summaries are lossy.
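The three items in that prompt can also be produced by a small wrapper, so Codex pastes a fixed format instead of composing one. A minimal sketch: `report` is a hypothetical helper, and the `error|FAIL` pattern is an assumption you would tune per tool.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a verifier and print exit code, error count,
# and the first three matching lines verbatim.
report() {
  local log status
  log=$(mktemp)
  if "$@" >"$log" 2>&1; then status=0; else status=$?; fi
  echo "Exit: $status"
  echo "Errors: $(grep -cE 'error|FAIL' "$log")"
  grep -E 'error|FAIL' "$log" | head -n 3
  rm -f "$log"
  return "$status"       # preserve the verifier's own exit code
}
```

Usage: `report pnpm tsc --noEmit --pretty false`. The wrapper returns the verifier's exit code, so it can be chained with `&&` like the original command.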

Step 3: Strip color and disable interactivity

Always add to verifier commands:

FORCE_COLOR=0 NO_COLOR=1 CI=true pnpm test -- --no-color

CI=true also disables interactive prompts (watch mode, progress bars) that confuse non-TTY parsers. Many tools auto-disable color when they detect a non-TTY pipe, but setting FORCE_COLOR=0 and NO_COLOR=1 makes it deterministic.

Step 4: Split verification into one-concern commands

Don’t run pnpm build if it does typecheck + lint + bundle + minify. Each step has different failure modes; one mixed output stream hides the cause. Instead:

pnpm typecheck   # exit code 0 / 1
pnpm lint        # exit code 0 / 1
pnpm test        # exit code 0 / 1
pnpm build       # exit code 0 / 1, only if the above three pass

Codex reads each cleanly and can attribute failures. This also fixes Cause 4: running typecheck alone surfaces the whole tsc error set in one place so Codex can root-cause instead of fixing one error at a time.
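The sequence above can be wrapped so each concern gets its own log and a one-line verdict, stopping at the first failure. A minimal sketch: `run_step` and `verify_all` are hypothetical names, and the script names (typecheck, lint, test, build) are assumed to exist in your package.json.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run one step, log its output, print a one-line verdict.
run_step() {
  local name="$1" status log
  shift
  log="${TMPDIR:-/tmp}/verify-${name}.log"
  if "$@" >"$log" 2>&1; then
    echo "${name}: exit 0"
  else
    status=$?
    echo "${name}: exit ${status} (see ${log})"
    return "$status"
  fi
}

# Run the four concerns in order; && stops at the first failing step.
verify_all() {
  run_step typecheck pnpm typecheck &&
  run_step lint pnpm lint &&
  run_step test pnpm test &&
  run_step build pnpm build
}
```

The output is at most four short lines, and the failing step's name tells Codex which log to grep next.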

Step 5: For long output, pipe to a file and grep

If a verifier must produce long output, capture the whole thing to disk and show Codex only the relevant slice:

pnpm build > /tmp/build.log 2>&1
echo "Exit: $?"
# Show only error lines and a little context
grep -nE -B1 -A2 "error|Error|ERROR|✗|FAIL" /tmp/build.log | head -50

Codex gets a focused view that fits the window, and the full log is still on disk for a follow-up grep if it needs more.

Step 6: Force Codex to actually run the verifier

Put the gate in AGENTS.md (Codex’s per-project instruction file, loaded at the start of every session) so it applies to every turn, not just the prompt you remember to type:

## Verification (required after every code change)
Run these in order and paste the OK/FAIL line for each:
1. `pnpm typecheck && echo "TYPECHECK OK" || echo "TYPECHECK FAIL"`
2. `pnpm test -- --run && echo "TEST OK" || echo "TEST FAIL"`
3. `pnpm lint --max-warnings 0 && echo "LINT OK" || echo "LINT FAIL"`

Do not say "done" without three OKs. If any FAIL, fix and re-run.

The OK/FAIL token is a deterministic marker Codex can’t fudge — and AGENTS.md is the right home for it because Codex re-reads that file every session.

Step 7: Add a Stop hook so a fake green can’t end the turn

This is the strongest fix. The Codex CLI hooks engine (enabled by default; stable since v0.124.0, released April 23 2026) lets you run a shell command when an event fires. A Stop hook fires when a turn is about to finish — the perfect place to re-run your tests and refuse to let Codex stop while they’re red.

Define it in ~/.codex/hooks.json (or per-repo .codex/hooks.json):

{
  "hooks": {
    "Stop": [
      {
        "command": ["bash", "-c", "pnpm vitest run --reporter=dot >/tmp/verify.log 2>&1 || echo '{\"decision\":\"block\",\"reason\":\"Tests are still failing — see /tmp/verify.log and fix before finishing.\"}'"]
      }
    ]
  }
}

If the tests fail, the hook emits {"decision":"block","reason":"..."} and Codex is forced to keep working instead of declaring victory. Hooks are configured in ~/.codex/hooks.json, .codex/hooks.json, or inline [hooks] tables in config.toml; to turn the engine off entirely, set [features] hooks = false in config.toml. See the Codex hooks docs for the full event list (PreToolUse, PostToolUse, Stop, and more).
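The inline bash string gets hard to read once the gate grows, so the logic could live in a script that the hook's command points at. A minimal sketch, assuming the block-decision JSON described above is all the hook needs to print; `stop_gate` and the `VERIFY_LOG` variable are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical Stop-hook body: run the verifier passed as arguments.
# On success print nothing, so the turn may end.
# On failure print the block decision described above.
stop_gate() {
  local log="${VERIFY_LOG:-/tmp/verify.log}"
  if "$@" >"$log" 2>&1; then
    return 0
  fi
  printf '{"decision":"block","reason":"Verifier failed; see %s and fix before finishing."}\n' "$log"
}
```

A script file could end with `stop_gate pnpm vitest run --reporter=dot`. Note that the log path is interpolated into JSON unescaped, so keep it free of quotes and backslashes.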

How to confirm it’s fixed

1. Deliberately break the build (e.g. add a type error), then ask Codex to verify.
2. It should report a non-zero exit code and quote the exact failing line, not “looks good.”
3. With the Stop hook in place, Codex should refuse to end the turn and instead keep fixing until the verifier is green.
4. Pull the branch and run the same verifier yourself. Your exit code and Codex’s should match. If they diverge, you still have a truncation or color problem upstream.

Prevention

Standardize on JSON / compact reporters for tests and lint — never raw human-readable mode.
Always strip ANSI color in commands Codex runs: FORCE_COLOR=0 NO_COLOR=1 CI=true.
Split verification into typecheck / lint / test / build, each producing well under the 256-line cap.
Require exit-code-based verification in AGENTS.md, not just the chat prompt; prose summaries can’t be trusted.
Use set -o pipefail in any shell script Codex executes so sub-process failures propagate (${PIPESTATUS[0]} for one-off pipes).
Add a Stop hook that re-runs tests so a fabricated “all green” can’t end the turn.
Keep CI as the final source of truth — Codex’s local verifier is fast feedback, CI is the gate.

FAQ

Why does Codex see fewer errors than I do locally? The Codex CLI truncates each command’s output at 10 KiB or 256 lines, keeping the head and tail and dropping the middle. Long build logs put the real errors in the part that gets cut. Shorten the output (Step 1) or pipe to a file and grep the relevant slice (Step 5).

Can I just raise the truncation limit? Not as of June 2026 — the cap is hardcoded and there’s an open request to make it configurable (openai/codex #5913). Reducing output volume is the reliable fix, not waiting on a config flag.

Codex says tests pass but vitest reports failures. Why? Either the failures scrolled into the truncated middle, or the colored summary got mangled. Switch to --reporter=json and have Codex read numFailedTests and success instead of the human-readable summary (Step 1).

The outer command exits 0 but the build is broken. What now? A child process failed and its non-zero status didn’t propagate through a pipe. Add set -o pipefail to the script, or check each sub-step’s exit code separately (Step 4). Don’t trust a 0 from a command that pipes through tee, cat, or a wrapper script.

How do I stop Codex from claiming it ran a verifier it never ran? Search the session transcript for the tool call; if it’s missing, the “verification” was hallucinated. Move the verifier into AGENTS.md (Step 6) and back it with a Stop hook (Step 7) so completion is gated on real, observable command output.

Do hooks work on Windows? The hooks engine ran experimentally before v0.124.0 and isn’t available on Windows in the same way as on macOS/Linux. If you’re on Windows, lean on AGENTS.md plus exit-code prompting and let CI be the hard gate.

Tags: #Codex #Coding agent #Troubleshooting #Debug #Build results