Codex Fails to Run or Interpret Build Results

Codex skips the build, misreads the output, or trusts a truncated tail. Use machine-readable verifiers + exit codes, not prose summaries.

Codex says “build succeeded ✓” — you pull the branch, run pnpm build, it explodes. Or Codex runs tests, summarizes “all tests passing,” but vitest actually reported 3 failures buried 400 lines up in the output. Either way, your trust in the report is now zero and you have to re-verify every claim manually.

This is an output-parsing problem, not a model-capability problem. Codex’s tool layer typically caps output at a few KB; long compiler dumps get truncated, and the agent reads the tail (which is usually fine) instead of the middle (where errors live). The fix: use machine-readable verifiers, exit-code checks, and split commands so each output fits in the window.

Common causes

Ordered by hit rate, highest first.

1. Tool output exceeded the harness cap and was truncated

Most Codex harnesses cap command output at 8KB–64KB. A failing pnpm build in a monorepo dumps 200KB. The middle (which has the actual error) drops; the tail (showing “Done building in 12s” or similar) survives. Codex reads the tail, concludes success.

How to spot it: Re-run the same command yourself with no truncation. If the local output has obvious errors that aren’t in what Codex saw, output truncation is the cause.

2. Codex summarized output and lost the specific error

Even with full output, Codex sometimes hits its own context limit and re-summarizes. The summary says “compilation failed in user-service” but drops the file:line and exact message. Codex’s next turn works from the summary, not the real error.

How to spot it: Ask Codex to paste the verbatim failing line. If it can’t, it lost the detail in a summary.

3. Colored output / ANSI escape codes confused the parser

vitest, tsc, eslint all emit ANSI color codes by default. The agent’s tokenizer treats \x1b[31m as garbage and may discard surrounding text. Errors get mangled.

How to spot it: Run with --no-color or FORCE_COLOR=0. If Codex now sees the same errors that mystified it before, colors were the issue.

4. Codex took the first error and ignored cascading ones

tsc returns 47 errors; the first is “cannot find module foo” caused by a missing install. Codex fixes the install but skips the other 46 (which were just downstream of the same root cause). Next iteration finds 46 errors, looks like regress, ping-pongs.

How to spot it: Errors come back as a multiple of the count Codex “fixed.” Root-cause grouping is missing.

5. Sub-process failures didn’t propagate

npm run build shells out to webpack, which shells out to a worker, which fails. The outer command exits 0 because the failure happened in a child process whose stderr was swallowed. Codex sees green.

How to spot it: Outer command exits 0 but artifacts are missing or stale. Look in the script chain for shell pipes without set -o pipefail.

6. Codex didn’t run the command at all

The agent generated code, said “verified with pnpm typecheck,” but in the chat there’s no tool-use entry for that command. It hallucinated the verification.

How to spot it: Search the chat transcript for the actual tool call. If absent, Codex skipped verification entirely.

Shortest path to fix

Ordered by ROI. Step 1 alone fixes 60% of “Codex misread output” cases.

Step 1: Use commands that produce short, structured output

Replace pnpm build with verifiers that print less:

# Bad: 500 lines of noise, real error in the middle
pnpm build

# Good: only errors, sorted, deduplicated
pnpm typecheck --pretty false 2>&1 | grep "error TS" | sort -u
echo "Exit: $?"

For tests:

# JSON reporter — machine-readable, Codex can parse exit code + counts
pnpm vitest run --reporter=json --silent 2>&1 | jq '{passed, failed, errors: [.testResults[] | select(.status=="failed") | .name]}'

For lint:

pnpm eslint . --format=compact --max-warnings 0 2>&1 | tail -30

Each command produces under 100 lines on a clean repo, under 300 lines on a broken one. Both fit in the harness window.

Step 2: Bind verification to exit code, not prose

In the prompt:

Run the verifier and report:
1. The exit code (use `echo "Exit: $?"` immediately after).
2. The number of errors / failing tests.
3. The first 3 errors verbatim (file:line + message).

NEVER say "build passed" without printing exit code 0.
NEVER summarize "looks good" — paste the exact output.

Exit code is the source of truth; prose summaries are lossy.

Step 3: Strip color and disable interactivity

Always add to verifier commands:

FORCE_COLOR=0 CI=true NO_COLOR=1 pnpm test -- --no-color

CI=true also disables interactive prompts (watch mode, progress bars) that confuse non-TTY parsers.

Step 4: Split verification into one-concern commands

Don’t run pnpm build if it does typecheck + lint + bundle + minify. Each step has different failure modes; one mixed output stream hides the cause. Instead:

pnpm typecheck   # exit code 0 / 1
pnpm lint        # exit code 0 / 1
pnpm test        # exit code 0 / 1
pnpm build       # exit code 0 / 1, only if above three pass

Codex reads each cleanly and can attribute failures.

Step 5: For long output, pipe to a file and grep

If a verifier must produce long output, capture it and show Codex only the relevant slice:

pnpm build 2>&1 > /tmp/build.log
echo "Exit: $?"
# Show only error lines and 2 lines of context
grep -B1 -A2 -E "error|Error|ERROR|✗|FAIL" /tmp/build.log | head -50

Codex gets a focused view but the full log is still available for follow-up grep.

Step 6: Force Codex to actually run the verifier

In strict mode:

After every code change, you MUST run these commands in this order:
1. `pnpm typecheck && echo "TYPECHECK OK" || echo "TYPECHECK FAIL"`
2. `pnpm test -- --run && echo "TEST OK" || echo "TEST FAIL"`
3. `pnpm lint --max-warnings 0 && echo "LINT OK" || echo "LINT FAIL"`

Paste the OK/FAIL line for each. Do not skip any.
If any FAIL, fix and re-run. Don't say "done" without three OKs.

The OK/FAIL token is a deterministic marker Codex can’t fudge.

Prevention

  • Standardize on JSON / compact reporters for tests + lint — never raw human-readable mode
  • Always strip ANSI color in commands Codex runs: FORCE_COLOR=0 CI=true
  • Split verification into typecheck / lint / test / build, each producing under 100 lines
  • Require exit-code-based verification in every prompt — prose summaries are not trusted
  • Use set -o pipefail in any shell script Codex executes so sub-process failures propagate
  • Keep CI as the final source of truth — Codex’s local verifier is fast feedback, CI is the gate

Tags: #Codex #Coding agent #Troubleshooting #Debug #Build results