Codex Agent Spawns Too Many Redundant Tool Calls

Codex re-reads the same file 8 times and re-greps the same query 5 times. Fix by pre-feeding context, requiring a plan first, and capping reads via tool restrictions.

You ask Codex Agent to add one function. The transcript shows 47 tool calls before it writes a single line. Read package.json four times. Glob "**/*.ts" three times with the same pattern. Read src/types.ts eight times — twice in a row. By the time it gets to writing code, half the turn budget is spent and the relevant context has been pushed out of the window. The output, when it finally arrives, is shaped wrong because the agent never settled into the real problem.

Redundant tool calls are not a model defect. They are a sign that the agent is searching, not building. It has no map, no plan, no recently-read cache, so each tool call is a guess. The cure is to give the agent the map up front, require a written plan before any edit, and structure the prompt so the agent does not re-discover what you already know.

Common causes

Ordered by hit rate.

1. No structural map provided — agent re-discovers structure

You said “add a new endpoint”. Agent does not know where endpoints live. It reads package.json, tsconfig.json, scans src/, reads index.ts, then app.ts, then server.ts. Five reads to discover what one sentence could have told it.

How to spot it: First 5+ tool calls are exploration, not action. Reads have no edit afterward.

2. No plan written before action

Agent jumps into editing, hits an unknown, reads more, edits more, hits another unknown. Each unknown triggers a new read. Without a written plan, every decision becomes lookup-on-demand.

How to spot it: Tool calls interleave Read → Write → Read → Write rather than concentrated Read phase → concentrated Write phase.
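
One rough way to quantify this interleaving is to count how often a transcript flips between reading and writing. The sketch below assumes the tool calls have already been extracted as an ordered list of tool names; the names Read, Grep, Glob, Edit and Write are assumptions and may differ in your runner.

```python
# Assumed tool names; adjust to whatever your runner actually emits.
READ_TOOLS = {"Read", "Grep", "Glob"}
WRITE_TOOLS = {"Edit", "Write"}


def count_phase_switches(tool_names):
    """Count transitions between read-type and write-type calls.

    Tools in neither set (for example Bash) are ignored.
    """
    switches = 0
    previous = None
    for name in tool_names:
        if name in READ_TOOLS:
            phase = "read"
        elif name in WRITE_TOOLS:
            phase = "write"
        else:
            continue
        if previous is not None and phase != previous:
            switches += 1
        previous = phase
    return switches


interleaved = ["Read", "Edit", "Read", "Edit", "Read", "Write"]
phased = ["Read", "Grep", "Read", "Edit", "Edit", "Write"]
print(count_phase_switches(interleaved))  # 5
print(count_phase_switches(phased))       # 1
```

A run with one concentrated Read phase followed by one Write phase scores 1; a high score relative to the number of edits suggests lookup-on-demand.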

3. Search results not summarized

Agent runs grep -rn "useAuth" src/ and gets 80 matches. Instead of summarizing or filtering, it reads each match’s file in full. It then repeats the same query 5 minutes later because it kept no summary of the results.

How to spot it: After a grep with N matches, you see N+ file reads. Agent did not narrow before reading.

4. No in-context “what I have read” cache

Agent forgets it already read a file. When you re-prompt 10 turns later about the same code path, it re-reads the file. Nothing in the context says “we already covered file X”.

How to spot it: Same Read <path> shows up 3+ times across the transcript.
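
This check is easy to automate once the transcript is parsed. A minimal sketch, assuming each tool call has been reduced to a (tool_name, path) tuple; that shape is an assumption, not a format any particular runner guarantees.

```python
from collections import Counter


def repeated_reads(calls, threshold=3):
    """Return {path: count} for paths read at least `threshold` times.

    `calls` is a list of (tool_name, path) tuples (assumed shape).
    """
    counts = Counter(path for tool, path in calls if tool == "Read")
    return {path: n for path, n in counts.items() if n >= threshold}


calls = [
    ("Read", "src/types.ts"),
    ("Read", "package.json"),
    ("Read", "src/types.ts"),
    ("Edit", "src/types.ts"),
    ("Read", "src/types.ts"),
]
print(repeated_reads(calls))  # {'src/types.ts': 3}
```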

5. Tool calls used as a thinking aid

For some prompts the agent prefers running ls, cat, wc as a substitute for reasoning. Like “checking” rather than committing. This balloons the turn count without progress.

How to spot it: Many cheap, single-file head, ls, wc calls with no edits between them.

6. Verbose stdout drives re-reads

A tool call returns 5000 lines. The agent cannot hold that much in working context, so it keeps only a partial summary and then re-reads parts of the same file to “verify”.

How to spot it: A noisy tool’s output is immediately followed by re-reading the same path it just covered.

Before you start

  • Get a baseline count of tool calls for a typical task — grep -c "tool_use" agent.log or similar.
  • Identify the agent template’s exploration phase length; long exploration = symptom.
  • Decide whether the task is “I do not know the repo” or “I know but I am letting Codex re-derive” — they have different fixes.

Information to collect

  • Full transcript of the offending run with tool call counts per type.
  • The first 10 tool calls — these usually expose whether the agent has a map or is searching.
  • The task prompt’s exact wording — vague prompts cause more searching.
  • Any AGENTS.md / repo-map.md / convention doc the agent could have used but did not load.
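
The first two items can be pulled from a transcript in one pass. The sketch below assumes a JSON Lines log in which each tool call is an object with "type": "tool_use" and a "name" field; that schema is hypothetical, so map the field names to whatever your runner writes.

```python
import json
from collections import Counter


def summarize_transcript(jsonl_text, first_n=10):
    """Return (counts per tool name, names of the first `first_n` tool calls).

    Assumes one JSON object per line with "type" and "name" keys.
    """
    names = []
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip plain-text log lines
        if isinstance(event, dict) and event.get("type") == "tool_use":
            names.append(event.get("name", "unknown"))
    return Counter(names), names[:first_n]


sample = "\n".join([
    '{"type": "tool_use", "name": "Read"}',
    '{"type": "text", "text": "thinking"}',
    '{"type": "tool_use", "name": "Glob"}',
    '{"type": "tool_use", "name": "Read"}',
])
counts, first = summarize_transcript(sample)
print(counts.most_common())  # [('Read', 2), ('Glob', 1)]
print(first)                 # ['Read', 'Glob', 'Read']
```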

Step-by-step fix

Ordered by ROI.

Step 1: Pre-feed the map and key file pointers

Top of every task prompt:

Repo structure:
- API routes: src/api/routes/*.ts (one file per resource)
- Services: src/services/*.ts (business logic)
- Types: src/types/index.ts (shared) + src/types/<feature>.ts (feature-specific)
- Tests: alongside source as *.test.ts

For this task you will edit:
- src/api/routes/orders.ts (new)
- src/services/orderService.ts (extend)
- src/types/order.ts (extend)
- src/api/routes/orders.test.ts (new)

The agent now has the structural map without one read. Drops 5-15 redundant exploration calls.
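
If you issue many tasks against the same repo, the block above can be generated rather than retyped. A small sketch; render_task_header is a hypothetical helper, not part of any Codex tooling, and the paths simply reuse the example above.

```python
def render_task_header(structure, files_to_edit):
    """Build the 'repo map + files to edit' block for the top of a prompt.

    `structure` maps a label to a location; `files_to_edit` is a list of
    (path, action) pairs. Both shapes are illustrative assumptions.
    """
    lines = ["Repo structure:"]
    lines += [f"- {label}: {location}" for label, location in structure.items()]
    lines += ["", "For this task you will edit:"]
    lines += [f"- {path} ({action})" for path, action in files_to_edit]
    return "\n".join(lines)


structure = {
    "API routes": "src/api/routes/*.ts (one file per resource)",
    "Services": "src/services/*.ts (business logic)",
}
files_to_edit = [
    ("src/api/routes/orders.ts", "new"),
    ("src/services/orderService.ts", "extend"),
]
print(render_task_header(structure, files_to_edit))
```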

Step 2: Require a written plan before any edit

Plan first. Output:

PLAN:
1. <step>
2. <step>
...

For each step, list the files involved. Confirm the plan matches the working scope. Only then begin editing.

If you find the plan needs revision mid-execution, output a REVISED PLAN before continuing.

Forces the agent to do exploration in a structured phase, then commit to action. Cuts re-reads dramatically.
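
A runner or verifier can enforce the plan-first rule mechanically before it lets any edit through. The following is a minimal sketch that only checks for the PLAN: header and at least one numbered step after it; has_plan is a hypothetical name and the check says nothing about plan quality.

```python
import re

# Matches a line that is exactly "PLAN:" or "REVISED PLAN:".
PLAN_HEADER = re.compile(r"^(REVISED )?PLAN:\s*$", re.MULTILINE)
# Matches a numbered step such as "1. Add the route".
NUMBERED_STEP = re.compile(r"^\d+\.\s+\S", re.MULTILINE)


def has_plan(output):
    """True if the text has a PLAN: header followed by a numbered step."""
    match = PLAN_HEADER.search(output)
    if match is None:
        return False
    # Only accept steps that appear after the header.
    return NUMBERED_STEP.search(output, match.end()) is not None


print(has_plan("PLAN:\n1. Add route in src/api/routes/orders.ts"))  # True
print(has_plan("I will start editing now."))                        # False
```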

Step 3: Summarize search results before reading

After any grep / glob / search with > 5 results, output a SUMMARY:

SUMMARY of "<query>":
- N matches across M files
- Relevant files: <list 1-5>
- Skipping: <list>

Then read only the relevant files. Do NOT read all matches.

A grep summary plus 3 reads beats 80 reads from raw matches.
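
If your runner lets you post-process tool output, the summary can also be produced before the agent ever sees the raw matches. A sketch under two assumptions: the input is grep -rn style output (path:line:text), and match count per file is an acceptable stand-in for relevance, which it often is not.

```python
from collections import Counter


def summarize_grep(query, grep_output, top=5):
    """Condense `grep -rn` style output (path:line:text) into a SUMMARY block.

    Files are ranked by match count, a crude proxy for relevance.
    """
    per_file = Counter()
    for line in grep_output.splitlines():
        parts = line.split(":", 2)
        if len(parts) == 3 and parts[1].isdigit():
            per_file[parts[0]] += 1
    total = sum(per_file.values())
    ranked = [path for path, _ in per_file.most_common()]
    return "\n".join([
        f'SUMMARY of "{query}":',
        f"- {total} matches across {len(per_file)} files",
        f"- Relevant files: {', '.join(ranked[:top])}",
        f"- Skipping: {', '.join(ranked[top:]) or 'none'}",
    ])


raw = "\n".join([
    "src/hooks/useAuth.ts:3:export function useAuth() {",
    "src/hooks/useAuth.ts:9:  // useAuth reads the session",
    "src/pages/Login.tsx:1:import { useAuth } from '../hooks/useAuth'",
])
print(summarize_grep("useAuth", raw, top=1))
```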

Step 4: Maintain an explicit read-tracker in the prompt

After each Read, append to your scratchpad:

READ_TRACKER:
- src/types/order.ts (read at step 2)
- src/services/orderService.ts (read at step 3)

Before any Read, check the tracker. If the file is listed, reuse what is already in context; do not re-read unless the contents have changed since.

This externalizes the “what I have already read” cache. The agent now has to acknowledge re-reads explicitly.
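
The same tracker can live in the runner instead of the prompt, if you control the tool layer. The sketch below is one possible shape, not an existing API: ReadTracker is a hypothetical class that records a content hash per path so that a re-read is only allowed when the file actually changed.

```python
import hashlib


def _digest(contents):
    return hashlib.sha256(contents.encode("utf-8")).hexdigest()


class ReadTracker:
    """Remember which files were read and whether they changed since."""

    def __init__(self):
        self._seen = {}  # path -> (step, sha256 of contents at read time)

    def record(self, path, contents, step):
        self._seen[path] = (step, _digest(contents))

    def needs_read(self, path, current_contents):
        """True if the path was never read or its contents have changed."""
        if path not in self._seen:
            return True
        return _digest(current_contents) != self._seen[path][1]

    def render(self):
        """Format the tracker the way the prompt scratchpad expects it."""
        lines = ["READ_TRACKER:"]
        lines += [f"- {p} (read at step {s})" for p, (s, _) in self._seen.items()]
        return "\n".join(lines)


tracker = ReadTracker()
tracker.record("src/types/order.ts", "export type Order = {}", step=2)
print(tracker.needs_read("src/types/order.ts", "export type Order = {}"))  # False
print(tracker.render())
```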

Step 5: Restrict tools per task phase

If your runner supports per-phase tool restrictions:

Phase 1 (exploration): allow Read, Grep, Glob. Disallow Write/Edit.
Phase 2 (planning): allow no tools. Output text only.
Phase 3 (execution): allow Edit, Write, Bash. Disallow Grep (already done).

Removing tools mid-task prevents the “let me just check one more thing” loop.
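
Where the runner exposes a hook before each tool call, the phase table above reduces to an allowlist lookup. A minimal sketch; the phase names and tool names mirror the table, and is_allowed is a hypothetical hook, not a documented runner feature.

```python
# Allowlist per phase, copied from the table above (tool names assumed).
PHASE_TOOLS = {
    "exploration": {"Read", "Grep", "Glob"},
    "planning": set(),  # text output only
    "execution": {"Edit", "Write", "Bash"},
}


def is_allowed(phase, tool):
    """True if `tool` may be called during `phase`; unknown phases allow nothing."""
    return tool in PHASE_TOOLS.get(phase, set())


print(is_allowed("exploration", "Grep"))  # True
print(is_allowed("execution", "Grep"))    # False
print(is_allowed("planning", "Read"))     # False
```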

Step 6: Cap noisy tool output

Wrap commands that flood:

pnpm test 2>&1 | tee /tmp/test.log | grep -E "FAIL|PASS|Tests:" | head -50

Less noise → less re-read-to-verify. Many redundant reads chase missed signal in a wall of stdout.
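
If the tool output passes through your own code rather than a shell pipeline, the same cap can be applied there. A sketch that mirrors the grep and head stages above; cap_output is a hypothetical helper, and the full output should still be saved elsewhere as the tee stage does.

```python
import re


def cap_output(text, pattern=r"FAIL|PASS|Tests:", limit=50):
    """Keep only lines matching `pattern`, truncated to `limit` lines."""
    regex = re.compile(pattern)
    kept = [line for line in text.splitlines() if regex.search(line)]
    return "\n".join(kept[:limit])


noisy = "PASS src/a.test.ts\ncompiling...\nFAIL src/b.test.ts\nTests: 1 failed, 1 passed\n"
print(cap_output(noisy))
```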

Step 7: Use a model with better tool-use efficiency

If the agent still wastes calls after the structural fixes, try a model variant tuned for tool use (e.g. gpt-5.5 over gpt-5.4 for agentic flows). The difference is often 30-50% fewer redundant calls on equivalent tasks, but measure it on your own workload before switching.

Verify

  • Count tool calls in the next run — should drop 40-70% with steps 1-3 alone.
  • Check the transcript: clear Read phase → Plan output → Write phase, not interleaved chaos.
  • No repeated Read <same path> across the run.
  • Total turns to completion drops; finishes within turn budget with headroom.

Long-term prevention

  • Every agent task template starts with a “repo map + relevant files” block.
  • Plan-first is mandatory; tasks without a PLAN section get rejected by a verifier.
  • AGENTS.md lists the canonical conventions for “where does X live”, so the agent reads it instead of re-discovering the layout each session.
  • Wrap noisy tools with output caps as the default; the agent never sees 5k lines of stdout.
  • Per-phase tool restrictions on long jobs.
  • Keep an agent-runs.log and review weekly: any run with > 60 tool calls gets root-caused.
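
The weekly review in the last item is simple to script. The sketch below assumes agent-runs.log holds one run per line as "<run_id> <tool_call_count>"; that format is invented for illustration, so adapt the parsing to your own log.

```python
def runs_to_review(log_text, limit=60):
    """Return (run_id, count) for every run with more than `limit` tool calls.

    Assumes one "<run_id> <tool_call_count>" pair per line; other lines are skipped.
    """
    flagged = []
    for line in log_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit() and int(parts[1]) > limit:
            flagged.append((parts[0], int(parts[1])))
    return flagged


log = "run-001 47\nrun-002 83\nrun-003 60\n"
print(runs_to_review(log))  # [('run-002', 83)]
```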

Common pitfalls

  • Treating “more tool calls = more thorough” as good. Each call costs context and turns; redundancy is pure cost.
  • Writing “do not re-read files” in the prompt without giving the agent a tracker mechanism. The instruction has nowhere to ground itself.
  • Letting the agent run find . -type f on a large repo. The output alone wrecks the budget.
  • Setting turn budget to 200 to “absorb” redundancy. The redundancy still drops output quality even if it fits.
  • Forgetting that prompt caching makes the redundancy cheaper in dollars but no cheaper in context window.

FAQ

Q: My agent reads the same file twice in adjacent turns. Why?

Likely your runner does not surface a previous-tool-call cache, and the agent’s plan does not reference the read. Add the explicit READ_TRACKER scratchpad; in most cases the adjacent re-reads stop.

Q: How do I count tool calls programmatically?

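Most agent runners emit a JSON event stream. Count records of type tool_use. For plain-text logs, grep -c "function_calls" transcript.txt gives a rough count. Note that grep -c counts matching lines, not occurrences; if several calls can share a line, use grep -o "function_calls" transcript.txt | wc -l instead.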

Q: Is there a fixed ratio of “reads to edits” that is healthy?

Rough rule: 2-4 reads per edit on familiar code, 5-8 on unfamiliar. If you are seeing 15+ reads per edit, the prompt is missing a structural map.
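
The ratio is straightforward to compute from an ordered list of tool names. A sketch, assuming Read counts as a read and Edit or Write counts as an edit; those tool names are assumptions about your runner.

```python
def reads_per_edit(tool_names):
    """Return reads divided by edits, or None when the run made no edits."""
    reads = sum(1 for name in tool_names if name == "Read")
    edits = sum(1 for name in tool_names if name in {"Edit", "Write"})
    if edits == 0:
        return None
    return reads / edits


run = ["Read"] * 6 + ["Edit", "Write"]
print(reads_per_edit(run))  # 3.0
```

A result of None (reads but no edits at all) is itself worth flagging, since it means the run never left the exploration phase.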

Q: Can I just lower max-turns to force efficiency?

Lowering max-turns punishes legitimate work too. Better to fix the cause (no map, no plan) — efficiency rises and headroom grows simultaneously.

Tags: #Codex #agent #Troubleshooting #tool-calls #efficiency