You point Codex Agent at a 500k-line monorepo and ask it to “add a new API endpoint, wire it through service + DB, and add a test.” Two minutes in, it has read 40 random files, the plan list has been auto-summarized twice, and it is now editing the wrong package because it forgot which app’s routing layer it was supposed to touch. The output half works, and you cannot tell where it went off the rails. On small repos Codex feels almost magical; on large ones it gets lost.
The root cause is rarely “the model is dumb on big code”. It is that the working set the agent built (file reads + plan + AGENTS rules + reasoning) outgrew the context window, and the auto-summarizer dropped the structural anchors. The fix is to bound the working set before the agent starts: scope to a sub-tree, pre-feed structural summaries, and force-load the relevant AGENTS.md as a pinned section.
Common causes
Ordered by how often they cause loss-of-context on big repos.
1. No scope — agent scans the whole repo
You said “find where authentication is wired”. The agent runs ripgrep across the entire monorepo, gets 4000 matches, reads 50 random files, and the original task scrolls out of context.
How to spot it: The transcript shows reads under unrelated packages (e.g. marketing-site/ while editing api-server/).
2. Plan list auto-summarized away
Once context fills past ~70%, the runner condenses early turns. The original 12-step plan becomes “the user wants to add an endpoint” and the agent forgets steps 6-12.
How to spot it: Re-prompt “list remaining plan items” — answer has fewer items than you wrote, or items are vaguer than the original.
3. AGENTS.md / conventions fell out of window
AGENTS.md was loaded at turn 1 but by turn 30 it has been summarized into “follow conventions”. The agent now generates with no specific conventions in memory.
How to spot it: Output violates a convention that AGENTS.md spells out clearly. Re-prompt “quote the relevant AGENTS.md rule” — if the agent paraphrases or invents, the rule is no longer in context.
4. Verbose tool output floods context
A pnpm tsc --noEmit dump of 4000 type errors, a 30-file directory listing with full file contents, a 10k-line test log — each chews through window in one turn.
How to spot it: One tool call accounts for > 30% of total tokens. Run wc -l on the suspect tool’s stdout post-hoc.
5. Agent re-reads the same file multiple times
Without an in-context “what I have already read” cache, the agent re-issues reads for files it already consumed. Each re-read costs window with zero new info.
How to spot it: Search transcript for duplicated Read <path> calls. Three or more reads of the same path = significant waste.
6. Cross-package edits without a dependency map
The task touches 3 packages, but the agent does not know the dependency graph. It re-discovers it via repeated reads of package.json, tsconfig.json, *.lock — each discovery eats window.
How to spot it: Transcript contains many Read package.json and Read tsconfig.json calls from different directories.
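The transcript checks for causes 1, 5 and 6 can be automated. The following is a minimal sketch, assuming the transcript is plain text with one tool call per line in the form Read <path>; that format, and the function names, are assumptions for illustration and will need adapting to whatever your runner actually logs.

```python
import re
from collections import Counter

# Assumed transcript format: one tool call per line, e.g.
# "Read packages/api-server/src/app.ts". Adjust the pattern to your runner.
READ_CALL = re.compile(r"^\s*Read\s+(\S+)")


def read_counts(transcript: str) -> Counter:
    """Count how many times each path appears in a Read call."""
    counts = Counter()
    for line in transcript.splitlines():
        match = READ_CALL.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def repeated_reads(transcript: str, threshold: int = 3) -> dict[str, int]:
    """Paths read at least `threshold` times (cause 5, and cause 6 for manifests)."""
    return {path: n for path, n in read_counts(transcript).items() if n >= threshold}


def out_of_scope_reads(transcript: str, allowed_prefixes: list[str]) -> list[str]:
    """Paths read outside the allowed sub-trees (cause 1), in first-seen order."""
    allowed = tuple(allowed_prefixes)
    return [path for path in read_counts(transcript) if not path.startswith(allowed)]
```

Calling out_of_scope_reads(text, ["packages/api-server/", "packages/api-types/"]) on a saved transcript lists every read that wandered into an unrelated package.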
Before you start
- Note the rough repo size: tokei or cloc gives you a baseline (lines of code, file count).
- Confirm which sub-tree the task actually needs, and write it down in one sentence.
- Save the current AGENTS.md content; you may need to slim it for in-context loading.
Information to collect
- Total file count and LoC in the repo (find . -type f -name "*.ts" | wc -l, cloc .).
- The exact sub-tree(s) the task touches.
- Current model’s context window size (gpt-5.5 long: 1M; gpt-5.4: 200k; gpt-5.5 standard: 400k).
- Length of AGENTS.md, CLAUDE.md, root README.md in tokens (word count from wc -w × 1.3 ≈ tokens).
- Any task-relevant glossary of internal terms (project codenames, package short names).
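The word-count heuristic above is easy to script so that you can see what share of the window the always-loaded documents take before the agent reads anything. This is a rough sketch: the 1.3 multiplier comes from the list above and fits English prose better than code, and the function names are illustrative.

```python
def estimate_tokens(text: str, tokens_per_word: float = 1.3) -> int:
    """Rough token estimate: whitespace-separated word count times ~1.3."""
    return round(len(text.split()) * tokens_per_word)


def pinned_share(docs: dict[str, str], context_window: int) -> dict[str, float]:
    """Fraction of the context window each pinned document would occupy."""
    return {
        name: estimate_tokens(body) / context_window
        for name, body in docs.items()
    }
```

Feeding it the contents of AGENTS.md, CLAUDE.md and the root README.md with your model's window size shows whether any of them needs slimming before it is pinned.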
Step-by-step fix
Ordered by ROI.
Step 1: Scope the working set in the prompt
Before any plan:
Working scope:
- ONLY edit files under: packages/api-server/, packages/api-types/
- ONLY read for reference: packages/db-client/ (read-only, no edits)
- DO NOT touch: anything else in the monorepo
If a question requires changes outside scope, STOP and ask first.
This cuts the agent’s search space by 80-90% on most monorepos.
Step 2: Pre-feed a structural summary
Before the agent scans, give it a tree summary you generated:
tree -L 3 packages/api-server -I 'node_modules|dist|.next' > /tmp/tree.txt
Then in the prompt:
Repo structure (read this, do not re-list):
[paste tree output]
Key files:
- packages/api-server/src/routes/index.ts — route registry
- packages/api-server/src/services/ — business logic
- packages/api-types/src/index.ts — shared types
The agent now skips structural discovery and goes straight to work.
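If tree is not installed on the machine, a depth-limited listing can be generated with the standard library. This is a sketch of an equivalent summary, not a reimplementation of tree; the ignore set and output layout are assumptions chosen to mirror the command above.

```python
import os

# Directories skipped by default; mirrors the -I pattern used with tree.
DEFAULT_IGNORE = {"node_modules", "dist", ".next", "coverage", ".git"}


def tree_summary(root: str, max_depth: int = 3, ignore: set[str] = DEFAULT_IGNORE) -> str:
    """Return an indented listing of `root`, at most `max_depth` levels deep."""
    lines = [root.rstrip("/") + "/"]

    def walk(directory: str, depth: int) -> None:
        if depth > max_depth:
            return
        for name in sorted(os.listdir(directory)):
            if name in ignore:
                continue
            path = os.path.join(directory, name)
            is_dir = os.path.isdir(path)
            # Two spaces of indentation per level; directories get a trailing slash.
            lines.append("  " * depth + name + ("/" if is_dir else ""))
            if is_dir:
                walk(path, depth + 1)

    walk(root, 1)
    return "\n".join(lines)
```

Writing tree_summary("packages/api-server") to a file gives you the same kind of pasteable map as the tree command.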
Step 3: Pin AGENTS.md as a header in the task
Do not rely on auto-summarization preserving it. Embed the relevant slice:
[AGENTS.md excerpt — applies to packages/api-server]
Conventions:
- Routes registered via registerRoute() in routes/index.ts
- Services exported via barrel file
- All handlers return { ok: boolean, data?: T, error?: AppError }
[end excerpt]
Task: ...
This survives summarization because it lives in the user message, not in a tool output.
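Cutting the relevant slice out of AGENTS.md by hand gets tedious, so it can be scripted. The sketch below assumes AGENTS.md uses Markdown-style # headings and that the slice you want sits under a single heading; both helper names are hypothetical, and lines starting with # inside fenced code blocks would be misread as headings.

```python
def extract_section(markdown: str, heading: str) -> str:
    """Return the body under the first heading whose text equals `heading`,
    up to the next heading of the same or a higher level."""
    lines = markdown.splitlines()
    start = level = None
    for i, line in enumerate(lines):
        stripped = line.lstrip("#")
        hashes = len(line) - len(stripped)
        # Not a heading unless it starts with '#'s followed by a space.
        if hashes == 0 or not stripped.startswith(" "):
            continue
        if start is None:
            if stripped.strip() == heading:
                start, level = i + 1, hashes
        elif hashes <= level:
            return "\n".join(lines[start:i]).strip()
    return "\n".join(lines[start:]).strip() if start is not None else ""


def build_task_message(excerpt: str, scope: str, task: str) -> str:
    """Wrap the excerpt in begin/end markers and put the task after it."""
    return (
        f"[AGENTS.md excerpt: applies to {scope}]\n"
        f"{excerpt}\n"
        "[end excerpt]\n"
        f"Task: {task}"
    )
```

Because the excerpt is rebuilt from the file on every run, the pinned rules stay in sync with AGENTS.md instead of drifting from an old copy-paste.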
Step 4: Use directory-level summaries instead of file reads
For exploration phase, prefer summaries over content:
Run: ls packages/api-server/src/services/
Run: head -1 packages/api-server/src/services/*.ts (first line / docstring of each)
DO NOT read full content until you have identified the target service.
Knowing 30 service names + their one-line docs costs ~500 tokens. Reading all 30 costs ~50k.
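The same ls plus head -1 summary can be produced as one compact block, which is convenient if you want to paste it into the prompt yourself. This is a small sketch; the function name and the 120-character cap are arbitrary choices.

```python
from pathlib import Path


def first_lines(directory: str, pattern: str = "*.ts", max_chars: int = 120) -> list[str]:
    """One 'name: first line' entry per matching file, sorted by name."""
    entries = []
    for path in sorted(Path(directory).glob(pattern)):
        if not path.is_file():
            continue
        with path.open(encoding="utf-8", errors="replace") as handle:
            first = handle.readline().strip()
        # Truncate long first lines so one file cannot dominate the summary.
        entries.append(f"{path.name}: {first[:max_chars]}")
    return entries
```

This only pays off if each service file opens with a one-line comment or docstring; files that start with imports yield little signal.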
Step 5: Cap verbose tool output
Wrap noisy commands:
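# Write the full log first; piping tee into head can truncate the log when head exits early.
pnpm tsc --noEmit > /tmp/tsc.log 2>&1
head -n 100 /tmp/tsc.log
echo "(full output in /tmp/tsc.log)"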
Or have the agent grep only what it needs:
grep -E "packages/api-server/.*error TS" /tmp/tsc.log | head -n 50
100 lines instead of 4000.
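If you prefer a reusable wrapper over ad-hoc pipes, the same capping idea can be sketched in a few lines. The function name and the format of the trailing pointer line are assumptions; the behaviour is simply "full output to a log file, first N lines to the caller".

```python
import subprocess


def run_capped(command: list[str], log_path: str, max_lines: int = 100) -> str:
    """Run `command`, write the full combined output to `log_path`,
    and return only the first `max_lines` lines plus a pointer to the log."""
    result = subprocess.run(
        command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    with open(log_path, "w", encoding="utf-8") as log:
        log.write(result.stdout)
    lines = result.stdout.splitlines()
    shown = lines[:max_lines]
    if len(lines) > max_lines:
        # Tell the reader (or the agent) where the rest lives.
        shown.append(f"({len(lines) - max_lines} more lines in {log_path})")
    return "\n".join(shown)
```

For example, run_capped(["pnpm", "tsc", "--noEmit"], "/tmp/tsc.log") returns at most 101 lines no matter how many type errors the compiler reports.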
Step 6: Break into checkpointed sub-tasks with fresh contexts
For long jobs, run a sequence of agent invocations:
Invocation 1: Add the new types in packages/api-types. Commit.
Invocation 2: Add the route + handler in packages/api-server. Commit.
Invocation 3: Add the integration test. Commit.
Each invocation gets a fresh context. The committed work between steps is the durable state.
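The invocation sequence can be driven by a small loop. In this sketch both callables are hypothetical placeholders that you supply: run_agent would start one fresh agent session for a step and report success, and commit would record the checkpoint (for example by wrapping git commit). Nothing here is part of Codex itself.

```python
from typing import Callable


def run_checkpointed(
    steps: list[str],
    run_agent: Callable[[str], bool],
    commit: Callable[[str], None],
) -> list[str]:
    """Run each step in its own agent invocation and commit after each success.

    Stops at the first failing step and returns the steps that were committed.
    """
    done = []
    for step in steps:
        if not run_agent(step):  # hypothetical: launches a fresh agent session
            break
        commit(step)  # hypothetical: e.g. wraps `git commit -am <step>`
        done.append(step)
    return done
```

Stopping at the first failure keeps a broken step from contaminating the next fresh context; the last commit is always a known-good state to resume from.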
Step 7: Use a long-context model variant when warranted
If even after scoping you still need 300k+ tokens of context, switch:
codex agent run --model gpt-5.5-long task.md
But understand: a long context is a band-aid for causes #1 and #5; it does not fix cause #2 (plan summarization) or cause #4 (verbose output drowning attention).
Verify
- Re-run the same task and check the transcript: scoped reads only, no cross-package wandering.
- Re-prompt mid-task with “what step are you on” and “quote AGENTS.md rule for this” — both should come back precise, not paraphrased.
- If the model runs locally, watch top / nvidia-smi: utilization is only a rough proxy for token throughput, but it should look steady, not stalling on giant tool outputs.
Long-term prevention
- Every agent task on a monorepo starts with an explicit Working scope: block.
- Maintain a repo-map.md at the root with the tree summary + key file pointers; ask the agent to read it first.
- Per-package AGENTS.md: the closest one wins, and each is much smaller per task.
- For exploration, the agent uses ls + head -1 summaries before any full file reads.
- Cap shell tool output to 100 lines via wrappers; redirect overflow to /tmp/*.log.
- Split multi-package work into commit-checkpointed sub-tasks; never one mega-prompt for a 3-package change.
Common pitfalls
- Pasting AGENTS.md once and assuming it stays for 200 turns: it does not, summarization will eat it.
- Trusting --max-tokens 1M to fix everything: attention quality drops well before the hard cap.
- Using a long-context model without scoping: you pay 10x more and still get lost output.
- Letting tsc --noEmit run with no head/tail cap: one bad command can blow 40% of your window.
- Re-prompting “what files have you read”: the answer is incomplete because the read list itself was summarized.
FAQ
Q: My repo is 200k LoC, do I really need to scope?
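Yes. Reading even 200k LoC in full would cost millions of tokens (the ~5M figure assumes fairly long lines; at ~10 tokens per line it is still ~2M), far more than any window listed above. The agent will always be working with a small fraction of the codebase, so scope early to make that fraction intentional rather than random.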
Q: How do I build the structural summary quickly?
tree -L 3 -I 'node_modules|dist|.next|coverage' > repo-map.txt
One command and you have a 1-2k token map that beats 50 random file reads.
Q: The agent ignored my “DO NOT touch other packages” rule. What now?
Move that rule into AGENTS.md (auto-loaded), pin it at the top of the task message, and add a verifier: “Before finishing, run git diff --name-only and confirm all files are under scope”. The verifier turns silent violations into visible errors.
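You can also run the verifier yourself instead of trusting the agent's self-report. This is a minimal sketch with illustrative function names: it compares the output of git diff --name-only against the allowed prefixes from your Working scope block. Note that git diff --name-only lists modified tracked files only, so newly created untracked files need a separate check such as git status.

```python
import subprocess


def changed_files(repo: str = ".") -> list[str]:
    """Paths of tracked files with uncommitted changes, via `git diff --name-only`."""
    output = subprocess.run(
        ["git", "diff", "--name-only"],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    return output.splitlines()


def out_of_scope(paths: list[str], allowed_prefixes: list[str]) -> list[str]:
    """Return every path that does not start with one of the allowed prefixes."""
    allowed = tuple(allowed_prefixes)
    return [path for path in paths if path and not path.startswith(allowed)]
```

A non-empty result from out_of_scope(changed_files(), ["packages/api-server/", "packages/api-types/"]) is a good reason to reject the run before reviewing anything else.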
Q: Does turning on Codex’s caching help?
Only for cost and latency. Prompt caching makes a repeated prompt prefix cheaper and faster to process, but the cached content still occupies the window, so caching does not prevent context overflow.