Codex Agent Goes Out-of-Context on Long Repos

On 500k+ line repos Codex loses the thread halfway. Fix it by scoping the working set, pre-feeding directory summaries, and putting conventions in AGENTS.md so they survive compaction.

Published: May 23, 2026 | Updated: Jun 15, 2026 | Author: AI Productivity Guide Team

You point Codex at a 500k-line monorepo and ask it to “add a new API endpoint, wire it through service plus DB, and add a test.” Two minutes in, it has read 40 random files, the session has compacted twice, and now it is editing the wrong package — it forgot which app’s routing layer it was supposed to touch. The output kind of works but kind of does not, and you cannot tell where it went off the rails. On small repos Codex feels almost effortless; on long ones it gets lost.

Fastest fix: bound the working set before the agent starts. Tell it exactly which sub-tree to edit, paste a tree summary of that sub-tree so it skips discovery, and put the conventions it must follow in AGENTS.md (which Codex re-reads every turn and which survives compaction) rather than in a one-time message that compaction will summarize away. Those three moves fix the large majority of “lost on a big repo” sessions before you ever reach for a bigger model.

One correction up front, because it changes the whole strategy: as of June 2026, Codex CLI runs gpt-5.5 (ChatGPT sign-in default), and OpenAI deliberately caps the Codex surface at a 400k-token context window even though the GPT-5.5 API itself reaches 1M. You are working inside 400k, not a million. So scoping is not optional on a large repo — you will overflow 400k long before you have “the whole repo in context.”

How Codex manages context (so the fixes make sense)

Two mechanics matter here:

Compaction. When the session approaches the window cap, Codex summarizes older turns to free space (you can also trigger it manually with /compact, or wipe the conversation with /clear). Compaction preserves what it judges important and drops the rest — which is exactly how a 12-step plan or a structural anchor quietly disappears.
AGENTS.md is re-read every turn. Unlike a fact you pasted once into a chat message, the AGENTS.md chain is re-injected on each turn and survives compaction intact. That is the single most useful lever you have: anything you cannot afford to lose belongs in AGENTS.md, not in a one-off prompt.

The lookup chain (built once per session) walks from ~/.codex/AGENTS.md down through every AGENTS.md on the path from your git root to your current directory. Files are concatenated root-first, leaf-last, so a closer AGENTS.md overrides the root on conflicts. Only files on your path load — apps/api/AGENTS.md does not load while you work in apps/web/. The combined chain is capped at 32 KiB by default (project_doc_max_bytes), so keep each file lean or the leaf rules can get starved by an oversized root.
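To see how close a working directory sits to that cap, a minimal sketch like the one below can total the AGENTS.md files on the path. It assumes the cap applies to the combined byte size and, for simplicity, leaves out ~/.codex/AGENTS.md. The name agents_chain_bytes is a hypothetical helper, not a Codex command.

```bash
#!/usr/bin/env bash
# Sum the size in bytes of every AGENTS.md between a root directory and a
# leaf directory (both inclusive). Hypothetical helper for illustration only.
agents_chain_bytes() {
  local root leaf dir total=0 size
  root=$(cd "$1" && pwd -P) || return 1
  leaf=$(cd "$2" && pwd -P) || return 1
  dir=$leaf
  while :; do
    if [ -f "$dir/AGENTS.md" ]; then
      size=$(wc -c < "$dir/AGENTS.md")
      total=$((total + size))
    fi
    [ "$dir" = "$root" ] && break
    [ "$dir" = "/" ] && break   # leaf was not under root; stop at the filesystem root
    dir=$(dirname "$dir")
  done
  echo "$total"
}
```

From inside a package you could run `agents_chain_bytes "$(git rev-parse --show-toplevel)" "$PWD"` and compare the result with 32768, the default 32 KiB cap.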

Common causes

Ordered by how often they cause loss-of-context on big repos.

1. No scope — agent scans the whole repo

You said “find where authentication is wired.” The agent runs ripgrep across the entire monorepo, gets 4000 matches, reads 50 random files, and the original task scrolls toward the compaction line.

How to spot it: the transcript shows reads under unrelated packages (e.g. marketing-site/ while editing api-server/).
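If you have saved the transcript to a text file, a rough tally of reads per package makes the wandering obvious. The sketch below assumes each read shows up as a `Read <path>` entry, which may not match your transcript format, so adjust the pattern as needed. reads_by_prefix is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Count "Read <path>" entries in a saved transcript, grouped by the first
# <depth> directory components of each path. The "Read <path>" format is an
# assumption about how the transcript was exported.
reads_by_prefix() {
  local transcript=$1 depth=${2:-2}
  grep -oE 'Read [^[:space:]]+' "$transcript" \
    | awk -v depth="$depth" '{
        n_parts = split($2, part, "/")
        key = part[1]
        # Append directory components up to depth, never the file name itself.
        for (i = 2; i <= depth && i < n_parts; i++) key = key "/" part[i]
        count[key]++
      }
      END { for (k in count) print count[k], k }' \
    | sort -rn
}
```

Any prefix in the output that your task never mentioned is a sign the agent is scanning outside the intended scope.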

2. Plan list compacted away

Once the session nears the window cap, Codex condenses early turns. The original 12-step plan becomes “the user wants to add an endpoint” and the agent forgets steps 6-12.

How to spot it: re-prompt “list remaining plan items” — the answer has fewer items than you wrote, or items are vaguer than the original.

3. Conventions live in a one-off message instead of AGENTS.md

You pasted the project conventions into your opening message. By turn 30 the session has compacted and “follow conventions” is all that survives. The agent now generates with no specific rules in memory. (Conventions placed in AGENTS.md would have been re-read every turn and survived this.)

How to spot it: output violates a convention you clearly stated. Re-prompt “quote the relevant rule for this” — if the agent paraphrases or invents, the rule was in a message that got compacted, not in AGENTS.md.

4. Verbose tool output floods context

A pnpm tsc --noEmit dump of 4000 type errors, a 30-file directory listing with full file contents, a 10k-line test log — each chews through the window in one turn.

How to spot it: one tool call accounts for > 30% of total tokens. After the fact, run wc -l on the suspect command’s stdout to see its size, or wc -w times 1.3 for a rough token estimate.

5. Agent re-reads the same file multiple times

Without an in-context “what I have already read” cache, the agent re-issues reads for files it already consumed. Each re-read costs window with zero new info, and re-reads spike right after a compaction (the read history was summarized away).

How to spot it: search the transcript for duplicated Read <path> calls. Three or more reads of the same path equals significant waste.
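The duplicate search can be scripted against a saved transcript. This is a sketch under the assumption that reads are logged as `Read <path>` entries; repeated_reads is a hypothetical helper name, and the default threshold of three mirrors the rule of thumb above.

```bash
#!/usr/bin/env bash
# List paths that were read at least <min> times (default 3) in a saved
# transcript, most-read first. Assumes reads appear as "Read <path>".
repeated_reads() {
  local transcript=$1 min=${2:-3}
  grep -oE 'Read [^[:space:]]+' "$transcript" \
    | awk -v min="$min" '{ n[$2]++ }
        END { for (p in n) if (n[p] >= min) print n[p], p }' \
    | sort -rn
}
```

The same output also exposes cause #6: several package.json or tsconfig.json paths near the top of the list point to dependency re-discovery.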

6. Cross-package edits without a dependency map

The task touches 3 packages, but the agent does not know the dependency graph. It re-discovers it via repeated reads of package.json, tsconfig.json, and lockfiles — each discovery eats window.

How to spot it: the transcript contains many Read package.json and Read tsconfig.json calls from different directories.

Diagnosis: which bucket are you in?

| Symptom in the transcript | Most likely cause | Jump to |
| --- | --- | --- |
| Reads under packages the task never mentioned | No scope (#1) | Step 1 |
| Agent skips/forgets your later plan steps | Plan compacted (#2) | Step 6 |
| Output breaks a rule you clearly stated | Conventions not in AGENTS.md (#3) | Step 3 |
| One command dominates the token count | Verbose tool output (#4) | Step 5 |
| Same file read 3+ times | No read cache (#5) | Step 4 |
| Repeated reads of `package.json` / `tsconfig.json` | No dependency map (#6) | Step 2 |

Before you start

Note the rough repo size: tokei or cloc gives you a baseline (lines of code, file count). At a rough 10 tokens per line (the ratio varies by language and formatting), a 100k-line repo tokenizes to about 1M tokens, well past the 400k Codex window, which is why “read the whole thing” is never the plan.
Confirm which sub-tree the task actually needs — write it down in one sentence.
Check your AGENTS.md chain length; the combined cap is 32 KiB, so if the root is huge, your package-level rules may be getting truncated.

Information to collect

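Total file count and LoC in the repo (find . -type f -name "*.ts" -not -path "*/node_modules/*" | wc -l counts TypeScript source files; cloc --exclude-dir=node_modules,dist . reports lines of code).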
The exact sub-tree(s) the task touches.
The Codex context window for your model: gpt-5.5 on the Codex surface is 400k as of June 2026 (the 1M GPT-5.5 window is API-only). Other selectable models: gpt-5.4 (200k), gpt-5.4-mini, gpt-5.3-codex, and gpt-5.3-codex-spark (Pro only).
Length of AGENTS.md and root README.md in tokens (wc -w times 1.3 is a rough token estimate).
Any task-relevant glossary of internal terms (project codenames, package short names).
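The words-times-1.3 heuristic is easy to wrap so you can check any file or command output against the window. The sketch below uses integer arithmetic and the 400k figure from this article; estimate_tokens and window_percent are hypothetical helper names, and the result is only a rough estimate, not a real tokenizer count.

```bash
#!/usr/bin/env bash
# Rough token estimate for text on stdin: word count times 1.3.
estimate_tokens() {
  local words
  words=$(wc -w)
  echo $(( words * 13 / 10 ))
}

# Share of the context window (default 400000 tokens) that a token count
# occupies, as a whole-number percentage.
window_percent() {
  local tokens=$1 window=${2:-400000}
  echo $(( tokens * 100 / window ))
}
```

For example, `estimate_tokens < AGENTS.md` gives the approximate cost of one file, and `window_percent 100000` reports 25, a quarter of the 400k window.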

Step-by-step fix

Ordered by ROI.

Step 1: Scope the working set in the prompt

Before any plan:

Working scope:
- ONLY edit files under: packages/api-server/, packages/api-types/
- ONLY read for reference: packages/db-client/ (read-only, no edits)
- DO NOT touch: anything else in the monorepo

If a question requires changes outside scope, STOP and ask first.

This cuts the agent’s search space by roughly 80-90% on most monorepos. To enforce it at the tool level, launch Codex pointed at the sub-tree with -C (long form --cd, which sets the working directory) and --sandbox workspace-write so writes outside the workspace need approval:

codex --cd packages/api-server --sandbox workspace-write

Step 2: Pre-feed a structural summary

Before the agent scans, give it a tree summary you generated:

tree -L 3 packages/api-server -I 'node_modules|dist|.next' > /tmp/tree.txt

Then in the prompt:

Repo structure (read this, do not re-list):

[paste tree output]

Key files:
- packages/api-server/src/routes/index.ts — route registry
- packages/api-server/src/services/ — business logic
- packages/api-types/src/index.ts — shared types

The agent now skips structural discovery and goes straight to work. This also pre-empts cause #6: hand it the dependency relationships so it does not re-read package.json from five directories to reconstruct them.
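On machines where tree is not installed, a find-based fallback can produce a comparable map. This is a sketch rather than a drop-in replacement: it prints flat relative paths instead of an indented tree, the ignore list is an assumption you should adapt, and repo_map is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Print relative paths under <root> down to <depth> levels (default 3),
# skipping common build and dependency directories.
repo_map() {
  local root=$1 depth=${2:-3}
  (
    cd "$root" || exit 1
    find . -maxdepth "$depth" \
      \( -name node_modules -o -name dist -o -name .next -o -name .git \) -prune \
      -o -print \
      | sed 's|^\./||' \
      | grep -v '^\.$' \
      | LC_ALL=C sort
  )
}
```

Something like `repo_map packages/api-server 3 > /tmp/tree.txt` then gives you a file to paste into the prompt.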

Step 3: Put the conventions in AGENTS.md, not in the prompt

This is the highest-leverage change and the one most people get wrong. Do not paste conventions into the opening message and hope they stick — compaction will summarize them away. Put them in the closest AGENTS.md to the sub-tree (Codex re-reads it every turn and it survives compaction):

# packages/api-server/AGENTS.md

Conventions:
- Routes are registered via registerRoute() in routes/index.ts
- Services are exported via the barrel file
- All handlers return { ok: boolean, data?: T, error?: AppError }
- Before finishing, run `git diff --name-only` and confirm every file is under packages/api-server or packages/api-types

Because the chain composes root-first then leaf-last, this package-level file overrides anything looser in the repo-root AGENTS.md. Keep it small: the whole chain is capped at 32 KiB, so a bloated root file can starve these rules.

Step 4: Use directory-level summaries instead of file reads

For the exploration phase, prefer summaries over full content:

Run: ls packages/api-server/src/services/
Run: head -1 packages/api-server/src/services/*.ts   (first line / docstring of each)
DO NOT read full content until you have identified the target service.

Knowing 30 service names plus their one-line docs costs roughly 500 tokens. Reading all 30 in full costs roughly 50k. This also kills the re-read waste in cause #5 — the agent picks the one file it needs instead of grazing.
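Plain head -1 across a glob adds a header block per file, so a small wrapper that prints one line per file keeps the summary even tighter. A minimal sketch, assuming the first line of each file is a descriptive comment (often it is an import instead, in which case the summary is less useful). summarize_dir is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Print "<file name>: <first line>" for every .ts file in a directory.
summarize_dir() {
  local f
  for f in "$1"/*.ts; do
    [ -f "$f" ] || continue   # skip when the glob matched nothing
    printf '%s: %s\n' "$(basename "$f")" "$(head -n 1 "$f")"
  done
}
```

Running `summarize_dir packages/api-server/src/services` gives the agent one line per service to choose from before it opens anything in full.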

Step 5: Cap verbose tool output

Wrap noisy commands so a single dump cannot blow your window:

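pnpm tsc --noEmit > /tmp/tsc.log 2>&1   # write the full output first; tee piped into head can be cut short when head exits
head -100 /tmp/tsc.log                  # then show only the first 100 lines
echo "(full output in /tmp/tsc.log)"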

Or have the agent grep only what it needs:

grep "error TS" /tmp/tsc.log | grep "packages/api-server/" | head -50

100 lines instead of 4000.
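The same pattern generalizes to any noisy command. The wrapper below is a sketch: it writes the complete output to a log, prints only the first N lines, and passes the command's exit status through. run_capped is a hypothetical helper name, not something Codex provides.

```bash
#!/usr/bin/env bash
# Usage: run_capped <max_lines> <log_file> <command> [args...]
# Saves full stdout+stderr to <log_file>, prints at most <max_lines> lines,
# and returns the wrapped command's exit status.
run_capped() {
  local limit=$1 log=$2 status total
  shift 2
  "$@" > "$log" 2>&1
  status=$?
  total=$(( $(wc -l < "$log") ))
  head -n "$limit" "$log"
  if [ "$total" -gt "$limit" ]; then
    echo "... ($((total - limit)) more lines in $log)"
  fi
  return "$status"
}
```

For instance, `run_capped 100 /tmp/tsc.log pnpm tsc --noEmit` shows the first 100 lines and tells the agent where the rest lives.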

Step 6: Break into checkpointed sub-tasks with fresh contexts

For long jobs, run a sequence of non-interactive invocations, each from a fresh context. Commit between them so the committed work — not the conversation — is the durable state:

codex exec - < step1-add-types.md      # Add new types in packages/api-types, then commit
codex exec - < step2-add-route.md      # Add route + handler in packages/api-server, then commit
codex exec - < step3-add-test.md       # Add the integration test, then commit

codex exec (alias codex e) is the non-interactive command; codex exec - < file reads the prompt from a file. In an interactive session you can get the same effect by running /compact at a clean checkpoint, or /clear to start a genuinely fresh conversation once each sub-task is committed.
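If you run these sequences often, a small loop that stops at the first failing step keeps a broken step from feeding the next one. This is a sketch under assumptions: it relies on the command exiting non-zero on failure, and CODEX_CMD is a hypothetical override variable introduced here so the loop can be dry-run with another command.

```bash
#!/usr/bin/env bash
# Run each prompt file through a fresh non-interactive invocation, in order.
# Stops and returns 1 as soon as one step exits non-zero.
# CODEX_CMD is a hypothetical override; the default is "codex exec -".
run_steps() {
  local step
  for step in "$@"; do
    echo "== $step"
    ${CODEX_CMD:-codex exec -} < "$step" || {
      echo "step failed: $step" >&2
      return 1
    }
  done
}
```

You might call it as `run_steps step1-add-types.md step2-add-route.md step3-add-test.md`, with each prompt file ending in an instruction to commit so the next step starts from committed state.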

Step 7: Tune the compaction threshold before reaching for a bigger model

You cannot pick a “long-context” Codex model — there is no gpt-5.5-long, and the Codex surface caps gpt-5.5 at 400k regardless. So the lever is not a bigger window; it is leaving compaction more headroom. For repos with large essential context, lower the auto-compaction trigger to about 80-85% of capacity so the post-compaction re-read cycle (system prompt plus the AGENTS.md chain) has room to land:

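# ~/.codex/config.toml
model = "gpt-5.5"
# Roughly 80% of the 400k window. Check this key name against the config reference for your Codex version.
model_auto_compact_token_limit = 320000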

Understand the tradeoff: earlier compaction protects against hard overflow (causes #1 and #5) but does nothing for a plan that was already condensed (cause #2) or a giant tool dump (cause #4). Scoping and AGENTS.md remain the real fixes.
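The 80-85% range translates into concrete token counts with simple arithmetic. The sketch below only does the math for the 400k window discussed in this article; compact_threshold is a hypothetical helper name, and where the resulting number goes depends on your Codex version's configuration.

```bash
#!/usr/bin/env bash
# Token count at which compaction would trigger, given a window size and a
# percentage of that window.
compact_threshold() {
  local window=$1 percent=$2
  echo $(( window * percent / 100 ))
}

compact_threshold 400000 80   # prints 320000
compact_threshold 400000 85   # prints 340000
```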

How to confirm it’s fixed

Re-run the same task and read the transcript: scoped reads only, no cross-package wandering.
Re-prompt mid-task with “what step are you on?” and “quote the rule for this from AGENTS.md” — both should come back precise, not paraphrased. If the rule comes back verbatim, it is genuinely in context.
Run git diff --name-only at the end and confirm every changed file is inside your declared scope. A path outside scope means Step 1 did not hold and you should add the diff-check rule to AGENTS.md.
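The final scope check can be automated so a stray path is flagged rather than eyeballed. A minimal sketch: it reads changed paths on stdin and prints any that fall outside the allowed prefixes. out_of_scope is a hypothetical helper name, and the prefixes are the example scope used in this guide.

```bash
#!/usr/bin/env bash
# Read file paths on stdin and print those that do not start with any of the
# allowed prefixes given as arguments. No output means everything is in scope.
out_of_scope() {
  local path prefix ok
  while IFS= read -r path; do
    [ -z "$path" ] && continue
    ok=0
    for prefix in "$@"; do
      case "$path" in
        "$prefix"*) ok=1; break ;;
      esac
    done
    [ "$ok" -eq 0 ] && printf '%s\n' "$path"
  done
  return 0
}
```

A typical call would be `git diff --name-only | out_of_scope packages/api-server/ packages/api-types/`, where empty output means the scope held.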

Long-term prevention

Start every agent task on a monorepo with an explicit Working scope: block, and launch with -C pointed at the sub-tree.
Maintain a repo-map.md at the root with the tree summary plus key-file pointers; ask the agent to read it first.
Keep a per-package AGENTS.md: the closest one wins on conflicts, and each task loads only the rules on its path. Keep the root file lean so you stay under the 32 KiB chain cap.
For exploration, have the agent use ls plus head -1 summaries before any full file reads.
Cap shell tool output to 100 lines via wrappers; redirect overflow to /tmp/*.log.
Split multi-package work into commit-checkpointed sub-tasks; never one mega-prompt for a 3-package change.

Common pitfalls

Pasting conventions into one message and assuming they stay for 200 turns — they will not; put them in AGENTS.md, which is re-read every turn and survives compaction.
Expecting a “long-context” Codex model to save you — there is none; gpt-5.5 on Codex is 400k and attention quality drops well before that cap.
Stuffing every rule into the repo-root AGENTS.md until it overruns the 32 KiB chain cap and silently truncates your leaf-package rules.
Letting tsc --noEmit run with no head/tail cap — one bad command can blow 40% of your window.
Re-prompting “what files have you read?” — the answer is incomplete because the read history itself was compacted.

FAQ

Q: My repo is 200k LoC, do I really need to scope?

Yes. Reading 200k LoC in full would take on the order of 2M tokens (at a rough 10 tokens per line) against a 400k Codex window. You will always have only a small fraction of the codebase in context. Scope early so that fraction is intentional rather than random.

Q: How do I build the structural summary quickly?

tree -L 3 -I 'node_modules|dist|.next|coverage' > repo-map.txt

One command and you have a 1-2k token map that beats 50 random file reads.

Q: The agent ignored my “DO NOT touch other packages” rule. What now?

Move that rule into the nearest AGENTS.md (re-read every turn) instead of a chat message, and add a verifier line: “Before finishing, run git diff --name-only and confirm all files are under scope.” The verifier turns silent violations into visible errors. Launching with -C plus --sandbox workspace-write adds a hard guardrail on top.

Q: Doesn’t GPT-5.5 have a 1M context window? Why am I overflowing?

The 1M window is the GPT-5.5 API limit. Codex deliberately caps its surface at 400k as of June 2026 for throughput and cost. Inside Codex you have 400k, so plan around that, not a million.

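Q: Does prompt caching help here?

It makes scoping cheaper but does not prevent overflow. Caching speeds up reuse of identical content (and lowers cost), but that content still occupies the window. Scoping is what reduces how much occupies it.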