AI Coding Context Management: What to Feed, What to Cut

AI coding quality is mostly a context problem. Here is how to feed Claude, Cursor, and Codex the right context — and cut the rest — as of June 2026.

Published: May 17, 2026 Updated: Jun 04, 2026 Author: AI Productivity Guide Team

The gap between “this AI is brilliant” and “this AI is useless” is almost never the model. It is the context you fed it. Too little, and the agent guesses, inventing functions that do not exist. Too much, and the model’s attention dilutes, producing bland code that ignores half your constraints. The catch in 2026 is that the obvious fix — pour in more context, since the windows are huge now — backfires. This guide is for developers using Claude Code, Cursor, ChatGPT, or Codex on any non-trivial task. The goal is shipping a usable first draft, not arguing with the model for forty minutes.

TL;DR

Context windows are no longer the constraint. As of June 2026, Claude Opus 4.7, Sonnet 4.6, and Gemini 3.1 Pro all run a 1M-token standard window; the constraint is the model’s attention inside that window.
“Context rot” is real and measured. Chroma tested 18 frontier models and found that every one degrades as input grows; a window advertised at 200K tokens typically becomes unreliable well before that limit, and tasks that need semantic matching drop from 95%+ accuracy to 60–70%.
Feed less, ordered well. Language and framework versions first, file and conventions next, hard constraints, then the goal last.
Show conventions by example (one anchor file), not by prose.
Use the tools’ native context controls: Cursor @-mentions and .cursor/rules/*.mdc, Claude Code CLAUDE.md plus /compact, and AGENTS.md for repo-wide instructions.

Why bigger windows did not fix this

A reasonable assumption in 2024 was that context management would become a non-issue once windows got large enough. The opposite happened. Chroma’s 2025 “context rot” study ran 18 frontier models (including Claude 4, GPT, and Gemini families) through tests harder than the classic needle-in-a-haystack and found a consistent pattern: every model gets worse as the input grows. Models that score 95%+ when the target fact appears verbatim drop to 60–70% once the task requires matching on meaning or resisting plausible distractors. Counterintuitively, models did better on shuffled haystacks than on logically coherent documents, because coherent prose produces more convincing distractors that pull attention away from the relevant lines.

The practical takeaway: a 1M-token window is headroom, not an invitation. Your job is still to select the few thousand tokens that change the answer and leave the rest out. (The full methodology is in Chroma’s context rot research.)

Who this is for

Developers using Claude Code, Cursor, ChatGPT, or Codex for any non-trivial coding task — anything beyond renaming a variable or generating a one-line regex. Especially useful in frameworks the model knows less well (small DSLs, internal libraries, framework versions newer than the model’s training cutoff) or in codebases with strong unwritten conventions. Skip it for trivial tasks where the answer is obvious from the function signature; just paste and ask, because over-engineering the prompt costs more than the task.

Before you start

Know the difference between the nominal and the effective window. As of June 2026, Opus 4.7 and Sonnet 4.6 advertise 1M tokens (standard pricing, no long-context premium since the March 2026 GA), and Gemini 3.1 Pro matches it. But quality starts sliding long before the limit. Note that ChatGPT does not give you the full 1M in-app on Plus — that is reserved for the $200 Pro tier; Plus shows roughly 320 pages of working context.
Open your codebase in a tool that can attach files (Cursor, Claude Code) rather than copy-paste. File attachment keeps structure and lets the tool deduplicate.
Identify the “anchor” example: one existing file that shows the convention you want followed.

Step by step

Enumerate context categories. Before prompting, list: language and version, framework and version, the file you are editing, conventions to follow, hard constraints (performance, security, browser support), success criteria, and any APIs or libraries the change touches.
Decide per category. For each, decide: include inline in the prompt, attach as a file, or skip entirely. The default should be “skip unless it changes the answer.”
Order the prompt. Language and framework first, then file context, then conventions, then constraints, then the goal last. Models anchor strongly to what comes first; the goal at the end stays in working memory.
Cut filler. Everything that does not change the answer comes out. A 12,000-token prompt with 2,000 tokens of relevant info performs worse than a 2,500-token prompt with the same 2,000 plus 500 of buffer — the extra 9,500 tokens are pure distractor surface for the attention mechanism.
Fill knowledge gaps with docs. For framework features the model does not know (anything released after the model’s training cutoff, or any internal library), paste the relevant doc section directly, or in Cursor pull it with @Docs. Do not hope the model “remembers.”
Show conventions by example. Attach one existing file that demonstrates the naming, error handling, and structure conventions. One example beats five paragraphs of “we always do X except when Y.”
Prune after the first response. Note which context the model used and which it ignored. Cut the ignored parts and save the rest as a template.
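The "decide per category" and "cut filler" steps above can be sketched in a few lines of Python. This is an illustration, not a tool any of these products ship: `ContextItem`, `decide`, and `signal_ratio` are hypothetical names, and the token counts are whatever rough estimate you have on hand.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    INLINE = "inline"
    ATTACH = "attach"
    SKIP = "skip"


@dataclass
class ContextItem:
    category: str          # e.g. "framework version", "conventions"
    tokens: int            # rough size estimate, however you measure it
    changes_answer: bool   # your judgment call, made before prompting
    is_file: bool = False  # True when the item lives in a file on disk


def decide(item: ContextItem) -> Decision:
    """Default to SKIP; include only what changes the answer."""
    if not item.changes_answer:
        return Decision.SKIP
    return Decision.ATTACH if item.is_file else Decision.INLINE


def signal_ratio(items: list[ContextItem]) -> float:
    """Share of the listed tokens that actually change the answer."""
    total = sum(i.tokens for i in items)
    relevant = sum(i.tokens for i in items if i.changes_answer)
    return relevant / total if total else 0.0
```

Run against the numbers from the "cut filler" step, a 12,000-token prompt with 2,000 relevant tokens has a signal ratio of about 0.17, while the 2,500-token version sits at 0.8. The ratio is only a bookkeeping aid; it does not measure model quality.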

Effective vs. nominal context, June 2026

The headline windows are large and roughly equal now; the differences that matter for context management are where the full window is actually available to you and how each tool reclaims space when the session fills up.

Model / tool	Nominal window	What you get in practice
Claude Opus 4.7 / Sonnet 4.6 (API)	1M tokens, standard price	Full 1M; degradation begins well before the limit
Claude Code	1M (model)	Auto-compacts near ~95% usage (sometimes earlier); keeps a ~33K-token buffer for tool output
ChatGPT Plus ($20)	—	~320 pages of working context in-app; full 1M only on $200 Pro
Gemini 3.1 Pro / Google AI Pro ($19.99)	1M tokens	Full 1M context
Cursor (Pro $20)	model-dependent	You control intake via `@`-mentions and the codebase index

Figures as of June 2026; vendor tiers and limits change often.
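To make the gap between nominal and working window concrete, here is a minimal sketch of the headroom arithmetic. It assumes the roughly 95% auto-compact threshold quoted in the table and a crude four-characters-per-token estimate; both are assumptions for illustration, and `tokens_until_compaction` and `estimate_tokens` are hypothetical helpers, not part of any tool.

```python
def tokens_until_compaction(used_tokens: int,
                            window: int = 1_000_000,
                            threshold_pct: int = 95) -> int:
    """Tokens left before usage crosses the assumed auto-compact threshold."""
    limit = window * threshold_pct // 100
    return max(limit - used_tokens, 0)


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough size estimate; real tokenizers vary by model and content."""
    return int(len(text) / chars_per_token)
```

A session that has already used 900,000 tokens has about 50,000 left under these assumptions. Since the threshold can trigger earlier, treat the result as an upper bound rather than a guarantee.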

Anatomy of a good prompt

[CONTEXT - put first]
TypeScript 5.4, React 19, Next.js 15 App Router.
Existing component for style reference: attached `Button.tsx`.
Convention: all components export named, not default.
Error handling: throw typed errors from `lib/errors.ts`.

[CONSTRAINTS]
- No new dependencies.
- Must be a server component (no `use client`).
- Tailwind only, no CSS modules.

[GOAL - put last]
Create `Card.tsx` matching `Button.tsx` style.
Props: title (string), body (ReactNode), variant ("default" | "muted").
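If you build prompts like this often, the ordering can be enforced in code instead of by habit. The following is a sketch only: `render_prompt` is a hypothetical helper, and the section labels simply mirror the template above.

```python
def render_prompt(context: list[str], constraints: list[str], goal: str) -> str:
    """Assemble sections in the order context, constraints, goal."""
    lines = ["[CONTEXT - put first]", *context, ""]
    lines += ["[CONSTRAINTS]", *(f"- {c}" for c in constraints), ""]
    lines += ["[GOAL - put last]", goal]
    return "\n".join(lines)


prompt = render_prompt(
    context=[
        "TypeScript 5.4, React 19, Next.js 15 App Router.",
        "Existing component for style reference: attached `Button.tsx`.",
    ],
    constraints=["No new dependencies.", "Tailwind only, no CSS modules."],
    goal="Create `Card.tsx` matching `Button.tsx` style.",
)
```

Because the function signature fixes the order, the goal always lands last no matter how the caller collects the pieces.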

Using each tool’s native context controls

The 2026 tools each give you levers that beat copy-paste. Learn the three you use most.

Cursor. Every @-mention pins something specific: @Files / @Folders for exact code, @Codebase for semantic search across the indexed project, @Docs for official library docs, @Web for live search. Let the workspace finish its first index — @Codebase and Agent-mode awareness both depend on it. For standing conventions, use .cursor/rules/*.mdc files (the MDC format that replaced the single .cursorrules file); they support glob scoping so a rule only loads for the files it applies to.
Claude Code. Put durable project rules in CLAUDE.md at the repo root; it loads at session start and survives compaction, unlike a one-off prompt. When the session fills up, Claude Code auto-compacts at roughly 95% usage (and sometimes as early as 64–75% to avoid a failed pass); you can also run /compact manually before a fresh sub-task so it summarizes on your terms instead of mid-thought. Anthropic’s context-window docs cover the buffer math.
AGENTS.md. Now the cross-tool standard for repo-wide agent instructions, read by Cursor, Codex, and others. Nested AGENTS.md files merge with their parents, with the more specific file winning — handy in a monorepo where each package has its own conventions.
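The nested AGENTS.md lookup can be pictured as a walk from the file being edited up to the repo root. The sketch below shows only that lookup order; how each tool actually merges the files is tool-specific, and `collect_agents_files` is a hypothetical helper written for illustration.

```python
from pathlib import Path


def collect_agents_files(target: Path, repo_root: Path) -> list[Path]:
    """Return AGENTS.md files from the repo root down to the target's directory.

    Later entries are more specific, so applying them in order lets the
    nearest file override its parents.
    """
    repo_root = repo_root.resolve()
    current = target.resolve()
    if current.is_file():
        current = current.parent
    found = []
    while True:
        candidate = current / "AGENTS.md"
        if candidate.is_file():
            found.append(candidate)
        # Stop at the repo root, or at the filesystem root as a safety net.
        if current == repo_root or current.parent == current:
            break
        current = current.parent
    return list(reversed(found))
```

In a monorepo with a root AGENTS.md and one in `packages/ui`, a file under `packages/ui/src` would resolve to the root file first and the package file second, which matches the "more specific file wins" behavior described above.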

Quality check

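Did the model use the version numbers you specified? Look for syntax tells: `??` (nullish coalescing, ES2020) shows it is targeting modern JavaScript, while `||` fallbacks everywhere suggest it defaulted to older syntax.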
Did the output follow your attached example file’s conventions, or its own training preferences?
Did the model reference any function or import that does not exist? Hallucinated dependencies are the loudest sign your context was incomplete.
Did you have to add details in a follow-up message? Move those into the original prompt — or into CLAUDE.md / a .cursor/rules file if it is a standing convention — next time.
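The hallucinated-dependency check above is easy to automate for a first pass. This sketch scans generated TypeScript or JavaScript for static `import` statements and reports packages that are not in a set you supply (for example, the dependency names from your package.json). The helper names are hypothetical, and the regex deliberately ignores `require()` calls and dynamic imports.

```python
import re

# Matches `from 'x'` and side-effect `import 'x'` forms.
IMPORT_RE = re.compile(r"""(?:\bfrom|\bimport)\s+['"]([^'"]+)['"]""")


def package_name(specifier: str) -> str:
    """Reduce 'pkg/sub/path' or '@scope/pkg/sub' to the installable name."""
    parts = specifier.split("/")
    return "/".join(parts[:2]) if specifier.startswith("@") else parts[0]


def unknown_imports(source: str, known: set[str]) -> set[str]:
    """Packages imported in `source` that are not in the known set."""
    found = set()
    for spec in IMPORT_RE.findall(source):
        if spec.startswith((".", "/")):
            continue  # relative or absolute path, not a package
        found.add(package_name(spec))
    return found - known
```

Path aliases and runtime built-ins will show up as unknown unless you add them to `known`, so treat the result as a list of things to look at, not a verdict.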

Two concrete recipes

New React component. Framework versions, one existing component as a style anchor, and the goal. Skip the entire CSS module unless the goal touches styles. In Cursor, that is @Button.tsx plus the goal; the index handles the rest. One round to a usable draft.

Database migration. The migration tool and version, the most recent relevant migration, and the schema for only the touched tables. Skip everything else — old migrations are pure distractor mass. One round to a usable draft.

For repeating tasks, store the recipe as a .cursor/rules file or a section of CLAUDE.md so the whole team uses the same context shape. When a piece is too big to attach whole, summarize it (“we use Tailwind with these custom utilities: …”) rather than pasting the file.

Common mistakes

Pasting the whole file when only one function matters. Wastes context and dilutes attention to the relevant lines.
Describing conventions in prose (“we use camelCase except for constants which are SCREAMING_SNAKE_CASE except inside React components where…”) when one example file shows the convention clearly in 30 lines.
Putting the goal at the top and code at the bottom. Reverse it — the goal should be the last thing the model reads before generating.
Forgetting to mention version numbers. You get ES5 syntax in an ES2024 codebase, or React class components in a hooks codebase.
Attaching documentation for features you do not actually use. The model will dutifully try to use them and you will have to delete the code.
Skipping success criteria. “Make this faster” with no benchmark is a recipe for plausible-looking changes that do not measurably help.

Advanced tips

Treat the comfortable working window as far smaller than the nominal one. Even with a 1M-token window, code-heavy sessions start losing accuracy long before that, and the Chroma data shows the slide is gradual, not a cliff. Keep the live context lean, and rely on /compact or a fresh chat rather than letting it grow.
For unfamiliar libraries, paste the README section plus the exact function signature you need (or @Docs it in Cursor). Hallucinated dependencies drop sharply.
Use a smaller, faster model for context triage (“which of these 12 files is relevant to this task?”) and a stronger model for the actual code generation, such as Opus 4.7 for the hardest changes. This is cheaper and often better: Opus 4.7 leads SWE-bench Verified at 87.6% as of June 2026, but routing easy triage to it is wasted spend.
In a monorepo, scope conventions with nested AGENTS.md or glob-scoped .cursor/rules so the agent only ever loads the package it is working in.
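The triage-then-generate tip can be wired up as a two-stage pipeline. The sketch below takes the two model calls as plain callables, so nothing here depends on a particular vendor SDK; `triage_then_generate`, `TriageFn`, and `CodegenFn` are hypothetical names, and you would supply your own functions that call whichever models you use.

```python
from typing import Callable

# Hypothetical signatures: plug in whatever client calls your tools expose.
TriageFn = Callable[[str, list[str]], list[str]]  # (task, paths) -> relevant paths
CodegenFn = Callable[[str, list[str]], str]       # (task, paths) -> generated code


def triage_then_generate(task: str,
                         candidate_files: list[str],
                         triage: TriageFn,
                         codegen: CodegenFn,
                         max_files: int = 3) -> str:
    """Ask a cheap model to pick files, then send only those to the strong model."""
    picked = triage(task, candidate_files)
    # Guard against a triage model that returns paths it was never given.
    picked = [p for p in picked if p in candidate_files][:max_files]
    return codegen(task, picked)
```

The `max_files` cap is an arbitrary default chosen for the example. The point of the design is that the expensive model never sees the files the cheap one ruled out.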

Output checklist

Language and framework version stated.
Existing code attached or summarized, not described in prose.
Conventions shown by example, not by prose.
Constraints listed explicitly with their reasons.
Goal at the end, after all context.
No filler context that does not change the answer.

FAQ

If windows are 1M tokens now, why not just dump everything in?: Because the window is headroom, not free. Chroma’s 18-model study shows accuracy declines as input grows, and coherent extra code makes for convincing distractors. The largest window does not change the rule: include only what changes the answer.
How much context is too much?: When the model starts ignoring parts of it, latency gets painful, or your tool warns it is about to compact. The honest signal is behavioral — if the output stops respecting a constraint you stated, your context is too noisy, not too short.
Should I include tests?: Yes if the goal is to make a specific test pass; attach that test. Otherwise skip — tests bloat context and the model often “fixes” the wrong thing.
Inline code or file attachment?: File attachment (or @Files in Cursor) for anything over 30 lines. Tools handle attached files better than long pasted blocks because attachment keeps the file structure intact and lets the tool deduplicate.
Do I include README content?: Only the relevant section. A 500-line README pasted whole wastes context; one paragraph about the convention you want followed is the right amount.
What about chat history?: Long history quietly costs context and rots quality. Start a new chat when switching tasks, or run /compact in Claude Code before a fresh sub-task instead of letting the previous one bleed in.
Does prompt order really matter that much?: Yes, measurably. The model treats early context as background facts and late context as the active task. Reversing the order changes output quality.

Tags: #AI coding #Tutorial #Workflow
