Coding Prompt Structure That Ships Clean PRs

Goal + constraints + acceptance + hand-off. Skip a block and the AI guesses. The four-part template, with 2026 plan-mode and AGENTS.md specifics.

Published: May 17, 2026 Updated: Jun 09, 2026 Author: AI Productivity Guide Team

TL;DR

A coding prompt that produces merge-ready output has four blocks: Goal (one specific sentence), Constraints (files to touch, files to leave alone, patterns to mirror, and what NOT to do), Acceptance (one test or observable behavior that proves it works), and Hand-off (how much autonomy the agent has). Skip any block and the model fills the gap with assumptions. Pair the template with a repo AGENTS.md/CLAUDE.md and plan-first mode, and the same task that an agent gets right roughly a third of the time without context starts landing reliably.

Why structure beats cleverness

Most “the AI wrote bad code” stories are “I wrote a bad prompt” stories. The model is not the bottleneck. As of June 2026, Claude Opus 4.7 scores 87.6% on SWE-bench Verified and GPT-5.5 hits 82.7% on Terminal-Bench 2.0 — these models can write the code. What they cannot do is read your mind about which files are off-limits, which error format you use, or what “done” means on your team.

A 2025 study on agentic coding tasks found agents working without project context files completed tasks correctly about 30% of the time; the same tasks with well-crafted context files in place reached roughly 90%. The gap is not model capability. It is the structure of what you hand the model.

This guide is for anyone prompting an agent to write code — Cursor, Claude Code, Codex, or ChatGPT on the side. Reach for the full template before any non-trivial change: a new feature, a refactor, or a bug fix that touches multiple files. Skip it for one-line tweaks where structure is overkill.

The four blocks

1. Goal — one specific sentence

One specific sentence with a measurable scope. This is the “what,” and it should leave no room for ambiguity.

Vague: “Improve the login flow.”
Specific: “Add rate limiting to /api/login to block more than 5 failed attempts in 15 minutes per IP, returning HTTP 429.”

2. Constraints — including what NOT to do

Language, framework, files to touch, files to leave alone, existing patterns to mirror — and explicit negative constraints. Telling the model what to avoid eliminates entire categories of low-quality output, which is why 2026 prompt guides emphasize negative constraints as heavily as positive ones.

“Use the existing helper in src/middleware/rate-limit.ts. Do not add new dependencies. Do not modify other API routes. Match the error format in auth.ts.”

3. Acceptance — one test or behavior

One test or one observable behavior that proves it works. If you cannot state the acceptance check, you do not yet know what “done” looks like, and neither will the model.

“A unit test in src/api/__tests__/login.test.ts asserts the 6th failed attempt within 15 minutes returns 429 with the standard error JSON. Existing login tests still pass.”
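To make the acceptance check concrete, here is a minimal sketch of what such a test could pin down, written as plain TypeScript with no test runner. It rests on assumptions: the real helper in src/middleware/rate-limit.ts and the error format in auth.ts are not shown in this guide, so `FailedLoginLimiter`, `handleLogin`, and the `RATE_LIMITED` error body are hypothetical stand-ins. Timestamps are passed in explicitly so the 15-minute window can be checked without waiting.

```typescript
// Hypothetical stand-in for the helper in src/middleware/rate-limit.ts:
// tracks failed logins per IP inside a sliding 15-minute window.
const WINDOW_MS = 15 * 60 * 1000;
const MAX_FAILURES = 5;

class FailedLoginLimiter {
  private failures = new Map<string, number[]>();

  // Drops timestamps older than the window and returns the ones that remain.
  private recent(ip: string, now: number): number[] {
    const kept = (this.failures.get(ip) ?? []).filter((t) => now - t < WINDOW_MS);
    this.failures.set(ip, kept);
    return kept;
  }

  // Blocked once MAX_FAILURES failures sit inside the window.
  isBlocked(ip: string, now: number): boolean {
    return this.recent(ip, now).length >= MAX_FAILURES;
  }

  recordFailure(ip: string, now: number): void {
    this.recent(ip, now).push(now);
  }

  // The prompt's constraints say a successful login resets the counter.
  recordSuccess(ip: string): void {
    this.failures.delete(ip);
  }
}

// Hypothetical error shape; the real one should mirror auth.ts.
interface LoginResponse {
  status: number;
  body: { error?: { code: string; message: string } };
}

// `passwordOk` stands in for the real credential check.
function handleLogin(
  limiter: FailedLoginLimiter,
  ip: string,
  passwordOk: boolean,
  now: number,
): LoginResponse {
  if (limiter.isBlocked(ip, now)) {
    return {
      status: 429,
      body: { error: { code: "RATE_LIMITED", message: "Too many failed login attempts" } },
    };
  }
  if (!passwordOk) {
    limiter.recordFailure(ip, now);
    return {
      status: 401,
      body: { error: { code: "INVALID_CREDENTIALS", message: "Invalid credentials" } },
    };
  }
  limiter.recordSuccess(ip);
  return { status: 200, body: {} };
}

// The acceptance check: six failures one second apart; the 6th must be a 429.
function sixthFailureIsRateLimited(): boolean {
  const limiter = new FailedLoginLimiter();
  let last: LoginResponse | undefined;
  for (let i = 0; i < 6; i++) {
    last = handleLogin(limiter, "203.0.113.7", false, i * 1000);
  }
  return last !== undefined && last.status === 429 && last.body.error?.code === "RATE_LIMITED";
}

console.log(sixthFailureIsRateLimited() ? "acceptance check passed" : "acceptance check FAILED");
```

Passing `now` as a parameter instead of reading the clock inside the limiter is a deliberate choice in this sketch: it keeps the “within 15 minutes” part of the check deterministic. In a real repo the same assertions would live in src/api/__tests__/login.test.ts under whatever runner the project already uses.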

4. Hand-off — how much autonomy

How much rope the agent gets before you review.

“Plan first. List the files you intend to change and the test name. Do not write code until I approve the plan.”

A worked example you can paste

Goal:
Add rate limiting to /api/login to block more than 5 failed
attempts in 15 minutes per IP, returning HTTP 429.

Constraints:
- Use src/middleware/rate-limit.ts; do not add new dependencies.
- Do not modify other API routes.
- Match the existing error response format (see auth.ts).
- Failed attempts only; successful logins reset the counter.

Acceptance:
- New unit test in src/api/__tests__/login.test.ts asserts the
  6th failed attempt within 15 minutes returns 429 with the
  standard error JSON.
- Existing login tests still pass.
- No console warnings during npm test.

Hand-off:
Plan first. List target files and test name. Wait for approval
before writing code.

This skeleton works for refactors, bug fixes, and dependency upgrades — swap the content, keep the four blocks.
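If you reuse the skeleton often, a small helper can enforce the four blocks before anything reaches the agent. This is a sketch under assumptions: the `CodingPrompt` shape and `renderPrompt` are hypothetical names, and the bug-fix content in the usage call (a `paginate()` off-by-one) is invented for illustration.

```typescript
// Hypothetical shape for the four-block template.
interface CodingPrompt {
  goal: string;
  constraints: string[];
  acceptance: string[];
  handOff: string;
}

const nonBlank = (items: string[]) => items.filter((s) => s.trim() !== "");

// Renders Goal, Constraints, Acceptance, and Hand-off in a fixed order.
// Throws if any block is empty, because an empty block is where the model guesses.
function renderPrompt(p: CodingPrompt): string {
  const constraints = nonBlank(p.constraints);
  const acceptance = nonBlank(p.acceptance);
  const missing: string[] = [];
  if (p.goal.trim() === "") missing.push("Goal");
  if (constraints.length === 0) missing.push("Constraints");
  if (acceptance.length === 0) missing.push("Acceptance");
  if (p.handOff.trim() === "") missing.push("Hand-off");
  if (missing.length > 0) {
    throw new Error(`Missing block(s): ${missing.join(", ")}`);
  }
  const bullets = (items: string[]) => items.map((s) => `- ${s}`).join("\n");
  return [
    `Goal:\n${p.goal.trim()}`,
    `Constraints:\n${bullets(constraints)}`,
    `Acceptance:\n${bullets(acceptance)}`,
    `Hand-off:\n${p.handOff.trim()}`,
  ].join("\n\n");
}

// Usage: the same skeleton with bug-fix content (invented for illustration).
console.log(
  renderPrompt({
    goal: "Fix the off-by-one in paginate() so page 2 starts at item 21 when pageSize is 20.",
    constraints: ["Only edit src/lib/paginate.ts.", "Do not change the function signature."],
    acceptance: ["A new test asserts that page 2 begins with the 21st item.", "Existing tests still pass."],
    handOff: "Plan first. List target files and test name. Wait for approval before writing code.",
  }),
);
```

Refusing to render is the point of the design: a missing block becomes an error you fix before sending, not an assumption the model makes for you.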

Use plan-first mode, not just a plan-first sentence

The hand-off block is most powerful when the tool enforces it. As of June 2026:

Claude Code plan mode: press Shift+Tab twice to enter it (first press toggles auto-accept, second enters plan mode), or use the /plan command. In plan mode the agent reads files and writes a numbered plan — naming target files, side effects like migrations or new env vars, and risks — without editing anything until you approve.
Cursor: Composer proposes a plan of changes before executing; review and iterate on the plan before granting permission to proceed.

Plan-first costs about 30 seconds and prevents the most common waste: the agent writing 200 lines you have to throw away. The teams shipping the cleanest PRs in 2026 are not the ones with the cleverest prompts but the ones that treat planning as a first-class step.

Move standing constraints into AGENTS.md

Anything you would repeat in every prompt — naming conventions, error handling, test commands, the async style your codebase uses — belongs in a repo context file, not your prompt.

AGENTS.md has become the open standard for this. OpenAI proposed it in August 2025; it was donated to the Linux Foundation’s Agentic AI Foundation in December 2025, and as of June 2026 it is used by 60,000+ repositories and read natively by Codex, Cursor, GitHub Copilot, Gemini CLI, Aider, Zed, Warp, and others. Claude Code reads CLAUDE.md instead, so either add an `@AGENTS.md` import line to CLAUDE.md or keep both files in sync. When an agent makes a repeated mistake, fix the file rather than re-explaining it each session; that feedback loop is the real productivity gain.
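Claude Code treats a line such as `@AGENTS.md` in CLAUDE.md as an import, so the lowest-effort way to keep one source of truth is a one-line CLAUDE.md. If you want CI to catch drift, a check along these lines could work. `claudeMdInSync` and the rule it enforces (an exact `@AGENTS.md` import line, or identical content after trimming) are assumptions for this sketch, not a feature of either tool.

```typescript
// Sketch of a CI check: CLAUDE.md passes if it imports AGENTS.md with
// Claude Code's `@path` syntax on its own line, or if it duplicates
// AGENTS.md exactly (ignoring leading and trailing whitespace).
function claudeMdInSync(claudeMd: string, agentsMd: string): boolean {
  const importsAgents = claudeMd
    .split(/\r?\n/)
    .some((line) => line.trim() === "@AGENTS.md");
  return importsAgents || claudeMd.trim() === agentsMd.trim();
}

// Usage with inline strings; in CI these would be read from the repo root.
console.log(claudeMdInSync("# Claude Code notes\n@AGENTS.md\n", "# Build\nnpm test\n")); // true
console.log(claudeMdInSync("# Build\nyarn test\n", "# Build\nnpm test\n")); // false
```

The check deliberately ignores other path spellings and nested imports; a failing run is a prompt to look, not a full parser of Claude Code's import rules.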

Prompt length: aim for 150-300 words

For typical work, the effective range is roughly 150-300 words. Go longer when constraints stack up; if the prompt is much shorter, you have probably omitted a block. Resist stuffing the entire project into one prompt. 2026 guides call this the “curse of instructions”: the model loses focus across too many demands. Give it one focused task at a time and lean on the context file for the standing rules.
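A rough length check is easy to automate, for example in a shared prompt-snippet library. The sketch below counts whitespace-separated words and compares them to the 150-300 band; `checkPromptLength` and its verdict labels are hypothetical, and word count is only a proxy for whether all four blocks are present.

```typescript
type LengthVerdict = "likely-missing-a-block" | "typical" | "consider-context-file";

// Counts whitespace-separated words and compares the total to a band
// (150-300 by default). The thresholds are a rule of thumb, not a limit.
function checkPromptLength(
  prompt: string,
  min = 150,
  max = 300,
): { words: number; verdict: LengthVerdict } {
  const trimmed = prompt.trim();
  const words = trimmed === "" ? 0 : trimmed.split(/\s+/).length;
  const verdict: LengthVerdict =
    words < min ? "likely-missing-a-block" : words > max ? "consider-context-file" : "typical";
  return { words, verdict };
}

console.log(checkPromptLength("Add rate limiting to /api/login.").verdict); // likely-missing-a-block
```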

Constraint vs intuition: be rigid about “what,” loose about “how”

Split your prompt into two registers. The specification — the “what” — has no room for ambiguity; state the exact expected outcome. Design intuitions — the “how” — should use non-binding language: present them as suggestions, not orders, so the model can apply judgment where you do not actually have a hard requirement. Mixing the two (turning a soft preference into a hard rule, or leaving a hard requirement vague) is where structured prompts still go wrong.
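One way to keep the two registers apart is to render them as separately labeled lists. This is a minimal sketch; `ConstraintSet`, `renderConstraintSet`, and the exact label wording are illustrative assumptions, not a format any tool requires.

```typescript
// Hard requirements ("what") and soft design preferences ("how")
// kept in separate fields so neither register bleeds into the other.
interface ConstraintSet {
  must: string[];   // stated as rules
  prefer: string[]; // non-binding; the model may deviate with a reason
}

function renderConstraintSet({ must, prefer }: ConstraintSet): string {
  const lines = ["Requirements (must hold):", ...must.map((m) => `- ${m}`)];
  if (prefer.length > 0) {
    lines.push(
      "",
      "Suggestions (use your judgment; deviate if you have a reason):",
      ...prefer.map((p) => `- ${p}`),
    );
  }
  return lines.join("\n");
}

console.log(
  renderConstraintSet({
    must: ["Return HTTP 429 on the 6th failed attempt within 15 minutes.", "Do not add new dependencies."],
    prefer: ["A sliding window may be simpler to test than fixed buckets."],
  }),
);
```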

Reviewing the output

The structure makes output reviewable, not merge-ready. Before you ship:

Diff every affected file. Anything outside the stated constraints is a bug regardless of how clever it looks.
Run the acceptance test. “Looks right” is not a passing test.
Revert unrequested extras. “Helpful” additions you did not ask for are a common failure mode.
Read the code. Even with structure, AI output is a first draft.
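The first review step can be partly mechanized by comparing the changed files against the Constraints block. A minimal sketch, assuming the changed paths come from something like `git diff --name-only main...HEAD` and that scope is written as exact file paths or directory prefixes ending in `/`; `Scope`, `outOfScope`, and src/api/login.ts are hypothetical. It only flags files outside scope, so reading the diff is still required.

```typescript
// Scope as stated in the prompt's Constraints block. Entries ending in "/"
// are directory prefixes; anything else must match the path exactly.
interface Scope {
  allowed: string[];
  forbidden: string[];
}

function matchesPattern(path: string, pattern: string): boolean {
  return pattern.endsWith("/") ? path.startsWith(pattern) : path === pattern;
}

// Returns changed files that are explicitly forbidden or were never allowed.
function outOfScope(changed: string[], scope: Scope): string[] {
  return changed.filter(
    (file) =>
      scope.forbidden.some((p) => matchesPattern(file, p)) ||
      !scope.allowed.some((p) => matchesPattern(file, p)),
  );
}

// Usage with the rate-limiting task; src/api/login.ts is a hypothetical path.
const rateLimitScope: Scope = {
  allowed: ["src/api/login.ts", "src/middleware/rate-limit.ts", "src/api/__tests__/"],
  forbidden: ["package.json"], // "Do not add new dependencies."
};
const changedFiles = [
  "src/api/login.ts",
  "src/api/__tests__/login.test.ts",
  "src/api/signup.ts",
  "package.json",
];
console.log(outOfScope(changedFiles, rateLimitScope)); // [ 'src/api/signup.ts', 'package.json' ]
```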

Common mistakes

Mistake	What happens	Fix
One-liner prompts	“Add rate limiting” produces a different rate limiter every run	Use all four blocks
No acceptance criteria	“Done” drifts to whatever the model felt good about	State one test or behavior
Missing constraints	The agent picks conventions, often wrong for your codebase	List touch/no-touch files and the pattern to mirror
No hand-off rule	The agent writes hundreds of lines you discard	Enforce plan-first mode
Reusing a stale prompt	Codebases drift; old constraints point at moved files	Update constraints per task
Standing rules in every prompt	Bloated prompts, repeated mistakes	Move them to `AGENTS.md`/`CLAUDE.md`

FAQ

Does this work for autocomplete-style assistants? Partially. Tab-style autocomplete favors brevity and works off cursor context. Use the full four-block structure for agent modes — Cursor Composer, Claude Code, Codex, and Aider — where the model plans and edits across files.

What if I do not know the constraints yet? Ask the agent to propose them. “Propose 3 constraints before writing any code” is a valid first message; approve or edit what it returns, then proceed.

AGENTS.md or CLAUDE.md — which do I write? Write AGENTS.md if your team uses Codex, Cursor, Copilot, or Gemini CLI; it is the cross-tool standard. Claude Code reads CLAUDE.md, so keep one or have it import the other. The content is the same: build commands, test commands, conventions, and what to avoid.

Can I skip plan-first for trusted models? On genuinely simple, single-file tasks, yes. On anything multi-file or production-bound, keep plan-first regardless of how strong the model is — the cost is 30 seconds and the savings are measured in discarded PRs.

How long should the prompt be? About 150-300 words for typical work. If it is much shorter, you probably dropped a block. If it is much longer, move the standing rules into your context file.

Tags: #AI coding #Tutorial
