AI Spec-to-Code Workflow: Ship Features, Not Half-Built Demos

Turn a one-page spec into shipped code with Spec Kit, Kiro, or Claude Code plan mode — and skip the half-built-feature trap. Updated June 2026.

Published: May 17, 2026 | Updated: Jun 05, 2026 | Author: AI Productivity Guide Team

Most “AI built my feature” stories end at 80% done with the last 20% impossible to finish. The reason is almost always the same: the spec was being written implicitly, inside the code, while the agent kept guessing — and guessing wrong. The fix lives upstream. Write a one-page spec, force the AI to surface its ambiguities, break the work into tickets that each carry a real acceptance test, then ship one ticket at a time. In 2026 this pattern has a name — spec-driven development (SDD) — and real tooling behind it: GitHub’s Spec Kit (109k+ GitHub stars as of June 2026), AWS Kiro, and the plan mode built into Claude Code and Cursor.

TL;DR

- The “vibe-coding” failure mode is an implicit spec. Make the spec explicit and the last 20% stops fighting you.
- Five-step loop: spec → clarify → plan → tasks → implement, one ticket at a time, each with a runnable acceptance test.
- Tooling: use GitHub Spec Kit for a structured CLI flow across 30+ agents, AWS Kiro if you want a spec-first IDE, or Claude Code plan mode (Shift+Tab twice) for a lightweight version with no install.
- Pick a strong reasoning model: Claude Opus 4.7 (87.6% SWE-bench Verified) or Sonnet 4.6 for the implementation, GPT-5.5 for terminal-heavy agent runs.
- Highest-leverage 10 minutes of the whole task: editing the spec after the AI lists what it does not know.

What the spec-to-code workflow solves

The half-built-feature trap: an agent appears to build your feature, you celebrate the demo, then over the next week you discover edge cases the spec never covered, scope creep the AI silently introduced, and tests that pass but never exercise the new behavior. GitHub’s own framing for Spec Kit calls this “vibe-coding” — fine for throwaway prototypes, unreliable the moment you touch a real codebase. This workflow front-loads the spec rigor that the implicit-spec approach pays for later, in a panic.

Who this is for

Indie devs, prototypers, and developers building features under deadline pressure. It is especially relevant if you have shipped two or three AI-built features and noticed they all need a “second pass” that takes longer than the first. It is less relevant for trivial features — one function, one acceptance test — where the spec overhead is not worth it.

When to reach for it (and when not to)

Reach for it when the feature is big enough to need a one-page spec: anything touching 3+ files, multiple endpoints, or a UI flow with branches. Reach for it when deadline pressure tempts you to skip planning (“just have the agent write it”), or when you have been burned before by AI features that demoed well and broke on edge cases.

Do not reach for it on pure research or exploration tasks where the spec does not exist yet. Write a prototype first to learn what is possible, then derive the spec from what you discovered. Forcing spec-first on exploratory work manufactures fake clarity and steers you away from learning. Spec Kit’s maintainers make the same distinction: SDD is for 0-to-1 builds and iterative enhancement of known systems, not for the “creative exploration” phase.
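
As a rough illustration, the rules of thumb above can be written as a small decision helper. This is only a sketch: the function name `should_write_spec` and its parameters are hypothetical, and the thresholds are simply the ones quoted in this section (3+ files, multiple endpoints, a branching UI flow).

```python
def should_write_spec(files_touched: int, endpoints: int,
                      ui_flow_branches: bool, exploratory: bool) -> bool:
    """Rough encoding of the rules of thumb above, not a hard rule."""
    if exploratory:
        # Prototype first, then derive the spec from what you learned.
        return False
    # Any one signal is enough to justify a one-page spec.
    return files_touched >= 3 or endpoints >= 2 or ui_flow_branches


if __name__ == "__main__":
    # A feature touching four files with a single endpoint.
    print(should_write_spec(4, 1, False, exploratory=False))
    # A research spike, however large, stays spec-free for now.
    print(should_write_spec(10, 5, True, exploratory=True))
```

The exploratory check comes first on purpose: size alone should not force a spec onto work whose requirements are still unknown.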

Pick your tool

You can run this workflow by hand in any chat window, but three tools encode it directly. Choose by how much structure you want imposed.

| Tool | What it is | The flow | Best when |
| --- | --- | --- | --- |
| GitHub Spec Kit | Open-source CLI (`specify`), 109k+ stars (June 2026) | `/speckit.constitution → specify → clarify → plan → tasks → analyze → implement` | You want a repeatable, reviewable spec flow that works across 30+ agents |
| AWS Kiro | Spec-first agentic IDE (replaced Amazon Q Developer for new signups May 15, 2026) | Generates `requirements.md`, `design.md`, `tasks.md`, then works the task list | You want the spec to be the unit of work, surfaced as files in the IDE |
| Claude Code / Cursor plan mode | Built-in planning gate, no install | Plan mode writes a numbered plan; you approve before any edit | You want the lightest-weight version and already live in the agent |

A common 2026 split: Kiro for structured feature planning on complex projects, Cursor for rapid iteration, Claude Code for deep architectural reasoning. None of them removes your job — owning the decisions in the spec.

GitHub Spec Kit in two commands

Spec Kit installs through uv and drops its slash commands into whichever agent you use (Claude Code, Copilot, Gemini CLI, Cursor, Codex, and others, 30+ in total):

    uv tool install specify-cli --from git+https://github.com/github/spec-kit.git
    specify init my-feature

For a quick experiment, run the lean path inside your agent: `/speckit.specify` → `/speckit.plan` → `/speckit.tasks` → `/speckit.implement`. For anything mission-critical, add the quality gates: `/speckit.constitution` (project-wide non-negotiables) up front, `/speckit.clarify` after the spec, and `/speckit.analyze` before you implement to catch inconsistencies between spec, plan, and tasks.

Claude Code plan mode (no install)

If you do not want a toolkit, Claude Code’s plan mode is the 80/20 version. Press Shift+Tab twice from any prompt. Claude reads the relevant files, writes a numbered plan back to the terminal, and refuses to edit files or run state-changing commands until you approve. That approval gate is exactly the “make the spec explicit before code” discipline, enforced by the tool instead of by willpower.

The workflow, step by step

1. Write the spec FIRST. One page: user story, acceptance criteria, edge cases, and an explicit out-of-scope section. Out-of-scope is the most-skipped section and the most valuable one.
2. Make the AI find the gaps. Paste the spec and ask: “List ambiguities and missing details in this spec. Do not code yet. Just questions.” You want output like: “What happens if the user uploads a 50 MB image? Is anonymous upload allowed? What is the error UX on rate-limit?” (In Spec Kit this is `/speckit.clarify`.)
3. Fix the spec from those gaps. This is the highest-leverage 10 minutes of the whole task. Every ambiguity resolved now is a half-day of rework you do not pay later.
4. Break the spec into tickets. Ask: “Break this spec into 5-8 implementable tickets. For each: title, files likely touched, one-sentence acceptance test.” (Spec Kit: `/speckit.tasks`; Kiro: `tasks.md`.)
5. Implement one ticket at a time. Hand each ticket to the agent with the full spec attached, not just the ticket. Run the acceptance test after each. Pass → commit. Fail → fix before moving on.
6. Run integration tests after all tickets. The full spec’s acceptance criteria become your integration tests. Do not ship until they pass.
7. Reject scope additions. When the AI suggests “you should also handle X,” check the spec. If X is not there, defer it to a follow-up issue. “Just adding this real quick” is how a feature takes 3x its estimate.
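
The per-ticket part of the loop (attach the full spec, run the acceptance test, then commit or fix) can be sketched in Python. This is a minimal sketch under assumptions, not part of Spec Kit, Kiro, or Claude Code: the `Ticket` shape and the helper names are hypothetical, and it assumes each acceptance test is a command whose exit code 0 means "pass".

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Ticket:
    """One implementable ticket (hypothetical shape)."""
    title: str
    files: list[str]
    acceptance_cmd: list[str]  # command whose exit code 0 means "pass"


def build_ticket_prompt(spec: str, ticket: Ticket) -> str:
    """Attach the full spec to the ticket, not just the ticket text."""
    return (
        f"FULL SPEC:\n{spec}\n\n"
        f"TICKET: {ticket.title}\n"
        f"Files likely touched: {', '.join(ticket.files)}\n"
        "Implement only this ticket. Anything not in the spec is out of scope."
    )


def run_acceptance_test(ticket: Ticket) -> bool:
    """Run the ticket's acceptance command; True means it passed."""
    result = subprocess.run(ticket.acceptance_cmd, capture_output=True)
    return result.returncode == 0


def next_action(ticket: Ticket) -> str:
    """Pass -> commit. Fail -> fix before moving on."""
    return "commit" if run_acceptance_test(ticket) else "fix"


if __name__ == "__main__":
    demo = Ticket(
        title="Reject uploads over the size limit",
        files=["upload.py"],
        # Stand-in command; a real ticket would run its own test here.
        acceptance_cmd=[sys.executable, "-c", "raise SystemExit(0)"],
    )
    print(build_ticket_prompt("# Feature: image upload", demo))
    print(next_action(demo))
```

Keeping the acceptance test as an executable command, rather than a sentence, is what makes the pass/fail decision mechanical instead of a judgment call.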

One-page spec template

    # Feature: [name]
    ## User story
    As a [role], I want to [action], so that [outcome].

    ## Acceptance criteria
    - [ ] Criterion 1 (concrete, testable)
    - [ ] Criterion 2
    - [ ] Criterion 3

    ## Edge cases
    - What happens when X
    - What happens when Y

    ## Out of scope (explicitly NOT building)
    - Feature A (deferred)
    - Feature B (different ticket)

    ## Data model changes
    [schema diff or "none"]

    ## API surface
    [endpoint changes or "none"]

This is deliberately close to Kiro’s three-file split (requirements / design / tasks) so you can graduate to a tool without relearning the shape.
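
If you want a mechanical check that a spec follows the template, a small linter is enough. The sketch below is an assumption-laden illustration, not a feature of any tool mentioned here: `lint_spec` and `REQUIRED_SECTIONS` are hypothetical names, and it assumes the spec uses the `## ` headings from the template above.

```python
import re

# Sections the template treats as essential (prefix-matched, case-insensitive).
REQUIRED_SECTIONS = [
    "User story",
    "Acceptance criteria",
    "Edge cases",
    "Out of scope",
]


def split_sections(spec: str) -> dict[str, list[str]]:
    """Map each '## ' heading to the non-empty lines beneath it."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in spec.splitlines():
        heading = re.match(r"^##\s+(.*)", line)
        if heading:
            current = heading.group(1).strip()
            sections[current] = []
        elif current is not None and line.strip():
            sections[current].append(line.strip())
    return sections


def lint_spec(spec: str) -> list[str]:
    """Return a list of problems; an empty list means the spec passed."""
    sections = split_sections(spec)
    problems = []
    for name in REQUIRED_SECTIONS:
        # Prefix match so "Out of scope (explicitly NOT building)" counts.
        bodies = [body for title, body in sections.items()
                  if title.lower().startswith(name.lower())]
        if not bodies:
            problems.append(f"missing section: {name}")
        elif not bodies[0]:
            problems.append(f"empty section: {name}")
    return problems
```

An empty out-of-scope section is reported as a problem on purpose, since that is the section most likely to be skipped.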

Where to keep the spec so the agent sees it

The spec only works if the agent reads it for every ticket. Two reliable homes:

- The draft PR description. Open a feature branch and a draft PR before any AI involvement; the PR body is the spec’s canonical home and your reviewer reads it for free.
- A repo rules file. In 2026 the agents converged on a markdown file at repo root: `CLAUDE.md` (Claude Code), `AGENTS.md` (the open standard read by Codex, Cursor, Copilot, Gemini CLI, and Windsurf), or `.cursor/rules/*.mdc` for glob-scoped Cursor rules. Drop the spec path there so the agent loads it automatically. Without the full spec in context, the agent reinvents context on every ticket.
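
Dropping the spec path into a rules file can be scripted so it happens once per feature branch. The following is a sketch under assumptions: `add_spec_pointer` is a hypothetical helper, the pointer sentence is illustrative wording rather than a required format, and `specs/image-upload.md` is a made-up path.

```python
import tempfile
from pathlib import Path

# Hypothetical wording; any sentence that names the spec path would do.
POINTER_TEMPLATE = "Read the feature spec at `{path}` before working on any ticket."


def add_spec_pointer(repo_root: Path, spec_path: str,
                     rules_file: str = "AGENTS.md") -> bool:
    """Append a pointer to the spec; return False if it is already there."""
    target = repo_root / rules_file
    existing = target.read_text(encoding="utf-8") if target.exists() else ""
    pointer = POINTER_TEMPLATE.format(path=spec_path)
    if pointer in existing:
        return False  # idempotent: never add the same line twice
    # Keep the pointer on its own line even if the file lacks a final newline.
    separator = "" if existing == "" or existing.endswith("\n") else "\n"
    target.write_text(existing + separator + pointer + "\n", encoding="utf-8")
    return True


if __name__ == "__main__":
    # Demo against a throwaway directory instead of a real repo.
    with tempfile.TemporaryDirectory() as tmp:
        print(add_spec_pointer(Path(tmp), "specs/image-upload.md"))
        print((Path(tmp) / "AGENTS.md").read_text(encoding="utf-8"))
```

Pass `rules_file="CLAUDE.md"` to target Claude Code's file instead; the helper only appends text and leaves existing rules untouched.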

First-run exercise

Pick a feature you shipped recently that had scope creep or late-discovered edge cases. Retroactively write the one-page spec it should have had, then compare it to what you actually built. The gap is your spec-discipline learning material. Run the workflow on your next real feature and measure: did writing the spec save net time, or feel like overhead? Most devs find break-even at feature two.

Quality check

- Does the spec fit on one page? If not, split the feature or trim acceptance criteria that are too verbose.
- Did the AI surface real ambiguities, or generic “have you considered error handling?” filler? Generic means your spec was too thin or your prompt did not push hard.
- Do tickets map cleanly to spec sections? A ticket that spans two sections is two tickets.
- Can each ticket’s acceptance test be checked in 30 seconds? If “manual verification” is the test, the spec is not tight enough.
- Did the agent suggest scope additions you rejected? More than 3 rejections means the spec is doing its job; 0 means the agent is not being ambitious enough.
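
Two of these checks (ticket-to-section mapping and manual acceptance tests) are easy to automate once tickets are written down as data. The sketch below assumes a hypothetical `TicketReview` record that you fill in by hand; it is an illustration of the checks, not output from any of the tools above.

```python
from dataclasses import dataclass


@dataclass
class TicketReview:
    """Hypothetical record of one ticket for the checks above."""
    title: str
    spec_sections: list[str]  # spec sections the ticket implements
    acceptance_test: str      # how the ticket is verified


def review_tickets(tickets: list[TicketReview]) -> list[str]:
    """Return one warning per violated check; empty list means all clear."""
    warnings = []
    for t in tickets:
        if len(t.spec_sections) > 1:
            # A ticket that spans two sections is two tickets.
            warnings.append(
                f"{t.title}: spans {len(t.spec_sections)} sections, split it")
        if "manual" in t.acceptance_test.lower():
            warnings.append(
                f"{t.title}: acceptance test is manual, tighten the spec")
    return warnings


if __name__ == "__main__":
    tickets = [
        TicketReview("Size limit", ["Edge cases"], "pytest tests/test_limit.py"),
        TicketReview("Upload UI", ["User story", "API surface"],
                     "Manual verification"),
    ]
    for warning in review_tickets(tickets):
        print(warning)
```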

FAQ

Which model should I run this with?

For implementation, Claude Opus 4.7 (87.6% on SWE-bench Verified, top of the field as of June 2026) or the cheaper Sonnet 4.6 (3/15 USD per million in/out tokens). For terminal-heavy agent runs, GPT-5.5 leads Terminal-Bench 2.0 at 82.7%. Smaller models miss the intent of the “what is NOT in the spec” prompt.

Is this just GitHub Spec Kit with extra steps?

Spec Kit is one implementation of this workflow. The manual loop here is tool-agnostic; Spec Kit, Kiro, and Claude Code plan mode all encode the same spec → plan → tasks → implement spine.

How long does writing a spec take?

30-60 minutes for a one-day feature. It saves multiples of that downstream, in clarity as well as in avoided rework.

Can AI write the spec?

It can draft. You own the decisions. A spec without an owner produces code without direction. At this stage treat the AI as a stenographer, not an author.

What about specs for refactors?

Slightly different shape: before / after, why, and blast radius. See the [AI refactor workflow](/en/articles/ai-refactor-workflow/).

Isn’t this just waterfall?

No. A one-page spec is a “definition of done” for one feature. Real waterfall is 30 pages with sign-off gates. SDD keeps the spec living and updates it the moment reality diverges.

Tags: #AI coding #Tutorial #Workflow
