Architecture mistakes are paid for in weeks of refactoring, not days. The cheap way to find them is to argue with someone smart before you write code — but most teams don’t have that someone available on a Tuesday morning. This walks through how to use a reasoning-grade AI as a structured devil’s advocate that catches 3-5 real issues per design doc.
What this tutorial solves
Vague “is this design good?” prompts get vague “yes, with considerations” answers. This workflow uses a forced steelman-then-attack sequence that produces specific, actionable critique — the kind a senior teammate would give if you could get on their calendar. Output: a sharpened design doc with mitigations and rejected alternatives explicitly noted.
Who this is for
Tech leads, senior engineers, and indie developers about to start a multi-week implementation. Especially useful for solo devs without a senior on call, and for new tech leads who haven’t yet built strong design-review reflexes.
When to reach for it
Before writing code on any feature involving: new data models, new services, non-trivial state management, distributed coordination, payment / auth flows, or anything where rollback would be painful. The rule of thumb: if undoing this would take more than 2 days, run the review.
When this is NOT the right tool
Trivial features. Well-trodden patterns where your team has an established way (no point asking AI to reconsider your standard CRUD endpoint pattern). Time-boxed spikes meant to be thrown away.
Before you start
- Have a one-page design doc. It can be rough — bullet points are fine — but it must include: goal, constraints, proposed approach, alternatives you considered.
- Pick a reasoning-heavy model: Claude Opus / Sonnet with extended thinking, or GPT-5.5 with reasoning mode. Speed-first models give superficial critique.
- Decide what “good” looks like for THIS design. “Latency under 200ms p95” is good; “scalable” is not. AI will critique against your stated criteria, so vague criteria yield vague critique.
Step by step
- Write the one-page design doc with these sections: Goal (one sentence), Constraints (3-5 hard limits), Approach (your proposal), Alternatives considered (2-3 you rejected).
- Paste the doc into Claude or ChatGPT (reasoning model preferred). Ask: “Steelman this design. Give the 3 strongest reasons it is the right call.”
- Now ask: “Play devil’s advocate. Find the 5 biggest weaknesses in this design. Be specific — name the failure mode, when it would happen, and the cost when it does.”
- For each weakness the AI surfaces, ask: “What is a minimal mitigation that does NOT require changing the overall architecture?” This separates fixable concerns from architectural blockers.
- Ask: “Are there alternative architectures I should have considered? List 2, with explicit tradeoffs against my proposal.” If it lists alternatives you genuinely hadn’t thought of, that’s the workflow earning its keep.
- For high-stakes designs, ask: “Walk through 3 realistic failure scenarios. What state could become inconsistent? Where does retry logic get fragile?”
- Update the design doc with mitigations, explicitly-rejected alternatives (with reasons), and a “considered failures” section. THIS is what you show to a human reviewer.
First-run exercise
- Pick a design doc you already shipped — one with a known outcome. Run the review on it.
- Compare AI’s predicted weaknesses against what actually went wrong in production. Did it catch the real issues, or did it focus on theoretical ones?
- Note which prompt phrasings produced useful vs vague critique. Use that calibration on your next real design.
- Resist running it on your “definitely-fine” designs. The value is on the genuinely uncertain ones.
Quality check
- Did the AI find issues you genuinely hadn’t considered, or did it restate concerns you already documented? The latter is fine but lower value.
- Are the weaknesses verifiable? “This could be slow” is vibes; “this design issues N+1 queries when the user has more than 50 items” is testable.
- Did the alternatives it proposed have real tradeoffs, or was it suggesting strictly-worse options to make yours look good? Strictly-worse alternatives are noise.
- Are the mitigations actually minimal, or did the AI quietly redesign the system? Push back on creeping redesign.
How to reuse this workflow
- Save the prompt sequence (steelman → devil’s advocate → mitigations → alternatives → failures) as a reusable Claude Project or Custom GPT.
- Keep design + critique pairs in a doc folder. After 10 reviews, you’ll see patterns in what AI catches well (race conditions, missing failure modes) vs poorly (org / political constraints).
- Re-run the review 3 months after launch with the actual outcomes appended. The AI’s “predicted vs actual” accuracy on your domain improves your trust calibration.
Recommended workflow
A new analytics pipeline: 1-page design → AI steelman (“batching reduces cost by 4x, simpler than streaming, your team knows the pattern”) → AI devil’s advocate (4 real issues: late events, backfill, hot partition, schema migration) → minimal mitigations for 3, redesign for 1 → 1 alternative reconsidered (streaming was worth a closer look after all) → human review gets a much sharper proposal.
Common mistakes
- Asking “is this design good?” — you get yes-and-fluff. Use the steelman-then-attack sequence instead.
- Letting AI redesign rather than critique. Hold it to “comment, don’t rewrite” until you’ve heard the full critique.
- Skipping the steelman step. Without it you get one-sided takedowns that miss the design’s real strengths.
- Treating AI critique as authority. It surfaces issues; you decide which matter, given context the AI doesn’t have.
- Running this AFTER you’ve written the code. Sunk-cost bias will reject every critique. Do it before code.
- Showing only the AI critique to humans. Give them the SHARPENED design with mitigations baked in — that’s the point.
Advanced tips
- For data-model designs, ask: “Walk through 5 realistic queries against this schema. Which require joins or denormalization that aren’t in the design?”
- For distributed systems, ask: “Where can this design partially fail? What state can become inconsistent during a partition or restart?”
- For API designs, ask: “Generate 3 example calls that look correct but violate an assumption in the implementation.”
- Save the full design + critique + mitigations as a Markdown doc. When you revisit the system in a year, this is gold.
FAQ
- Which model?: Reasoning-heavy ones: Claude with extended thinking, GPT-5.5 with reasoning. Speed models (Haiku, GPT-5.4) give weaker critique.
- Does this replace human design review?: No, it’s a pre-filter. Your senior teammate’s time goes much further on a doc that already survived AI critique.
- What if the AI’s critique is wrong?: Often it will be; that’s fine. Wrong critique still surfaces assumptions worth documenting. Just don’t apply mitigations for issues that aren’t real.
- How long does this take?: 20-40 minutes per design. Compared to weeks of refactoring, it’s the best ROI in your toolkit.
- Can I skip the steelman step to save time?: Don’t. Without it, the critique skews one-sided and you’ll over-correct.
Related
- AI agent code review workflow
- App audit prompt workflow
- Feed project reports to agents
- AI spec-to-code workflow
- Agent vs autocomplete
Tags: #AI coding #Tutorial #Workflow