AI Architecture Review Workflow

Use AI to challenge your architecture decisions before you commit weeks of code.

Architecture mistakes are paid for in weeks of refactoring, not days. The cheap way to find them is to argue with someone smart before you write code — but most teams don’t have that someone available on a Tuesday morning. This walks through how to use a reasoning-grade AI as a structured devil’s advocate that catches 3-5 real issues per design doc.

What this tutorial solves

Vague “is this design good?” prompts get vague “yes, with considerations” answers. This workflow uses a forced steelman-then-attack sequence that produces specific, actionable critique — the kind a senior teammate would give if you could get on their calendar. Output: a sharpened design doc with mitigations and rejected alternatives explicitly noted.

Who this is for

Tech leads, senior engineers, and indie developers about to start a multi-week implementation. Especially useful for solo devs without a senior on call, and for new tech leads who haven’t yet built strong design-review reflexes.

When to reach for it

Before writing code on any feature involving: new data models, new services, non-trivial state management, distributed coordination, payment / auth flows, or anything where rollback would be painful. The rule of thumb: if undoing this would take more than 2 days, run the review.

When this is NOT the right tool

Trivial features. Well-trodden patterns where your team has an established way (no point asking AI to reconsider your standard CRUD endpoint pattern). Time-boxed spikes meant to be thrown away.

Before you start

  • Have a one-page design doc. It can be rough — bullet points are fine — but it must include: goal, constraints, proposed approach, alternatives you considered.
  • Pick a reasoning-heavy model: Claude Opus / Sonnet with extended thinking, or GPT-5.5 with reasoning mode. Speed-first models give superficial critique.
  • Decide what “good” looks like for THIS design. “Latency under 200ms p95” is good; “scalable” is not. AI will critique against your stated criteria, so vague criteria yield vague critique.

Step by step

  1. Write the one-page design doc with these sections: Goal (one sentence), Constraints (3-5 hard limits), Approach (your proposal), Alternatives considered (2-3 you rejected).
  2. Paste the doc into Claude or ChatGPT (reasoning model preferred). Ask: “Steelman this design. Give the 3 strongest reasons it is the right call.”
  3. Now ask: “Play devil’s advocate. Find the 5 biggest weaknesses in this design. Be specific — name the failure mode, when it would happen, and the cost when it does.”
  4. For each weakness the AI surfaces, ask: “What is a minimal mitigation that does NOT require changing the overall architecture?” This separates fixable concerns from architectural blockers.
  5. Ask: “Are there alternative architectures I should have considered? List 2, with explicit tradeoffs against my proposal.” If it lists alternatives you genuinely hadn’t thought of, that’s the workflow earning its keep.
  6. For high-stakes designs, ask: “Walk through 3 realistic failure scenarios. What state could become inconsistent? Where does retry logic get fragile?”
  7. Update the design doc with mitigations, explicitly-rejected alternatives (with reasons), and a “considered failures” section. THIS is what you show to a human reviewer.

First-run exercise

  1. Pick a design doc you already shipped — one with a known outcome. Run the review on it.
  2. Compare AI’s predicted weaknesses against what actually went wrong in production. Did it catch the real issues, or did it focus on theoretical ones?
  3. Note which prompt phrasings produced useful vs vague critique. Use that calibration on your next real design.
  4. Resist running it on your “definitely-fine” designs. The value is on the genuinely uncertain ones.

Quality check

  • Did the AI find issues you genuinely hadn’t considered, or did it restate concerns you already documented? The latter is fine but lower value.
  • Are the weaknesses verifiable? “This could be slow” is vibes; “this design issues N+1 queries when the user has more than 50 items” is testable.
  • Did the alternatives it proposed have real tradeoffs, or was it suggesting strictly-worse options to make yours look good? Strictly-worse alternatives are noise.
  • Are the mitigations actually minimal, or did the AI quietly redesign the system? Push back on creeping redesign.

How to reuse this workflow

  • Save the prompt sequence (steelman → devil’s advocate → mitigations → alternatives → failures) as a reusable Claude Project or Custom GPT.
  • Keep design + critique pairs in a doc folder. After 10 reviews, you’ll see patterns in what AI catches well (race conditions, missing failure modes) vs poorly (org / political constraints).
  • Re-run the review 3 months after launch with the actual outcomes appended. The AI’s “predicted vs actual” accuracy on your domain improves your trust calibration.

A new analytics pipeline: 1-page design → AI steelman (“batching reduces cost by 4x, simpler than streaming, your team knows the pattern”) → AI devil’s advocate (4 real issues: late events, backfill, hot partition, schema migration) → minimal mitigations for 3, redesign for 1 → 1 alternative reconsidered (streaming was worth a closer look after all) → human review gets a much sharper proposal.

Common mistakes

  • Asking “is this design good?” — you get yes-and-fluff. Use the steelman-then-attack sequence instead.
  • Letting AI redesign rather than critique. Hold it to “comment, don’t rewrite” until you’ve heard the full critique.
  • Skipping the steelman step. Without it you get one-sided takedowns that miss the design’s real strengths.
  • Treating AI critique as authority. It surfaces issues; you decide which matter, given context the AI doesn’t have.
  • Running this AFTER you’ve written the code. Sunk-cost bias will reject every critique. Do it before code.
  • Showing only the AI critique to humans. Give them the SHARPENED design with mitigations baked in — that’s the point.

Advanced tips

  • For data-model designs, ask: “Walk through 5 realistic queries against this schema. Which require joins or denormalization that aren’t in the design?”
  • For distributed systems, ask: “Where can this design partially fail? What state can become inconsistent during a partition or restart?”
  • For API designs, ask: “Generate 3 example calls that look correct but violate an assumption in the implementation.”
  • Save the full design + critique + mitigations as a Markdown doc. When you revisit the system in a year, this is gold.

FAQ

  • Which model?: Reasoning-heavy ones: Claude with extended thinking, GPT-5.5 with reasoning. Speed models (Haiku, GPT-5.4) give weaker critique.
  • Does this replace human design review?: No, it’s a pre-filter. Your senior teammate’s time goes much further on a doc that already survived AI critique.
  • What if the AI’s critique is wrong?: Often it will be; that’s fine. Wrong critique still surfaces assumptions worth documenting. Just don’t apply mitigations for issues that aren’t real.
  • How long does this take?: 20-40 minutes per design. Compared to weeks of refactoring, it’s the best ROI in your toolkit.
  • Can I skip the steelman step to save time?: Don’t. Without it, the critique skews one-sided and you’ll over-correct.

Tags: #AI coding #Tutorial #Workflow