
Architecture Review Prompts for Layer and Dependency Audits

15 architecture review prompts that surface real layering bugs, dependency cycles, and boundary leaks — with file:line evidence, not generic "consider DDD" advice.

Published: May 18, 2026 Updated: Jun 04, 2026 Author: AI Productivity Guide Team

“Review my architecture” yields textbook patterns that don’t map to your code. A good architecture review prompt names the dimension (layering, deps, boundaries, data flow), demands file:line evidence, and forbids rewrite suggestions in the same pass. The 15 templates below each interrogate a different structural angle.

TL;DR

Never ask for a generic “architecture review.” Pick one dimension per prompt and demand file:line evidence on every finding — fake cycles and hallucinated layers evaporate when the model has to cite the exact import.
Forbid rewrites inside the review pass. A prompt that diagnoses and patches in one breath produces neither a clean diagnosis nor a safe patch.
Run whole-repo reviews on a 1M-token model. As of June 2026, Claude Opus 4.7 and Gemini 3.1 Pro both ingest 1M tokens; Claude Code reads the repo directly, so it’s the lowest-friction option for an agentic audit.
Confirm structural findings with a deterministic tool. Pair the AI cycle/boundary prompts with madge or dependency-cruiser so a real graph backs the model’s claims.
End every audit with template 15 (ADR backfill). Without it, the findings become read-only documents nobody acts on.

Who this is for

Tech leads doing architecture audits before a refactor, staff engineers reviewing a junior team’s design, founders preparing for due diligence, and anyone inheriting an unfamiliar codebase.

When not to use these prompts

Skip these for greenfield design — use design-doc prompts instead. Also skip for sub-200-LOC scripts where “architecture” is just one file.

Which model and tool to run these on (June 2026)

These prompts are model-agnostic, but whole-repo structural review is gated by context size and how the tool reads files. Quick guide:

| Setup | Context (as of June 2026) | Best for | Notes |
| --- | --- | --- | --- |
| Claude Code (Opus 4.7 / Sonnet 4.6) | 1M tokens | Agentic audits where the model reads the repo itself | Bundled with Claude Pro ($20/mo). Lowest friction: no manual paste; cite-the-import constraint works well |
| Gemini 3.1 Pro | 1M tokens | Ingesting very large monorepos in one pass at low cost | API input ~$2/1M tokens; strong at tracing flow across many files |
| ChatGPT Plus (GPT-5.5) | ~320 pages in-app (full 1M only on $200 Pro) | Single-module or pasted-tree reviews | Paste the module tree; less suited to whole-repo unless on Pro |

For chat tools (no repo access), paste the package/module tree and the specific files in scope rather than asking the model to imagine them. For an agentic loop, Claude Code reads files directly, so prompt 1’s “Claude Code auto-reads” note applies. Don’t trust a model-reported dependency cycle on its own — confirm it with a deterministic graph tool such as madge or dependency-cruiser, then feed the real cycle list back into the prompt for break-strategy advice.

Prompt anatomy / structure formula

Every architecture review prompt should carry six elements:

Role: who the AI plays (architect / SRE / QA lead / release captain).
Context: repo / framework / runtime versions / files or diff in scope.
Goal: one concrete deliverable — review notes, plan, checklist, test file, handoff doc.
Constraints: what AI MUST NOT do (don’t rewrite, don’t auto-format, don’t guess versions).
Output format: numbered findings, markdown table, JSON, unified diff, or runnable code.
Examples / signal: 1-2 examples of “good” output, or what bad output looks like.
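
The six elements can be held as structured data and rendered into a prompt, which makes a missing constraint or output format easy to spot. A minimal sketch, assuming hypothetical names (`ReviewPrompt`, `render`) and illustrative field values; it is one possible layout, not a canonical template.

```python
from dataclasses import dataclass


@dataclass
class ReviewPrompt:
    """Hypothetical container for the six prompt elements."""
    role: str
    context: str
    goal: str
    constraints: list[str]
    output_format: str
    signal: str

    def render(self) -> str:
        # Each constraint gets its own line so none is buried mid-sentence.
        constraint_lines = "\n".join(f"- {c}" for c in self.constraints)
        return (
            f"You are {self.role}.\n"
            f"Context: {self.context}\n"
            f"Goal: {self.goal}\n"
            f"Constraints:\n{constraint_lines}\n"
            f"Output format: {self.output_format}\n"
            f"Signal: {self.signal}"
        )


# Illustrative values only; swap in your own repo details.
prompt = ReviewPrompt(
    role="a staff engineer reviewing this codebase for LAYERING violations only",
    context="files under src/ are in scope",
    goal="a list of imports that cross layers in the wrong direction",
    constraints=["Do not propose rewrites", "Cite file:line for every finding"],
    output_format="markdown table with severity (Critical / High / Med)",
    signal="A good finding names the violating import and the layer rule it breaks",
)
print(prompt.render())
```

Keeping constraints as a list makes it simple to reuse the same "do not rewrite" and "cite file:line" rules across all 15 templates.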

Best for

Pre-refactor structural assessment
Inheritance audit for unfamiliar codebases
Architecture decision record (ADR) backfill
Quarterly tech-debt review
Diligence and acquisition reviews

15 copy-ready prompt templates

1. Layering violation hunt

Run first — most architecture rot starts here.

You are a staff engineer reviewing this codebase for LAYERING violations only. Identify imports that cross layers in the wrong direction (UI importing infrastructure, domain importing framework, etc.). For each: file:line, the violating import, the layer rule it breaks, and a one-line refactor sketch. Do not propose rewrites. Return a markdown table with severity (Critical / High / Med).

Variables to swap: repo files (Claude Code auto-reads); for chat tools paste the package/module tree

Optimization: Append: “If layers are not explicit, first infer the intended layer model from folder names, then judge violations against that inferred model.”
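
To make "wrong direction" concrete, the rule can be written as an allow-list per layer and checked against import edges. A minimal sketch, assuming a hypothetical four-layer model inferred from folder names and a hand-written edge list; a real run would take its edges from a dependency graph tool.

```python
from typing import NamedTuple, Optional

# Hypothetical layer model: each layer maps to the layers it may import from.
ALLOWED = {
    "domain": set(),
    "application": {"domain"},
    "infrastructure": {"application", "domain"},
    "ui": {"application"},
}


class ImportEdge(NamedTuple):
    file: str    # importing file
    line: int    # line number of the import statement
    target: str  # imported module path


def layer_of(path: str) -> Optional[str]:
    """Infer the layer from the first folder name that matches a known layer."""
    for part in path.split("/"):
        if part in ALLOWED:
            return part
    return None


def find_violations(edges: list[ImportEdge]) -> list[str]:
    findings = []
    for edge in edges:
        src, dst = layer_of(edge.file), layer_of(edge.target)
        if src is None or dst is None or src == dst:
            continue  # unknown layer or same-layer import: nothing to judge
        if dst not in ALLOWED[src]:
            findings.append(f"{edge.file}:{edge.line} {src} -> {dst} ({edge.target})")
    return findings


# Illustrative edges, not taken from a real repo.
edges = [
    ImportEdge("src/ui/cart_page.py", 3, "src/infrastructure/db.py"),
    ImportEdge("src/application/checkout.py", 1, "src/domain/order.py"),
    ImportEdge("src/domain/order.py", 2, "src/infrastructure/orm.py"),
]
for finding in find_violations(edges):
    print(finding)
```

An allow-list is used instead of a numeric rank because a simple ordering would permit UI code to import infrastructure directly, which is one of the violations the prompt names.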

2. Dependency cycle detection

Scan this repo for circular dependencies between modules / packages. Output: (1) cycle as `A -> B -> C -> A`, (2) the import that creates each edge (file:line), (3) which edge is the cheapest to break, (4) one-line break strategy (interface, event, move type). Ignore intra-file cycles.

Optimization: Run `madge --circular src/` first and paste its output, so the model explains and ranks real cycles instead of guessing them.
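
For intuition about what a deterministic check does, a cycle is a back edge found during a depth-first walk of the import graph. A minimal sketch, assuming the graph is available as a hypothetical module-to-imports mapping; it reports one cycle per back edge rather than every elementary cycle, so it is not a substitute for a dedicated tool.

```python
def find_cycles(graph: dict[str, list[str]]) -> list[list[str]]:
    """Return cycles found via depth-first search, each as [A, B, ..., A]."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in graph}
    stack: list[str] = []
    cycles: list[list[str]] = []

    def visit(node: str) -> None:
        color[node] = GRAY
        stack.append(node)
        for dep in graph.get(node, []):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # Back edge: dep is already on the current path, closing a cycle.
                cycles.append(stack[stack.index(dep):] + [dep])
            elif state == WHITE:
                visit(dep)
        stack.pop()
        color[node] = BLACK

    for node in graph:
        if color[node] == WHITE:
            visit(node)
    return cycles


# Illustrative graph: a -> b -> c -> a, plus d which only points into the cycle.
graph = {"a.py": ["b.py"], "b.py": ["c.py"], "c.py": ["a.py"], "d.py": ["a.py"]}
for cycle in find_cycles(graph):
    print(" -> ".join(cycle))
```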

3. Module boundary leak audit

Audit MODULE BOUNDARIES. For each top-level module, list: (1) what types it exports (public API), (2) types that leak through but are internal (e.g., DB rows surfacing in HTTP handlers), (3) internal types reached via deep imports `module/internal/...`. Flag every boundary leak with file:line.
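
Item (3), deep imports, is the easiest of the three to pre-check mechanically before prompting. A rough sketch, assuming paths are relative to the source root, the first path segment is the owning module, and internals live in a folder literally named `internal`; the edge tuples are illustrative.

```python
def deep_imports(edges: list[tuple[str, int, str]]) -> list[str]:
    """Flag imports that reach into another module's internal/ folder."""
    findings = []
    for file, line, target in edges:
        parts = target.split("/")
        if "internal" not in parts[1:]:
            continue  # target is not under an internal/ folder
        owner = parts[0]  # top-level module that owns the internals
        if file.split("/")[0] != owner:
            findings.append(f"{file}:{line} reaches into {target}")
    return findings


# (importing file, line, imported path); hypothetical examples.
edges = [
    ("billing/api.py", 4, "orders/internal/rows"),
    ("orders/service.py", 2, "orders/internal/rows"),
    ("billing/api.py", 5, "orders/public"),
]
for finding in deep_imports(edges):
    print(finding)
```

Pasting this list into the prompt lets the model spend its effort on items (1) and (2), which need judgment about what counts as public API.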

4. God-object / hub-module detection

Find HUB modules: any module with > 5 incoming imports OR > 10 outgoing imports. For each: list the fan-in/out count, the responsibilities tangled inside, and propose a split into 2-3 cohesive submodules. Do not write the refactor — only the split plan.
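
The fan-in/fan-out thresholds in this prompt are simple counts over the import graph, so they can be computed up front and handed to the model as facts. A minimal sketch, assuming a hypothetical module-to-imports mapping; the limits mirror the numbers in the prompt.

```python
from collections import Counter

FAN_IN_LIMIT = 5    # "> 5 incoming imports" from the prompt
FAN_OUT_LIMIT = 10  # "> 10 outgoing imports" from the prompt


def find_hubs(graph: dict[str, list[str]]) -> dict[str, tuple[int, int]]:
    """Map each hub module to its (fan_in, fan_out) counts."""
    # Deduplicate so a module imported twice by the same file counts once.
    fan_out = {module: len(set(deps)) for module, deps in graph.items()}
    fan_in = Counter(dep for deps in graph.values() for dep in set(deps))
    modules = set(graph) | set(fan_in)
    return {
        m: (fan_in[m], fan_out.get(m, 0))
        for m in sorted(modules)
        if fan_in[m] > FAN_IN_LIMIT or fan_out.get(m, 0) > FAN_OUT_LIMIT
    }


# Illustrative graph: six feature modules all import "utils".
graph = {f"feature_{i}": ["utils"] for i in range(6)}
graph["utils"] = []
print(find_hubs(graph))
```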

5. Data flow surprise audit

Trace DATA FLOW for the 3 most important entities (infer from naming if not told). For each: (1) where it is created, (2) where it is mutated, (3) where it is read across module boundaries. Flag any "surprise mutation" — state changing in a module that the entity doesn't belong to.

Variables to swap: entity names (optional — model can infer)

6. Hexagonal / ports-and-adapters check

Evaluate this codebase against ports-and-adapters: (1) Is domain logic isolated from frameworks? Cite evidence, (2) Are external systems (DB, queue, HTTP) reached through interfaces or directly? List each direct call (file:line), (3) Where would mocking be hard right now? Rate adherence 1-5 with rationale.

7. Bounded-context drift audit

Identify implicit bounded contexts in this codebase (group modules by entities they share). Then: (1) Which contexts share the same entity but disagree on its shape? (file:line for each), (2) Which contexts secretly depend on each other through shared mutable state? (3) Suggest one context boundary to make explicit first.

8. Cross-cutting concern leak

Audit cross-cutting concerns: logging, auth, telemetry, feature flags, error handling. For each: (1) Is it implemented centrally or sprinkled? (2) List 5 sites where the concern is reimplemented inline, (3) Suggest one extraction strategy (decorator, middleware, hook). Do not perform the extraction.

9. Shared kernel risk audit

Identify the "shared kernel" — code imported by > 3 modules. For each shared item: (1) Why does it need to be shared? (2) Is the shape stable, or does it change every sprint? (3) Score coupling risk (Low / Med / High). Flag shared kernel items that are actually leaky abstractions.

10. Async / sync boundary audit

Map ASYNC vs SYNC boundaries. Find: (1) sync code that blocks on async (sync over async) — file:line, (2) async code that swallows promise rejection, (3) "fire and forget" calls without retry / dead-letter, (4) mixed paradigms in the same call chain. Output: severity-ranked table.

11. Configuration architecture audit

Audit how CONFIG flows: (1) Where are env vars read? Centralized or scattered? List read sites, (2) Are defaults / fallbacks documented? (3) Is there a single typed config object, or is `process.env.X` reached directly? (4) Suggest a config-loading pattern for this stack.
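
The "single typed config object" in item (3) is the pattern the audit usually ends up recommending. The prompt names Node's `process.env.X`; the sketch below shows the same idea in Python with `os.environ`, assuming hypothetical variable names (`DATABASE_URL`, `PORT`, `DEBUG`) and defaults.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class AppConfig:
    database_url: str
    port: int
    debug: bool


def load_config(env: Mapping[str, str] = os.environ) -> AppConfig:
    """Read every environment variable in one place and fail fast on bad input."""
    try:
        database_url = env["DATABASE_URL"]  # required, no default
    except KeyError:
        raise RuntimeError("DATABASE_URL is required") from None
    return AppConfig(
        database_url=database_url,
        port=int(env.get("PORT", "8080")),  # default lives next to the read site
        debug=env.get("DEBUG", "false").lower() == "true",
    )


# Passing a plain dict keeps the example independent of the real environment.
config = load_config({"DATABASE_URL": "postgres://localhost/app", "PORT": "9000"})
print(config)
```

With this shape, the audit's "list read sites" question has a one-line answer: every read happens inside `load_config`.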

12. Plugin / extension surface audit

If this codebase exposes a plugin or extension surface, audit: (1) What contract do extensions implement? (2) What internals are accidentally reachable? (3) How is versioning handled? If no extension surface exists, say so — do not invent one.

13. Read / write asymmetry audit (CQRS lite)

For the 3 most-used entities, separate READ paths from WRITE paths. Find: (1) Reads that pull through write models unnecessarily, (2) Writes that bypass invariant enforcement, (3) Queries that join 4+ tables (candidates for read models). Suggest one read/write split worth doing first.

14. Multi-service boundary audit

Use when the repo is a monorepo with multiple deployables.

This monorepo contains [services]. Audit the SERVICE boundaries: (1) Which packages are imported across service lines? (2) Where do contract types diverge between services? (3) Is there shared DB access (anti-pattern)? (4) Suggest one boundary to harden first.

Variables to swap: [services], e.g., “web (Next.js), api (FastAPI), worker (Go)”

15. Architecture findings → ADR backfill

Run last — converts findings into decision records.

Take the architecture findings from previous prompts. For each significant decision implied (or contradicted) by the code, draft a short ADR: Title, Status (Accepted / Proposed / Deprecated), Context (2 sentences), Decision (1 sentence), Consequences (3 bullets). Output 5 ADRs maximum, ranked by impact.
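
If the ADRs come back as structured data (for example, JSON you parse yourself), the template's limits can be checked before anything is committed. A minimal sketch, assuming a hypothetical `ADR` record and `validate` helper; the sample ADRs are invented for illustration.

```python
from dataclasses import dataclass

ALLOWED_STATUS = {"Accepted", "Proposed", "Deprecated"}


@dataclass
class ADR:
    title: str
    status: str
    context: str
    decision: str
    consequences: list[str]


def validate(adrs: list[ADR]) -> list[str]:
    """Return readable problems; an empty list means the batch fits the template."""
    problems = []
    if len(adrs) > 5:
        problems.append(f"expected at most 5 ADRs, got {len(adrs)}")
    for adr in adrs:
        if adr.status not in ALLOWED_STATUS:
            problems.append(f"{adr.title}: unknown status {adr.status!r}")
        if len(adr.consequences) != 3:
            problems.append(
                f"{adr.title}: expected 3 consequences, got {len(adr.consequences)}"
            )
    return problems


# Invented examples: the first fits the template, the second does not.
adrs = [
    ADR("Domain may not import the ORM", "Accepted",
        "Domain code imports ORM models directly.", "Access goes through a port.",
        ["Easier mocking", "One extra interface", "ORM swap becomes local"]),
    ADR("Event bus between billing and orders", "Maybe",
        "Services call each other directly.", "Publish events instead.",
        ["Looser coupling"]),
]
for problem in validate(adrs):
    print(problem)
```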

Common mistakes

Asking “review my architecture” without naming a dimension — you get textbook patterns, not your bugs.
Letting AI propose a rewrite in the same pass as the review — diagnostic clarity collapses.
No file:line evidence required — every finding is unverifiable.
Judging layers from folder or file names alone instead of imports. Names suggest the intended model; violations must be judged from the actual dependency graph.
Reviewing the whole repo in one prompt on a 320-page context — split by module or move to a 1M-token model.
Trusting a model-reported cycle without a madge/dependency-cruiser cross-check.
Treating AI output as final — pair with one human walk-through of the top 3 findings.
Skipping ADR backfill — the findings become read-only documents nobody acts on.

How to push results further

Run each architecture dimension in a separate thread — layering, cycles, boundaries, data flow.
Demand file:line evidence on every finding. Hallucinations evaporate when evidence is required.
Add a severity enum (Critical / High / Med / Low) in your prompt to force ranking.
Ask AI to first infer the intended layer model from folders, then judge violations against it.
For monorepos, run boundary audits per service-pair, not globally.
Save architecture audits as dated markdown in /docs/architecture/audits/ so you can diff over time.
Pair every audit with template 15 (ADR backfill) — without it, findings rot.
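
The file:line requirement only helps if someone checks the citations. A minimal sketch of an automated check, assuming Python-style `import` / `from` statements and citations in `path:line` form; the helper name `evidence_holds` is hypothetical, and other languages would need a different pattern.

```python
import re
import tempfile
from pathlib import Path

# Assumption: Python-style imports. Adjust the pattern for other languages.
IMPORT_PATTERN = re.compile(r"^\s*(import|from)\s+\S+")


def evidence_holds(repo_root: Path, citation: str) -> bool:
    """Check that a 'path:line' citation points at a real import statement."""
    path_text, _, line_text = citation.rpartition(":")
    if not path_text or not line_text.isdigit():
        return False  # malformed citation
    file_path = repo_root / path_text
    if not file_path.is_file():
        return False  # cited file does not exist
    lines = file_path.read_text(encoding="utf-8").splitlines()
    index = int(line_text) - 1  # citations are 1-indexed
    return 0 <= index < len(lines) and bool(IMPORT_PATTERN.match(lines[index]))


# Demo against a throwaway directory so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "orders.py").write_text("import billing\n\nTOTAL = 0\n", encoding="utf-8")
    print(evidence_holds(root, "orders.py:1"))   # cited line is an import
    print(evidence_holds(root, "orders.py:3"))   # cited line is not an import
    print(evidence_holds(root, "missing.py:1"))  # cited file does not exist
```

Findings whose citations fail this check can be dropped or sent back to the model before any human triage.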

FAQ

Which model should I run these on as of June 2026?

For an agentic whole-repo audit, Claude Code (Opus 4.7 / Sonnet 4.6, 1M context) reads files itself and handles the “cite the import” constraint well. Gemini 3.1 Pro (1M context, ~$2/1M input tokens) is the cheapest way to ingest a very large monorepo in one pass. ChatGPT Plus (GPT-5.5) holds roughly 320 pages in-app, so paste a single module rather than the whole repo unless you’re on the $200 Pro tier.

Should I run architecture review on every PR?

No, that is too noisy. Run it before refactors, when inheriting a codebase, and quarterly. Use PR review prompts for diffs.

What if my codebase has no explicit layers?

Have the AI infer the intended layer model from folder names first, then judge violations against it. Document the inferred model as an ADR.

Will AI hallucinate cycles?

Sometimes. Always require the exact import (file:line) for each edge; if the model can’t produce it, treat the cycle as fake. Better: run `madge --circular` or `dependency-cruiser` first and paste the real cycle list.

How is this different from a full repo audit?

Full repo audits scan many dimensions shallowly (security, deps, tests). Architecture review goes deep on structural questions only.

Can I run this on microservices spread across repos?

Yes. Paste the service interface definitions and message contracts. The boundary audit (template 14) works best when you can supply both sides.

How long should this take?

For a 30K-LOC repo: 20-40 minutes per dimension, then 1-2 hours to triage and backfill ADRs.

Tags: #Prompt #Coding #Claude Code #Audit #Architecture