Prompt Library

Full Repository Audit Prompts: 15 Templates for Whole-Project Review

Whole-repo audit prompts for Claude Code, Codex, and Cursor — architecture smells, dead code, security risks, dependency drift, and test gaps in one structured pass.

Published: May 18, 2026 Updated: Jun 09, 2026 Author: AI Productivity Guide Team

A whole-repo audit isn’t “tell me what’s wrong” — that prompt yields generic advice. A good audit prompt names the dimensions (architecture, security, test coverage, deps, perf, docs), forces evidence (file:line), and constrains output (markdown table or numbered list with severity). The 15 templates below cover the audit angles your repo actually needs, with model-version notes current as of June 2026.

TL;DR

Run audits one dimension per thread — merging architecture, security, and perf into one prompt dilutes every finding.
Always demand file:line evidence. No location string = treat the finding as a hallucination.
Whole-repo reading is now practical: Claude Opus 4.7 and Sonnet 4.6 ship a 1M-token context (GA March 2026, no long-context price premium), so Claude Code can hold thousands of source files at once. Codex on GPT-5.5 carries ~400K tokens.
For security specifically, Claude Code now has a built-in /security-review command (terminal + GitHub Action) — use template 6 to extend it, not replace it.
Finish every audit with template 15 so findings become tickets, not a read-only document.
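
The file:line rule above can be enforced mechanically before triage. The following is a minimal sketch, assuming findings arrive as plain text lines and that evidence looks like `path/to/file.ext:123`; the name `split_by_evidence` and the regex are illustrative and not part of any tool mentioned here.

```python
import re

# Matches evidence such as "src/auth/session.ts:42" or "api/views.py:10-18".
# The pattern is an assumption about how the model formats locations.
EVIDENCE_RE = re.compile(r"[\w./\-]+\.\w+:\d+(?:-\d+)?")


def split_by_evidence(findings: list[str]) -> tuple[list[str], list[str]]:
    """Return (with_evidence, without_evidence), preserving input order."""
    with_evidence: list[str] = []
    without_evidence: list[str] = []
    for finding in findings:
        if EVIDENCE_RE.search(finding):
            with_evidence.append(finding)
        else:
            without_evidence.append(finding)
    return with_evidence, without_evidence
```

This only checks the format of the location string. It does not confirm that the file exists or that the line says what the finding claims, so a human spot-check is still needed.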

Which tool reads a whole repo (as of June 2026)

Tool	Default model	Context window	How it reads the repo
Claude Code	Opus 4.7 / Sonnet 4.6	1M tokens (GA)	`Read`/`Grep`/`Glob` agent picks files; can dispatch parallel review subagents
Codex CLI / cloud	GPT-5.5	~400K tokens	Clones repo into a sandbox, reasons over full tree + deps, runs tests
Cursor (Agent)	Sonnet 4.6 / GPT-5.5 / Gemini 3.1 Pro	model-dependent	Interactive; indexes the workspace, you approve edits live
Gemini (AI Pro/Ultra)	Gemini 3.1 Pro	1M tokens	Paste or connect repo; strong on long single-file reasoning

Claude Code is the most hands-off for read-only audits because the agent fetches files itself — you don’t paste anything. Codex is best when you also want it to run the test suite to confirm findings. See our Claude Code execution prompts for the run loop and code review prompts for diff-level passes.
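
To judge whether a repo plausibly fits one of the context windows in the table, a rough size estimate helps. The sketch below assumes about 4 characters per token, which is a coarse heuristic rather than any model's real tokenizer, and a hypothetical 50% headroom for the prompt and the model's output; the skipped directory names are also assumptions.

```python
from pathlib import Path

# Rough heuristic: ~4 characters per token. Real tokenizers vary by model
# and by programming language, so treat the result as an order of magnitude.
CHARS_PER_TOKEN = 4
SKIP_DIRS = {".git", "node_modules", "dist", "build", ".venv", "__pycache__"}


def estimate_tokens(text: str) -> int:
    return len(text) // CHARS_PER_TOKEN


def estimate_repo_tokens(root: str) -> int:
    """Sum estimated tokens over readable text files under root."""
    total = 0
    for path in Path(root).rglob("*"):
        relative_parts = path.relative_to(root).parts
        if not path.is_file() or any(part in SKIP_DIRS for part in relative_parts):
            continue
        try:
            total += estimate_tokens(path.read_text(encoding="utf-8"))
        except (UnicodeDecodeError, OSError):
            continue  # skip binaries and unreadable files
    return total


def fits(tokens: int, window: int, headroom: float = 0.5) -> bool:
    """True if the repo uses no more than `headroom` of the context window."""
    return tokens <= window * headroom
```

For example, `fits(estimate_repo_tokens("."), 1_000_000)` gives a quick yes/no for a 1M-token window. Agentic tools read files selectively, so the estimate matters most when you plan to paste the tree into a chat tool.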

Who this is for

Tech leads doing onboarding audits, founders preparing for due diligence, indie devs before launch, and senior engineers inheriting a codebase.

When not to use these prompts

Skip these for tiny scripts (under 500 LOC) — the overhead is bigger than the payoff. And don’t paste closed-source repos into a public chat; run a private deployment instead (see the FAQ).

Prompt anatomy / structure formula

A whole-repo audit prompt should always carry six elements:

Role: who the AI plays (senior reviewer / SRE / staff engineer).
Context: repo / framework / runtime versions / files in scope.
Goal: one concrete deliverable — review notes, diff, plan, checklist.
Constraints: things the AI MUST NOT do (don’t touch X, don’t silently rename, don’t auto-format).
Output format: numbered findings, markdown table, JSON, or unified diff.
Examples / signal: 1-2 examples of “good” output, or what bad output looks like.
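
The six elements can be kept as structured fields and rendered into a prompt, which makes it harder to forget one. This is a minimal sketch; the class `AuditPrompt`, its field names, and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AuditPrompt:
    role: str
    context: str
    goal: str
    constraints: list[str]
    output_format: str
    examples: str = ""  # optional sample of good (or bad) output

    def render(self) -> str:
        lines = [
            f"You are {self.role}.",
            f"Context: {self.context}",
            f"Goal: {self.goal}",
            "Constraints:",
            *[f"- {c}" for c in self.constraints],
            f"Output format: {self.output_format}",
        ]
        if self.examples:
            lines.append(f"Example of good output: {self.examples}")
        return "\n".join(lines)


# Hypothetical values, for illustration only.
prompt = AuditPrompt(
    role="a staff engineer auditing for SECURITY only",
    context="TypeScript API service; files in scope: src/",
    goal="a numbered list of findings with file:line evidence",
    constraints=["Do not propose rewrites", "Do not auto-format"],
    output_format="markdown table: severity | file:line | fix sketch",
)
print(prompt.render())
```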

Best for

Pre-launch hardening sweep
Onboarding audit when inheriting a codebase
Quarterly tech-debt review
Due-diligence preparation
Pre-refactor baseline assessment

15 copy-ready prompt templates

1. Whole-repo health snapshot

Best as the first pass when you’ve just opened an unfamiliar repo.

You are a staff engineer doing a 30-minute audit of this repository. Produce a 1-page report with these sections: (1) Stack & framework summary in 3 sentences, (2) Three architecture smells you can spot with evidence (file:line), (3) Three security risks (auth / data / secret handling), (4) Test coverage signal (yes/no per top-level dir), (5) Top 5 follow-ups ranked by impact / effort. Do not propose rewrites — only diagnose.

Variables to swap: none when the tool reads the repo itself (Claude Code does this automatically); in a chat tool, supply the repo files.

Optimization: If the model wants to dive too deep, add: “Skip implementations. We are mapping the territory, not refactoring.”

2. Architecture-only audit

Audit this repo for ARCHITECTURE only. Ignore style / naming. Report: (1) Top 3 layering violations with file:line, (2) Modules with > 5 incoming deps (god-objects), (3) Any "data flow surprise" where state mutates across module boundaries, (4) One paragraph: "If I had to redraw the boxes-and-arrows, here is the cleaner version."

3. Dependency drift audit

Read package.json (and lockfile if present). Report: (1) Direct deps that are > 2 major versions behind latest, (2) Any deprecated / abandoned packages, (3) Duplicate logical deps (e.g., axios + fetch wrapper + got), (4) Native bindings or post-install scripts that warrant attention, (5) Upgrade roadmap: which to bump now / next sprint / never.

Optimization: Pair with: “Mark any of these that have known CVEs against the version we pin.” For verified CVE matches, cross-check against the GitHub Advisory Database — model recall on exact CVE IDs is unreliable.
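
Item (1) of this template is easy to cross-check with a script, since major-version gaps are plain arithmetic. The sketch below assumes you supply the latest versions yourself (for example from the `latest` field of `npm outdated --json`); the function names and the threshold default are illustrative.

```python
import json
import re
from typing import Optional


def major(version: str) -> Optional[int]:
    """Extract the first number from a range such as "^4.17.1" or "~2.0.0"."""
    match = re.search(r"\d+", version)
    return int(match.group()) if match else None


def majors_behind(package_json: str, latest: dict[str, str], threshold: int = 2) -> dict[str, int]:
    """Return {dep: gap} for direct deps more than `threshold` majors behind."""
    deps = json.loads(package_json).get("dependencies", {})
    drift: dict[str, int] = {}
    for name, pinned in deps.items():
        pinned_major = major(pinned)
        latest_major = major(latest.get(name, ""))
        if pinned_major is None or latest_major is None:
            continue  # git URLs, unknown packages, etc. are skipped
        gap = latest_major - pinned_major
        if gap > threshold:
            drift[name] = gap
    return drift
```

The first-number rule is a simplification: aliased or workspace ranges can contain digits that are not a major version, so treat the output as a shortlist to verify rather than a final answer.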

4. Dead-code & orphan audit

Find dead and orphaned code: (1) exported functions / components that are never imported, (2) routes / pages that are unreachable from the main router, (3) env vars referenced in code but never set, (4) feature flags that have been "on" for > 6 months. Return a table: kind | path | evidence | safe to delete? (yes/no/maybe).
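
Item (3), env vars referenced but never set, can be verified independently of the model. This is a minimal sketch assuming a JavaScript/TypeScript codebase that reads variables as `process.env.NAME` and defines them in a dotenv-style file; the function name and both regexes are illustrative.

```python
import re

# Only the dotted form process.env.NAME is detected; bracket access and
# destructuring are not covered by this sketch.
ENV_REF_RE = re.compile(r"process\.env\.([A-Z_][A-Z0-9_]*)")
ENV_DEF_RE = re.compile(r"^\s*([A-Z_][A-Z0-9_]*)\s*=", re.MULTILINE)


def unset_env_vars(sources: dict[str, str], env_file: str) -> dict[str, list[str]]:
    """Map each env var the code reads but env_file never defines to file:line hits."""
    defined = set(ENV_DEF_RE.findall(env_file))
    missing: dict[str, list[str]] = {}
    for path, text in sources.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            for name in ENV_REF_RE.findall(line):
                if name not in defined:
                    missing.setdefault(name, []).append(f"{path}:{lineno}")
    return missing
```

Variables injected by the hosting platform or CI will show up as missing here, so the result is a list of candidates, not confirmed defects.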

5. Test coverage qualitative audit

Don't run coverage tools. Instead, do a qualitative test audit: (1) Which critical paths have ZERO tests? Name them, (2) Which existing tests are tautological (testing mocks of mocks)? File:line, (3) Where is the test pyramid inverted (too many e2e, too few unit)? (4) Suggest 5 highest-ROI tests to write next.

6. Security risk audit

Audit only for SECURITY: (1) Unvalidated user input reaching DB / shell / template / fetch, (2) Secrets in code or in .env.example, (3) AuthN/AuthZ gaps — any route lacking auth middleware? Any role check missing? (4) Logging that leaks PII / tokens, (5) CORS / CSRF posture. For each finding: file:line, severity (Critical / High / Med), one-line fix sketch.

Optimization: In Claude Code, run the built-in /security-review command first (available since March 2026; it covers SQL injection, XSS, auth flaws, and insecure data handling), then use this template to widen the scope to logging and CORS/CSRF, which the default command under-weights.

7. Performance hot-spot audit

Audit for PERFORMANCE without running benchmarks. Find: (1) N+1 patterns in DB calls (file:line), (2) Synchronous I/O in hot paths, (3) Missing caches where the same fetch repeats across requests, (4) Bundle bloat suspects (large deps imported eagerly), (5) Re-render storms (React only): components missing memo / unstable deps in useEffect / context churn.

8. Documentation audit

Audit project docs: (1) Does README explain "what + why" or just commands? (2) Is the run-locally path actually current? (3) Are public functions / exported types missing TSDoc / docstrings? List 10 worst offenders, (4) Are env vars documented? (5) Suggest 5 doc sections that would make onboarding 50% faster.

9. Type-safety audit (TS / Python typing)

Audit for type-safety: (1) Count `any` / `as unknown as` / `// @ts-ignore` (or Python `# type: ignore`), (2) List API boundary types that come from `any`, (3) Functions with > 4 args missing a typed args-object, (4) Type definitions duplicated across the repo. Return file:line evidence.
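
Because item (1) asks for counts, a grep-style baseline can serve as a cross-check on whatever numbers the model reports. The sketch below is an assumption-laden approximation: the regexes catch common spellings of each escape hatch, not every possible one, and the names are illustrative.

```python
import re
from collections import Counter

# Approximate patterns; they will miss unusual formatting and may match
# inside strings or comments.
PATTERNS = {
    "any": re.compile(r":\s*any\b|\bas any\b|<any>"),
    "double-cast": re.compile(r"\bas unknown as\b"),
    "ts-ignore": re.compile(r"//\s*@ts-ignore"),
    "type-ignore": re.compile(r"#\s*type:\s*ignore"),
}


def scan_type_escapes(path: str, text: str) -> list[tuple[str, str]]:
    """Return (kind, "path:line") for each line that matches a pattern."""
    hits: list[tuple[str, str]] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for kind, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((kind, f"{path}:{lineno}"))
    return hits


def summarize(hits: list[tuple[str, str]]) -> Counter:
    """Count hits per kind."""
    return Counter(kind for kind, _ in hits)
```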

10. Error-handling audit

Audit ERROR HANDLING only: (1) try/catch blocks that swallow errors silently, (2) `catch (e) {}` empty handlers, (3) Promise chains missing `.catch`, (4) API routes that 500 instead of returning typed errors, (5) Background jobs lacking retry / dead-letter. Each finding: file:line + one-line fix sketch.

11. Database & schema audit

Audit DB code: (1) Tables without explicit indexes on FKs, (2) Migrations that drop / rename columns without backfill, (3) ORM `.findAll()` without limits, (4) Transactions missing for multi-row writes, (5) Soft-delete columns referenced unevenly. Return findings with file:line and severity.

12. Logging & observability audit

Audit OBSERVABILITY: (1) Any service-critical path with zero logs? Name it, (2) Logs that include PII or secrets, (3) Inconsistent log shape (some JSON, some console.log), (4) Metrics / counters missing on auth-fail / payment-fail / external-API calls, (5) Trace propagation gaps. Suggest the 5 highest-ROI log/metric additions.

13. Build & tooling audit

Audit BUILD / TOOLING: (1) Steps that take > 60s and can be cached, (2) Lint config inconsistencies between root and packages, (3) CI jobs that never fail (warning-only that should be error), (4) Pre-commit hooks that are skipped via --no-verify in scripts, (5) Node / Python / Go version mismatch between local / CI / Docker.

14. Cross-language repo audit

This repo mixes [languages]. Audit cross-language boundaries: (1) Where do TS / Python (or whichever pair) types diverge? List schemas, (2) Are message contracts versioned? (3) JSON keys casing inconsistencies (camelCase vs snake_case)? (4) Build-order dependencies between sub-packages.

Variables to swap: [languages] — e.g., “Next.js + Python FastAPI + Go workers”

15. Repo audit → action ticket list

Run last; converts findings into tickets.

Take all audit findings above and turn them into a prioritized ticket list. For each: (1) Title, (2) One-paragraph description, (3) Acceptance criteria (3 bullets), (4) Estimated effort (S / M / L), (5) Risk if not done. Group by: Now (this sprint) / Next (next quarter) / Later. Output as markdown table.
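
Once the model returns the markdown table, the rows can be parsed into records for whatever tracker you use. This is a minimal sketch, assuming a simple pipe-delimited table with a separator row and no escaped pipes inside cells; the column name `Group` is a guess at what the model will call the Now / Next / Later column.

```python
def parse_markdown_table(table: str) -> list[dict[str, str]]:
    """Parse a pipe-delimited markdown table into one dict per data row."""
    lines = [ln.strip() for ln in table.strip().splitlines() if ln.strip().startswith("|")]
    if len(lines) < 2:
        return []

    def cells(line: str) -> list[str]:
        return [cell.strip() for cell in line.strip("|").split("|")]

    header = cells(lines[0])
    rows = []
    for line in lines[2:]:  # skip the header and the separator row
        values = cells(line)
        if len(values) == len(header):  # ignore malformed rows
            rows.append(dict(zip(header, values)))
    return rows


def group_tickets(rows: list[dict[str, str]], column: str = "Group") -> dict[str, list[dict[str, str]]]:
    """Bucket rows by the value of `column` (e.g. Now / Next / Later)."""
    groups: dict[str, list[dict[str, str]]] = {}
    for row in rows:
        groups.setdefault(row.get(column, "Ungrouped"), []).append(row)
    return groups
```

Rows whose cell count does not match the header are dropped silently, so compare the parsed row count against the table before creating tickets.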

Common mistakes

Asking “what’s wrong with this repo?” without naming dimensions — output is generic.
No output format constraint — you get prose, not a triage list.
Letting AI propose rewrites in the same pass as the audit — you lose the diagnostic clarity.
No severity scale — every finding looks equally urgent.
Forgetting to ask for file:line evidence — you can’t verify the claims.
Doing all dimensions in one prompt — output dilutes; better to run dimension-by-dimension.
Re-running the audit after each fix — instead, fix in batches, then re-audit.

How to push results further

Run audits in separate threads by dimension. Don’t merge architecture + security + perf into one prompt — context dilutes findings.
Always demand file:line evidence. If the model can’t provide it, treat the finding as hallucinated.
Add a severity enum (Critical / High / Med / Low) in your prompt — forces ranking.
For large repos, ask the AI to first list the directories it considers risky, then audit only those. This cuts noise.
Prefer Claude Code’s `Read` tool over pasted file dumps: the agent picks files as needed, and at 1M tokens it can hold the whole tree.
On Claude Code Team/Enterprise, the agent-based PR review (research preview, shipped April 2026) dispatches parallel reviewers and verifies findings to cut false positives — wire your audit dimensions into its config.
Save the audit as a markdown file in /docs/audits/ and date it — next audit can diff against it.
Pair an architecture audit with a non-coder explainer prompt: “Summarize for a PM in 8 sentences.” Spot-checks clarity.
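
Saving dated audits and diffing them, as suggested above, takes only the standard library. The sketch below assumes a `docs/audits/` directory relative to the repo root and a hypothetical `YYYY-MM-DD-dimension.md` naming scheme; neither is prescribed by any tool.

```python
import difflib
from datetime import date
from pathlib import Path

AUDIT_DIR = Path("docs/audits")  # assumed location, relative to the repo root


def audit_path(dimension: str, day: date) -> Path:
    """Build a dated filename such as docs/audits/2026-06-09-security.md."""
    return AUDIT_DIR / f"{day.isoformat()}-{dimension}.md"


def diff_audits(previous: str, current: str) -> str:
    """Return a unified diff between two audit reports."""
    diff = difflib.unified_diff(
        previous.splitlines(),
        current.splitlines(),
        fromfile="previous",
        tofile="current",
        lineterm="",
    )
    return "\n".join(diff)
```

Lines prefixed with `+` in the diff are findings new since the last audit, and lines prefixed with `-` are findings that disappeared. A line-based diff is sensitive to rewording, so it works best when the prompt fixes the output format.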

FAQ

Should I run this on every PR? No, that is too noisy. Run quarterly, before launches, on inheritance, or before major refactors. For per-PR checks, lean on Claude Code’s `/security-review` and the agent-based PR reviewer instead.
Will AI miss things? Yes. Treat the audit as roughly 80% of the job and pair it with a dependency CVE scanner, lint rules, and at least one human review pass.
Can I use these on closed-source repos? Only with a private deployment: Claude Code via your own cloud/VPC, Codex in its sandbox on a private org, or models through AWS Bedrock / Google Vertex. Never paste closed-source code into a public chat.
How long should an audit take? For a 50K-LOC repo: ~30 min to run, 1-2 hours to triage findings. With 1M-context Claude Code reading the tree directly, the run phase is faster; the triage time barely moves.
Which model should I pick for the deepest audit? As of June 2026, Opus 4.7 leads on hard code reasoning (SWE-bench Verified 87.6%, SWE-bench Pro 64.3%); GPT-5.5 (58.6% Pro) is strong when you want the agent to run tests in a sandbox. Use Sonnet 4.6 for cheaper bulk passes.
What if findings conflict with my architecture decisions? Flag them as `out-of-scope` rather than ignoring them, and note the rationale so the next audit doesn’t re-flag them.
How do I keep audits actionable? Run template 15 (action-ticket conversion). Without it, audits become read-only documents.

Tags: #Prompt #Coding #Code review #Audit #Claude Code