App Audit Prompt Workflow: 3 AI Audits Before You Ship

A 30-minute AI audit before launch: run three focused prompts (security, performance, UX), anchor the security pass to the OWASP Top 10:2025, and get a triaged, diff-ready fix list.

Published: May 17, 2026 Updated: Jun 05, 2026 Author: AI Productivity Guide Team

Solo developers and small teams rarely have a dedicated security or performance reviewer, so issues compound silently until launch day exposes all of them at once. A focused 30-minute AI audit before any meaningful release catches most of the obvious gotchas — wrong CORS settings, leaked env vars, N+1 queries, missing rate limits, keyboard traps — and hands you a prioritized, diff-ready fix list. This is the structured prompt workflow that produces actionable findings instead of a generic checklist.

TL;DR

Run three separate audits, not one mega-prompt: security, performance, UX/accessibility. Quality degrades when you bundle them.
Anchor the security audit to the OWASP Top 10:2025 (the 8th edition, the current reference in 2026). Broken Access Control is still #1, and Security Misconfiguration jumped from #5 to #2.
An AI pass is a pre-flight, not a real audit. Automated accessibility scanners fully cover only about 29.5% of WCAG 2.2 success criteria; the rest needs human eyes. Treat AI security findings the same way.
Paste real context (project tree, package.json, deploy config, key files). Thin context produces generic advice; the fix is more context, not a cleverer prompt.
Demand diffs, not prose. Re-audit after fixes to confirm you addressed root causes, not symptoms.

What this covers

A repeatable audit you run pre-launch or quarterly: paste your project structure plus key files, run three focused audits, and leave with a triaged fix list. It is stack-agnostic — the examples assume a typical Astro/Next.js + Firebase stack, but the prompts adapt to any modern app.

Who this is for

Indie app developers and small product teams, especially if you have no dedicated SRE or security review and you have shipped a project or two where you wished you had caught an issue earlier. It is less relevant if you already have formal SOC 2 / penetration testing in place — this is the practitioner’s pre-flight, not a replacement for a real audit.

When to reach for it

Reach for it before any non-trivial launch (a new public endpoint, a new auth flow, a new payment integration) and after any major dependency upgrade, which matters more than ever now that Software Supply Chain Failures is its own OWASP category (A03:2025). Run it quarterly as a health check even when nothing changed, and again before you submit for AdSense or Google approval, since reviewers check specific UX and accessibility patterns the AI flags reliably.

Pick the right model for the audit

Audits are a context problem: the model has to hold your whole project in its head to spot cross-file issues. As of June 2026, the tools worth using:

Tool	Context window	Best for	Notes
Claude Code (Opus 4.7 / Sonnet 4.6)	1M tokens	Whole-repo security + architecture audits	Runs Anthropic models only; reads your repo directly
Cursor (Sonnet 4.6, GPT-5.5, Gemini 3.1 Pro)	up to ~1M	In-IDE audits; BugBot for per-PR review	Multi-model picker; good for fix-then-recheck loops
ChatGPT Plus (GPT-5.5)	~320 pages in-app	Pasting a focused subset of files	Full 1M context only on the $200 Pro tier
Gemini 3.1 Pro (Google AI Pro, $19.99/mo)	1M tokens	Large monorepos, long config dumps	Strong on long-context recall

For a full-repo audit, prefer a tool that can read the whole tree (Claude Code or Cursor). For a single-feature audit, pasting the relevant files into ChatGPT or Claude is fine.

Before you start

Have your project structure (tree output or ls -R src/), package.json, and any deploy config (firebase.json, vercel.json) ready to paste.
State your stack explicitly: framework, hosting, database, auth provider, payment. The AI infers some of this, but it misses what you do not state.
Decide on scope: the full app, or one feature (a new payment flow, new auth, new admin panel). Targeted audits produce sharper findings.
Set up a triage doc (a Google Doc, Notion page, or a Markdown file) where each finding lands with a priority and an owner.
Run npm audit first and paste the output. It checks your lockfile against the GitHub Advisory Database for known CVEs, which gives the AI a real baseline to reason about instead of guessing.
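
The checklist above can be turned into a small script that assembles one paste-ready context block. This is a minimal sketch, not a required tool: `SKIP_DIRS`, `KEY_FILES`, `project_tree`, and `build_context` are hypothetical names, and the skipped directories are assumptions about a typical Astro/Next.js repo. It lists file names only and reads the contents of the named config files, so nothing like a .env file is pasted by accident.

```python
from pathlib import Path

# Assumed defaults for illustration; adjust to your own repo layout.
SKIP_DIRS = {"node_modules", ".git", "dist", ".next", ".astro"}
KEY_FILES = ("package.json", "firebase.json", "vercel.json")


def project_tree(root: Path) -> list[str]:
    """Return sorted relative file paths, skipping build and vendor directories."""
    paths = []
    for path in root.rglob("*"):
        rel = path.relative_to(root)
        if path.is_file() and not SKIP_DIRS.intersection(rel.parts):
            paths.append(rel.as_posix())
    return sorted(paths)


def build_context(root: Path, description: str, stack: str) -> str:
    """Assemble one paste: description, stack, file tree, then key config files."""
    sections = [
        f"App: {description}",
        f"Stack: {stack}",
        "Project tree:\n" + "\n".join(project_tree(root)),
    ]
    for name in KEY_FILES:
        config = root / name
        if config.is_file():
            # Only the named config files are read; everything else is listed by name.
            sections.append(f"--- {name} ---\n{config.read_text(encoding='utf-8')}")
    return "\n\n".join(sections)
```

Calling `build_context(Path("."), "Two-sentence app description", "Next.js, Firebase Auth, Firestore, Stripe")` from the repo root returns a single string you can paste ahead of any of the three audit prompts, followed by your npm audit output.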

Step by step

Provide context. Paste your project structure, package.json, and deploy config. Add a two-sentence description of what the app does and who uses it.
Run the security audit:

Audit this project against the OWASP Top 10:2025. For each item, say
PASS, FAIL, or N/A and explain why. Check:
- Broken access control (A01): protected routes actually protected,
  role checks consistent, IDOR, server-side request forgery (SSRF)
- Security misconfiguration (A02): default creds, verbose errors in
  prod, open CORS, missing security headers
- Supply chain (A03): unpinned deps, packages flagged by npm audit
- Cryptographic failures: secrets in client bundle or committed to
  repo, weak hashing, plaintext PII
- Injection: XSS, SQL injection, command injection, file-upload limits
- Authentication failures: weak session handling, missing rate limits
  on login, no brute-force protection
- Logging failures: secrets or PII in logs, no audit trail on
  sensitive actions

For each finding: severity (block/warn/nit), exact file + line, and the
fix as a diff.

Run the performance audit:

Audit this project for performance issues. Check:
- Database: N+1 queries, missing indexes, large unbounded reads
- Bundle size: unnecessary deps, code-split opportunities
- Image handling: unoptimized formats, no lazy loading, no width/height
- API calls: request waterfalls vs parallel, missing caching
- Rendering: server-side vs client-side misuse, hydration cost
- Cold-start risks on serverless functions

For each finding: severity, location, fix as a diff, and expected impact
(e.g. "cuts LCP by ~400ms", "removes ~80KB from the bundle").

Run the UX/accessibility audit:

Audit this project for accessibility (WCAG 2.2 AA) and UX dark patterns.
Be explicit that you can only catch the automatable subset; flag what
needs manual review. Check:
- Forms: labels tied to inputs, error messaging, validation timing
- Loading states: missing skeletons, layout shift (CLS)
- Empty + error states: helpful copy vs blank screen vs stack trace
- A11y: alt text presence, color contrast (4.5:1 text), ARIA misuse,
  keyboard navigation, visible focus, focus traps
- Dark patterns: confirm-shaming, hidden costs, hard-to-cancel flows

For each finding: severity, location, and fix as a diff. List separately
which WCAG criteria you could NOT evaluate from code alone.

Triage to a fix list. For each finding, ask “Show me the exact diff that fixes this.” Reject prose answers; demand code.
Prioritize. Blockers first, then warnings, then nits. Cap each session at about 10 items — you will fatigue past that.
Re-audit after fixes. Run the same prompts on the patched code. A few new findings means the AI is genuinely reading your code; zero new findings on a real codebase usually means it is pattern-matching, so add more context.
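
The triage and prioritize steps can be sketched as a small sort-and-cap routine. This assumes you have copied each finding into a record by hand or from the model's output; the `Finding` fields and the `plan_session` helper are hypothetical, and the cap of 10 comes from the step above.

```python
from dataclasses import dataclass

SEVERITY_ORDER = {"block": 0, "warn": 1, "nit": 2}
SESSION_CAP = 10  # the "about 10 items" cap from the steps above


@dataclass
class Finding:
    audit: str      # "security", "performance", or "ux"
    severity: str   # "block", "warn", or "nit"
    location: str   # file and line, e.g. "src/api/orders.ts:42"
    summary: str
    has_diff: bool  # False means the model answered in prose


def plan_session(findings: list[Finding], cap: int = SESSION_CAP):
    """Split findings into this session's work, the backlog, and ones still needing a diff."""
    needs_diff = [f for f in findings if not f.has_diff]
    # sorted() is stable, so findings keep their original order within a severity.
    ready = sorted(
        (f for f in findings if f.has_diff),
        key=lambda f: SEVERITY_ORDER[f.severity],
    )
    return ready[:cap], ready[cap:], needs_diff
```

Anything returned in the third list goes back to the model with "Show me the exact diff that fixes this" before it enters the queue.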

Why three separate audits

Bundling all three into one prompt is the most common mistake, and it degrades every result. A security review needs an adversarial mindset; a performance review needs a profiling mindset; an accessibility review needs an end-user mindset. Asking for all three at once forces the model to average across them, and the output reads like a generic checklist. Run them separately, triage each set, then move on.
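
One way to keep the audits separate in practice is to build three prompts from the same context and send each in its own conversation. The sketch below is illustrative only: `AUDIT_FOCUS` and `build_prompts` are hypothetical names, and the opening lines are shortened from the full prompts in the steps above, which you would paste in their place.

```python
# Opening lines only; substitute the full prompt text for each audit.
AUDIT_FOCUS = {
    "security": "Audit this project against the OWASP Top 10:2025.",
    "performance": "Audit this project for performance issues.",
    "ux": "Audit this project for accessibility (WCAG 2.2 AA) and UX dark patterns.",
}


def build_prompts(stack: str, context: str) -> dict[str, str]:
    """Return one self-contained prompt per audit, each with the stack stated up front."""
    return {
        name: f"Stack: {stack}\n\n{focus}\n\n{context}"
        for name, focus in AUDIT_FOCUS.items()
    }
```

Each value is sent on its own, so the model never has to average an adversarial, a profiling, and an end-user mindset in a single answer.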

What the AI will and will not catch

Set expectations honestly. The security audit maps cleanly to the OWASP Top 10:2025, where Broken Access Control alone covers 40 distinct CWEs and appears in 3.73% of tested applications — the highest of any category. AI is good at spotting the mechanical patterns (an unprotected route, an unpinned dependency, a missing header). It is weak at business-logic flaws: whether a given user should be able to reach a record is something only you fully know.

Accessibility is the clearest example of the gap. Automated tooling (the axe-core engine behind Lighthouse) fully automates only about 29.5% of WCAG 2.2 success criteria; another ~10% is partly covered, and roughly 60% requires manual testing. Focus order, logical reading sequence, and whether alt text is meaningful are effectively 100% manual. So treat the AI’s accessibility pass as a first sweep, then keyboard-test the real app.
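
Color contrast sits in the automatable part of that split, which is why the audit prompt can ask for it. As an illustration, here is a sketch of the WCAG 2.x contrast-ratio calculation behind the 4.5:1 text threshold. The function names are hypothetical, and the sketch assumes six-digit hex colors with no alpha channel.

```python
def _linear(channel: int) -> float:
    """Convert one 0-255 sRGB channel to linear light, per the WCAG 2.x definition."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a color given as '#rrggbb'."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """Contrast ratio between two colors, from 1.0 (identical) up to 21.0."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa_text(foreground: str, background: str) -> bool:
    """True if normal-size text meets the 4.5:1 AA threshold."""
    return contrast_ratio(foreground, background) >= 4.5
```

On a white background, `passes_aa_text("#767676", "#ffffff")` is true while `passes_aa_text("#777777", "#ffffff")` is false. A check like this says nothing about focus order or whether alt text is meaningful, which is the manual part.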

Sample findings to expect

A public Firebase config object that includes the database URL — that is fine (it is meant to be public), but the AI will flag it. Verify your Security Rules are actually tight.
A useEffect with no dependency array — runs on every render. Sometimes intentional, often flagged as a performance concern.
Generic error toasts (“Something went wrong”) — flagged as a UX problem; expand them to something actionable.
Missing rel="noopener noreferrer" on target="_blank" links — common, flagged as a minor security/perf issue.
dangerouslySetInnerHTML without sanitization is a real blocker (an XSS vector, and stored XSS when the markup comes from saved user input).
An unpinned or transitive dependency flagged by npm audit — under OWASP A03:2025, this is no longer a nit; pin it or upgrade.
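
The unpinned-dependency finding is easy to check yourself before the audit. The sketch below flags any declared version in package.json that is not an exact version. `unpinned_dependencies` is a hypothetical helper, and the rule is deliberately strict: ranges, tags such as latest, and Git or URL specifiers all count as unpinned. A lockfile still fixes what npm ci installs, so treat the result as a list of declared ranges to review, not as proof of a vulnerability.

```python
import json
import re

# Exact semver like 1.2.3, optionally with a prerelease or build suffix.
EXACT_VERSION = re.compile(r"^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$")


def unpinned_dependencies(package_json_text: str) -> dict[str, str]:
    """Return {package: specifier} for every dependency not pinned to an exact version."""
    manifest = json.loads(package_json_text)
    loose = {}
    for section in ("dependencies", "devDependencies"):
        for name, spec in manifest.get(section, {}).items():
            if not EXACT_VERSION.match(spec):
                loose[name] = spec
    return loose
```

Pasting the returned mapping next to your npm audit output gives the model a concrete list to reason about for the supply chain item.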

First-run exercise

Run the security audit on your highest-stakes feature (auth, payments, admin). Treat it as a calibration: does the AI surface real, file-specific issues or generic ones? Three concrete findings means the workflow is working on your stack. Zero real findings plus five generic ones means your context paste was too thin — add more files.

Quality check

Did findings cite specific files and lines, or were they “you should consider”? Specific is signal; vague is noise.
Are blockers actually blockers? AI sometimes labels a nit as a blocker — push back: “Why is this a release blocker rather than a nit?”
Did fixes arrive as diffs you can paste, or as prose? Demand diffs.
After applying fixes, does the re-audit show the original findings resolved? If new issues appeared, validate them before treating them as real.
Did the accessibility audit honestly flag what it could not check from code? If it claimed full WCAG coverage, it is bluffing.

How to reuse this workflow

Save the three audit prompts in one doc, with your project name and stack baked in.
Keep a “previously caught” log per project. Patterns repeat across releases.
Run it before every external review (AdSense, App Store, SOC 2 prep). The pre-flight cuts surprises.
For teams, paste the findings plus fixes into the release notes so the next person inherits the context.
In CI, you can pipe changed files to the model on each PR and gate merges on “no new blockers” — Cursor’s BugBot does a version of this automatically. Be ready to override the model’s severity calls.
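
The CI gate in the last item can be sketched as a set difference with a human override list. Everything about the data shape here is an assumption: it presumes you asked the model to return findings as JSON objects with an `id` and a `severity`, and that each `id` is stable across runs (a rule name plus a file path, not a line number, since lines shift between commits). `blocker_ids` and `gate` are hypothetical names.

```python
def blocker_ids(findings: list[dict]) -> set[str]:
    """Collect the ids of findings the model marked as release blockers."""
    return {f["id"] for f in findings if f["severity"] == "block"}


def gate(baseline: list[dict], current: list[dict], overrides=frozenset()):
    """Return (exit_code, new_blockers); overrides are ids a human has downgraded."""
    new = sorted(blocker_ids(current) - blocker_ids(baseline) - set(overrides))
    return (1 if new else 0), new
```

A CI step would load the baseline from the main branch's last run, call `gate`, print the new blockers, and exit with the returned code. The overrides list is where a reviewer records the severity calls they disagree with.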

Common mistakes

Treating the AI audit as comprehensive. It catches the obvious mechanical issues; business-logic flaws and the manual ~60% of WCAG criteria need human eyes or a real pen test.
Not validating fixes. AI sometimes “fixes” by suppressing the symptom (catch-and-ignore) instead of addressing the root cause.
Pasting too little context and getting generic advice. The fix is more context, not a better prompt.
Running all three audits as one mega-prompt. Quality drops; do them separately and triage per audit.
Ignoring nits forever. They compound — one nit per release becomes 30 in a year.
Skipping the re-audit. The first pass’s findings can mask each other; the second pass surfaces the deeper layer.

FAQ

Does this replace a real security audit? No. It is the pre-flight that makes the real audit faster and cheaper. SOC 2 and PCI need humans, and so do business-logic access checks.
Which model should I use? For a whole-repo audit, use a tool that reads the full tree, such as Claude Code (Opus 4.7, 1M-token context) or Cursor. For a single feature, pasting files into ChatGPT Plus (GPT-5.5) or Claude is enough.
How do I get the AI to push harder? Add “Be brutal. Review this as a senior engineer who has caught this team being sloppy before.” It surfaces things a diplomatic prompt hides.
What about framework-specific issues like CSRF? Name your stack: “This is Next.js with App Router; check for CSRF in Server Actions specifically.” Targeted prompts get targeted findings.
Can I run this in CI? Yes, with caveats. Pipe changed files to the model on each PR and gate on “no new blockers.” Cursor’s BugBot automates a similar per-PR review. Always keep a human override on severity.
What does it cost? Three audits on a small app run roughly $1–5 in API calls (Opus 4.7 is $5/$25 per 1M input/output tokens as of June 2026). Cheaper than fixing the same issue after launch.
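
The cost estimate is simple arithmetic on the per-token prices quoted above. In the sketch below, the prices come from this article; the token counts in the usage note are assumptions for a small app, not measurements, and `audit_cost` and `workflow_cost` are hypothetical names.

```python
# Prices quoted in this article for Opus 4.7, in USD per 1M tokens.
INPUT_USD_PER_MTOK = 5.00
OUTPUT_USD_PER_MTOK = 25.00


def audit_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one audit run."""
    return (
        input_tokens / 1_000_000 * INPUT_USD_PER_MTOK
        + output_tokens / 1_000_000 * OUTPUT_USD_PER_MTOK
    )


def workflow_cost(runs: list[tuple[int, int]]) -> float:
    """Total cost in USD for a list of (input_tokens, output_tokens) runs."""
    return sum(audit_cost(i, o) for i, o in runs)
```

Assuming each of the three audits sends about 150,000 input tokens and gets about 8,000 output tokens back, `workflow_cost([(150_000, 8_000)] * 3)` comes to about $2.85, which sits inside the $1–5 range. A full re-audit after fixes roughly doubles that.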

For the authoritative source list, see the OWASP Top 10:2025 and the WCAG 2.2 standard.

Tags: #AI coding #Tutorial
