Bug audit ≠ code review. Audits hunt specific bug families: race conditions, nil/null dereferences, off-by-one errors, resource leaks. Each prompt below targets one family.
Who this is for
On-call engineers preparing for a release, founders shipping unattended code, security-adjacent teams who can’t afford a regression, anyone debugging an incident root cause.
When not to use these prompts
Don’t run these on toy scripts or on automation that runs once a month: the overhead outweighs the payoff. Also avoid mixing audit with refactor in the same prompt; two goals, two passes.
Prompt anatomy / structure formula
A bug-audit prompt should always carry six elements:
- Bug family: pick ONE — race / null / off-by-one / leak / timezone — never “find all bugs”.
- Scope: which files / functions / commits — keeps the search bounded.
- Failure scenario: ask the AI to describe the interleaving or input that triggers each bug, not just label it.
- Evidence: every finding needs `file:line` and a concrete repro path or test idea.
- Severity: Critical / High / Med, which forces ranking and prevents flat enumerations.
- Action format: numbered list or table with `file | line | scenario | fix sketch`.
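The six elements can be assembled mechanically so no pass ships without one of them. A minimal Python sketch, assuming you keep prompts as plain strings; `AuditSpec` and `build_audit_prompt` are hypothetical names, not part of any tool:

```python
from dataclasses import dataclass

# The single-family choices listed above.
FAMILIES = {"race", "null", "off-by-one", "leak", "timezone"}

@dataclass
class AuditSpec:
    family: str  # exactly one bug family
    scope: str   # files / functions / commits to audit
    code: str    # the pasted source

def build_audit_prompt(spec: AuditSpec) -> str:
    """Assemble a single-family audit prompt from the six elements."""
    if spec.family not in FAMILIES:
        raise ValueError(f"pick ONE family from {sorted(FAMILIES)}")
    return "\n".join([
        f"Bug family: {spec.family} only. Ignore every other kind of bug.",
        f"Scope: {spec.scope}.",
        "For each finding, describe the interleaving or input that triggers it.",
        "Evidence: give file:line and a concrete repro path or test idea.",
        "Severity: rank each finding Critical / High / Med.",
        "Output: a table with columns file | line | scenario | fix sketch.",
        "",
        spec.code,
    ])
```

Rejecting anything outside the family set is the point: a "find all bugs" request fails before it reaches the model.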
Best for
- Pre-release audit
- Inherited codebase debugging
- Incident root cause hunt
- Refactor safety net
- Pre-launch regression sweep
13 copy-ready prompt templates
1. Race condition hunt
Audit the code below for race conditions, shared-state mutation, and missing locks. For each finding: file:line, the scenario where two goroutines / threads collide, and a suggested mitigation.
{paste}
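For reference, this is the shape of bug template 1 should surface: a read-modify-write on shared state with no lock. A minimal Python sketch with a hypothetical `Counter`; `+=` on a shared attribute is not atomic across threads, so two threads can read the same old value and one increment is lost. The mitigation holds a `threading.Lock` around the whole read-modify-write:

```python
import threading

class Counter:
    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def unsafe_increment(self) -> None:
        # RACE: read, add and write are separate steps; two threads can
        # both read N and both write N + 1, losing one increment.
        self.value += 1

    def safe_increment(self) -> None:
        # Mitigation: the lock makes the read-modify-write atomic.
        with self._lock:
            self.value += 1

def hammer(fn, threads: int = 8, per_thread: int = 10_000) -> None:
    """Call fn from several threads at once (hypothetical stress helper)."""
    def work() -> None:
        for _ in range(per_thread):
            fn()
    workers = [threading.Thread(target=work) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```

With `safe_increment` the count is always `threads * per_thread`. The unsafe variant may or may not lose updates on a given run, which is why the audit should describe the interleaving instead of relying on the race showing up in a test.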
2. Null / undefined hunt
Audit for likely null/undefined dereferences. List call sites where the input could plausibly be null/undefined and isn’t checked.
{paste}
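A typical site template 2 should flag, sketched in Python with a hypothetical `user` dict: `dict.get` returns `None` for a missing key, and the chained call then raises `AttributeError`. The guarded version makes the missing case explicit and hands the decision to the caller:

```python
from typing import Optional

def email_domain_unsafe(user: dict) -> str:
    # BUG: user.get("email") is None when the key is missing,
    # so .split() raises AttributeError.
    return user.get("email").split("@")[1]

def email_domain(user: dict) -> Optional[str]:
    email = user.get("email")
    if not email or "@" not in email:
        return None  # caller decides how to handle a missing email
    return email.split("@", 1)[1]
```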
3. Off-by-one hunt
Hunt off-by-one errors: loop bounds, slicing, pagination, date arithmetic. For each: file:line, scenario, fix.
{paste}
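Pagination is a frequent source of these bugs. A minimal Python sketch assuming 1-based page numbers (all function names are hypothetical): the buggy slice starts at `page * size` and so skips the first page, and plain integer division for the page count silently drops a partial last page.

```python
def page_items_buggy(items: list, page: int, size: int) -> list:
    # BUG: with 1-based pages this skips the first `size` items.
    return items[page * size:(page + 1) * size]

def page_items(items: list, page: int, size: int) -> list:
    start = (page - 1) * size  # page 1 starts at index 0
    return items[start:start + size]

def page_count_buggy(total: int, size: int) -> int:
    # BUG: 101 items at size 10 gives 10 pages; the last item is unreachable.
    return total // size

def page_count(total: int, size: int) -> int:
    return (total + size - 1) // size  # ceiling division
```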
4. Error handling audit
Audit error handling: swallowed errors, generic catch-all, missing context. List each suspicious site + suggested logging / propagation.
{paste}
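The pattern template 4 targets, sketched in Python with a hypothetical config loader: the first version swallows every exception and returns a default, so a malformed file looks identical to an empty one. The second catches only the expected error and re-raises it with context, chaining the original via `raise ... from`:

```python
import json

def load_limits_swallowed(raw: str) -> dict:
    try:
        return json.loads(raw)
    except Exception:
        return {}  # BUG: generic catch-all hides malformed input

class ConfigError(Exception):
    pass

def load_limits(raw: str, source: str) -> dict:
    try:
        return json.loads(raw)
    except json.JSONDecodeError as e:
        # Propagate with context: which source, and where parsing failed.
        raise ConfigError(f"invalid JSON in {source} at line {e.lineno}") from e
```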
5. Resource-leak hunt
Audit for resource leaks: open files, DB conns, event listeners, subscriptions, timers. Flag every open-without-close pattern.
{paste}
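An open-without-close pattern and its fix, as a minimal Python sketch (the helper names are hypothetical). The leaky version skips the explicit close on the early-error path; a `with` block guarantees the close on every path, including exceptions:

```python
def first_line_leaky(path: str) -> str:
    f = open(path, encoding="utf-8")
    line = f.readline()
    if not line:
        # LEAK: no explicit close on this path; cleanup is left to the GC.
        raise ValueError("empty file")
    f.close()
    return line.rstrip("\n")

def first_line(path: str) -> str:
    # The context manager closes the file on return *and* on exception.
    with open(path, encoding="utf-8") as f:
        line = f.readline()
    if not line:
        raise ValueError("empty file")
    return line.rstrip("\n")
```

The same shape applies to DB connections, listeners and timers: look for an acquire whose release sits on only some of the exit paths.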
6. Timezone bug hunt
Audit for timezone bugs: implicit local time, naive datetimes, conversions across DST transitions. List each + how it could fail.
{paste}
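Implicit local time is easy to miss in Python because it raises no error: since Python 3.6, `datetime.astimezone()` on a naive datetime assumes the host's local zone. A minimal sketch (function names are hypothetical) contrasting that with a conversion that refuses to guess:

```python
from datetime import datetime, timezone

def to_utc_implicit(dt: datetime) -> datetime:
    # BUG: a naive dt is silently treated as the *server's* local time,
    # so the result depends on the host's TZ setting and its DST rules.
    return dt.astimezone(timezone.utc)

def to_utc(dt: datetime) -> datetime:
    # Refuse naive datetimes instead of guessing a zone.
    if dt.tzinfo is None or dt.utcoffset() is None:
        raise ValueError("naive datetime: attach a timezone before converting")
    return dt.astimezone(timezone.utc)
```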
7. State machine inconsistency
Below is a state-machine-like flow. List impossible states, unreachable transitions, missing guards. Suggest one cleaner state model.
{paste}
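One cleaner state model is an explicit transition table: every legal move is listed in one place, everything else is rejected, and unreachable states fall out of a simple graph walk. A minimal Python sketch with a hypothetical order lifecycle:

```python
from enum import Enum

class Order(Enum):
    PENDING = "pending"
    PAID = "paid"
    SHIPPED = "shipped"
    CANCELLED = "cancelled"

# Every legal transition, in one place. Anything missing is illegal.
TRANSITIONS = {
    Order.PENDING: {Order.PAID, Order.CANCELLED},
    Order.PAID: {Order.SHIPPED, Order.CANCELLED},
    Order.SHIPPED: set(),    # terminal
    Order.CANCELLED: set(),  # terminal
}

def transition(current: Order, target: Order) -> Order:
    # The guard: reject any move not in the table.
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target

def unreachable(start: Order = Order.PENDING) -> set:
    """States no path from `start` can reach: candidates for the audit."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in TRANSITIONS[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return set(Order) - seen
```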
8. Boundary input hunt
For each function below, list boundary inputs (empty, single, max, negative, special chars, unicode, very large) where behavior is unclear. Suggest tests for each.
{paste}
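Boundary lists translate directly into table-driven checks. A minimal Python sketch for a hypothetical `chunk` helper, where each row of `CASES` is one boundary from the list in the prompt (empty, single, exact multiple, remainder, size larger than the input) and invalid sizes are checked separately:

```python
def chunk(items: list, size: int) -> list:
    if size <= 0:
        raise ValueError("size must be positive")  # guards 0 and negatives
    return [items[i:i + size] for i in range(0, len(items), size)]

# (input, size, expected) rows, one per boundary case
CASES = [
    ([], 3, []),                          # empty
    ([1], 3, [[1]]),                      # single element
    ([1, 2, 3, 4], 2, [[1, 2], [3, 4]]),  # exact multiple
    ([1, 2, 3], 2, [[1, 2], [3]]),        # remainder
    ([1, 2], 10, [[1, 2]]),               # size larger than input
]

def check_boundaries() -> None:
    for items, size, expected in CASES:
        assert chunk(items, size) == expected, (items, size)
    for bad in (0, -1):
        try:
            chunk([1], bad)
        except ValueError:
            continue
        raise AssertionError(f"size={bad} should be rejected")
```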
9. Float / money math hunt
Audit this code for floating-point / money-arithmetic bugs: `0.1 + 0.2` accumulation drift, currency rounding at the wrong layer, mixing cents and dollars, division-before-multiplication losing precision, tax / discount applied in inconsistent order. For each: file:line, the input that produces wrong totals, suggested fix (Decimal type, BigInt cents, etc.).
{paste}
Optimization: If the code handles invoices/orders, add: “Also flag any place where rounding happens twice in the same calculation chain.”
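The usual fix for template 9, sketched in Python with the standard `decimal` module: binary floats cannot represent `0.1` exactly, so sums drift, while `Decimal` values built from strings stay exact and the result is rounded once, at the end of the chain. The invoice line and tax rate are hypothetical, and `ROUND_HALF_UP` is an assumption; use whatever rounding rule your jurisdiction requires:

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

def line_total_float(price: float, qty: int, tax_rate: float) -> float:
    # BUG: float arithmetic, and rounding twice (subtotal, then total)
    # can shift the last cent.
    subtotal = round(price * qty, 2)
    return round(subtotal * (1 + tax_rate), 2)

def line_total(price: str, qty: int, tax_rate: str) -> Decimal:
    # Exact decimal arithmetic; build Decimals from strings, never floats.
    total = Decimal(price) * qty * (1 + Decimal(tax_rate))
    return total.quantize(CENT, rounding=ROUND_HALF_UP)  # round once, at the end
```

For example, `line_total("19.99", 3, "0.0825")` computes exactly 64.917525 and rounds it once to `Decimal("64.92")`.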
10. Idempotency / retry hunt
Audit for retry-safety bugs: external API calls without idempotency keys, DB writes that double-fire on retry, webhook handlers that are not idempotent, message consumers without dedupe. For each: file:line, what double-fires, suggested key/window/dedupe strategy.
{paste}
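A dedupe strategy of the kind template 10 asks for, as a minimal single-process Python sketch. `IdempotentHandler`, `charge_card` and the in-memory dict are hypothetical; a real system would persist keys in a store shared by every worker, with an expiry window, and this version is not safe under concurrent delivery of the same key:

```python
from typing import Callable, Dict

class IdempotentHandler:
    """Runs `action` at most once per idempotency key; retries get the cached result."""

    def __init__(self, action: Callable[[dict], str]) -> None:
        self._action = action
        self._results: Dict[str, str] = {}  # key -> result (in-memory only)

    def handle(self, key: str, payload: dict) -> str:
        if key in self._results:
            return self._results[key]  # retry or duplicate delivery: no second call
        result = self._action(payload)
        self._results[key] = result
        return result

calls = []

def charge_card(payload: dict) -> str:
    calls.append(payload)  # stands in for the external API call
    return f"charge-{len(calls)}"
```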
11. Cache-coherence hunt
Audit for cache bugs: writes that update DB but not cache, cache keys that miss tenant/user scoping, stale reads after writes, TTLs longer than the data's natural change rate, cache stampede risk. For each: file:line, stale-read scenario, fix sketch.
{paste}
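Two of the fixes template 11 looks for, combined in a minimal Python sketch with hypothetical dict-backed `db` and `cache`: keys carry the tenant, and every write invalidates the cached entry so the next read refills from the database. It deliberately ignores stampede protection and the read-fill versus invalidate race, which a real cache layer would also need:

```python
from typing import Dict, Optional

class UserStore:
    def __init__(self) -> None:
        self.db: Dict[str, dict] = {}     # stands in for the database
        self.cache: Dict[str, dict] = {}  # stands in for a shared cache

    @staticmethod
    def _key(tenant_id: str, user_id: str) -> str:
        # Scope by tenant: "user:42" alone would collide across tenants.
        return f"tenant:{tenant_id}:user:{user_id}"

    def get(self, tenant_id: str, user_id: str) -> Optional[dict]:
        key = self._key(tenant_id, user_id)
        if key not in self.cache:
            row = self.db.get(key)
            if row is None:
                return None
            self.cache[key] = row
        return self.cache[key]

    def update(self, tenant_id: str, user_id: str, row: dict) -> None:
        key = self._key(tenant_id, user_id)
        self.db[key] = row
        self.cache.pop(key, None)  # invalidate after the DB write, or reads go stale
```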
12. Unicode / encoding hunt
Audit for string/encoding bugs: byte-length vs character-length confusion, `toLowerCase()` on non-ASCII, slugs that drop emoji or CJK, surrogate-pair truncation, normalization (NFC vs NFD) mismatches across DB/UI, header/URL decoding inconsistencies. For each: file:line, an input that breaks, fix.
{paste}
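A few of these checks in Python, using the standard `unicodedata` module. Python `str` indexes by code point, so the surrogate-pair case applies to UTF-16 languages such as JavaScript rather than here, but byte length, normalization and case folding all apply. `same_text` and `truncate_utf8` are hypothetical helpers; note that byte truncation can still split a grapheme made of several code points (a letter plus a combining accent):

```python
import unicodedata

def same_text(a: str, b: str) -> bool:
    # Normalize both sides to NFC; a raw == fails on NFC vs NFD input.
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

def truncate_utf8(s: str, max_bytes: int) -> str:
    """Cut to at most max_bytes of UTF-8 without splitting a code point."""
    data = s.encode("utf-8")[:max_bytes]
    return data.decode("utf-8", errors="ignore")  # drops a trailing partial sequence
```

For case-insensitive comparison, `str.casefold()` is the safer choice: `"straße".lower()` stays `"straße"`, while `"straße".casefold()` becomes `"strasse"`.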
13. Audit → minimal failing test
Run last. Turns each audit finding into a runnable repro.
Take each finding from the bug audit above and write the minimal failing test that reproduces it in {framework}. Each test: one assertion, deterministic input, no mocks unless strictly needed. Mark which tests fail today vs which need infra (DB / queue / timezone faking).
Findings: {paste}
Variables to swap: framework (e.g., vitest, jest, pytest, go test)
Common mistakes
- Mixing categories — “find all bugs”
- No file:line in findings
- Trusting AI confidence without spot-check
- Asking for the fix in the same prompt as the audit — diagnosis blurs
- Not converting findings into tests — audit becomes a read-only doc
How to push results further
- Run one family per pass (race / null / leak / timezone). Cross-pollination dilutes findings.
- For each finding demand a trigger scenario, not just a label — “race here” is hallucination-prone, “if request A finishes after B but before C” is verifiable.
- Pair the audit with template 13 (minimal failing test) — only the tests prove the bug is real.
- On large codebases, let Claude Code grep for danger patterns (e.g., `catch (e) {}`, `setTimeout`, `Date()`) and audit only the matches.
- Add a confidence threshold: “Only report findings you would bet $50 on.” Cuts noise by ~40%.
- After a fix, re-run the SAME prompt. If findings reappear at new file:lines, you have a systemic issue, not a one-off.
- Keep an “ignore list” in your prompt for known false-positives so each pass doesn’t re-surface them.
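If your agent can't grep for you, a short script does the same pre-filter and hands the audit a bounded scope. A minimal Python sketch; the pattern set is an assumption drawn from the examples above (empty `catch` blocks, `setTimeout`, `Date()`), so extend it per bug family and language:

```python
import re
from pathlib import Path
from typing import Iterator, Tuple

# Hypothetical starter set; tune per bug family and language.
DANGER = {
    "empty-catch": re.compile(r"catch\s*\(\s*\w*\s*\)\s*\{\s*\}"),
    "setTimeout": re.compile(r"\bsetTimeout\s*\("),
    "bare-Date": re.compile(r"\bDate\s*\(\s*\)"),
}

def scan_text(text: str) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, pattern_name) for every match, with 1-based lines."""
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, rx in DANGER.items():
            if rx.search(line):
                yield lineno, name

def scan_repo(root: str, suffixes=(".js", ".ts")) -> Iterator[Tuple[str, int, str]]:
    """Yield (path, line_number, pattern_name) across matching source files."""
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="ignore")
            for lineno, name in scan_text(text):
                yield str(path), lineno, name
```

Paste the resulting `file:line` list into the Scope element of the prompt so the model audits only those sites.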
FAQ
- How is bug audit different from code review?: Review covers a diff. Audit covers a whole module / repo and targets one bug family deeply. They’re complementary, not interchangeable.
- How often should I run a bug audit?: Before major releases, after inheriting a codebase, post-incident on the failing module, and quarterly for production-critical paths.
- Why does AI miss obvious bugs sometimes?: Most often because the model can’t see the call site or the type definition. Either expand the context or use Claude Code/Cursor so the agent can open those files itself.
- Should I trust the severity AI assigns?: Treat severity as a starting point. Re-rank manually using business impact; AI doesn’t know which path carries revenue.
- What about false positives?: Expect 20-30%. The cost of a missed bug usually exceeds the cost of confirming false positives — accept the asymmetry.
- Can AI fix the bugs it finds?: Yes, but do it in a separate pass. Mixing diagnosis and remediation makes both worse.
Related
- Code review prompts
- Security audit prompts
- Test generation prompts
- Refactor prompts
- How to Use AI to Write a Bug Report: Vague Complaints into Reproducible Tickets
Tags: #Prompt #AI coding #Bug audit