Bug Audit Prompts: Catch Hidden Bugs Before You Ship

13 copy-paste code audit prompts, each targeting one bug family: race conditions, null dereferences, off-by-one errors, leaks, money arithmetic, and more. Tuned for Claude Opus 4.7, Cursor, and Codex (June 2026).

Published: 2026/05/17 Updated: 2026/06/04 Author: AI Productivity Guide Team

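A bug audit is not a code review. A review looks at a diff and asks "is this change OK?"; an audit looks at a whole module and asks "which class of bug is hiding here?". The two need different prompts. Each prompt below targets exactly one bug family (race conditions, null dereferences, off-by-one errors, leaks, money arithmetic) because asking a model to "find all bugs" yields a shallow, hallucination-prone list, while asking it to "find every place where two requests can modify this shared map at the same time" yields something you can actually verify.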

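TL;DR

Run one bug family per pass. Never let a single prompt cover everything.
Every finding must include file:line, a concrete trigger scenario, and a severity. No scenario, no bug.
Diagnose first, fix second, in two separate prompts. Combining them degrades both.
Pair each audit with Prompt #13 to turn every finding into a minimal failing test. Only a test counts as evidence.
Best model right now (June 2026): Claude Opus 4.7 (87.6% on SWE-bench Verified) for the reasoning, run inside Claude Code or Cursor so the agent can Grep and Read the surrounding code itself.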

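Who this is for

On-call engineers about to ship a release, founders shipping solo with no one to review their code, security-sensitive teams that cannot afford regressions, and anyone chasing the root cause of a production incident.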

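When not to use these prompts

Skip them for one-off scripts and automation that runs once a month; the payoff does not cover the cost. And never put an audit and a refactor in the same prompt. Two goals mean two runs; mixed together you get a vague diff plus a half-finished bug list.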

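What a good bug audit prompt looks like

Every prompt below carries these six elements. Drop any one and quality falls off a cliff.

Element	What it does	What happens without it
Bug family	Pick exactly one per run (race / null / off-by-one / leak / timezone)	"Find all bugs" returns a pile of shallow items
Scope	Which files, functions, or commits	The model wanders the whole repo and misses the hot path
Trigger scenario	The exact input or interleaving that triggers the bug	"There is a race here" is a label, not something you can verify
Evidence	`file:line` plus a repro path or test idea	A finding you cannot locate cannot be confirmed
Severity	Critical / High / Med	A flat, unranked list
Output format	Numbered list or a `file | line | scenario | fix` table	A wall of prose you have to re-parse yourself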
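The evidence, scenario, and severity rows can also be enforced mechanically once the model's output has been parsed. The sketch below is one possible shape, not part of the original prompts: the `Finding` class, the `file:line` regex, and the five-word threshold for "a real scenario" are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass

SEVERITIES = {"Critical", "High", "Med"}
# Assumed location format: path/to/file.ext:LINE
LOCATION_RE = re.compile(r"^[^\s:]+:\d+$")


@dataclass
class Finding:
    family: str    # e.g. "race", "null", "off-by-one"
    location: str  # "file:line"
    scenario: str  # the concrete input or interleaving that triggers it
    severity: str  # Critical / High / Med


def rejection_reasons(f: Finding) -> list[str]:
    """Return why a finding is not reviewable; an empty list means keep it."""
    reasons = []
    if not LOCATION_RE.match(f.location):
        reasons.append("missing file:line")
    if len(f.scenario.split()) < 5:
        # Assumed heuristic: a bare label like "race here" is not a scenario.
        reasons.append("no concrete trigger scenario")
    if f.severity not in SEVERITIES:
        reasons.append("unknown severity")
    return reasons
```

A filter like this only checks that the fields are present and well-formed; whether the scenario is true still has to be confirmed by a test.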

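Where these prompts fit

Pre-release audits
Debugging an inherited codebase
Incident root-cause analysis
Safety net before a refactor
Pre-launch regression sweeps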

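Which tool runs them best (June 2026)

A single file can go straight into a chat box, but a real audit needs the model to see call sites and type definitions, and that takes an agent that can read the repo on its own.

Tool	Model	How it helps an audit	Price (June 2026)
Claude Code	Opus 4.7 / Sonnet 4.6	Agent uses `Grep` to find risky patterns and `Read` to open call sites; included in Claude Pro	Pro $20/month
Cursor	Opus 4.7, GPT-5.5, Gemini 3.1 Pro	Runs inside the IDE; the BugBot add-on reviews GitHub PRs automatically and proposes fixes	Pro $20/month; BugBot $40/seat
Codex	GPT-5.5	Strong autonomous terminal work (82.7% on Terminal-Bench 2.0)	Included in ChatGPT Plus, $20/month
Plain chat	Opus 4.7 / GPT-5.5	Fine for one self-contained file; no repo context	Plus / Pro tier

On bug-reasoning quality alone, Opus 4.7 leads at 87.6% on SWE-bench Verified versus 80.6% for Gemini 3.1 Pro (as of June 2026). Use it for the audit reasoning, then let any tool that can read your repo do the legwork.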

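13 copy-paste prompt templates

Replace [paste] with your code and [framework] with your test framework. Leave the structure alone: the named bug family and the hard requirement for file:line are what make these prompts work.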

1. Race conditions

Audit the code below for race conditions: shared-state mutation, missing
locks, check-then-act gaps, unsynchronized map/slice writes. For each
finding give: file:line, the exact interleaving where two goroutines or
threads collide, severity, and a suggested mitigation.

[paste]
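For orientation, this is the kind of check-then-act gap the prompt is meant to surface, next to one possible mitigation. It is a minimal sketch: the `Inventory` class and its method names are hypothetical, and a lock is only one of several valid fixes.

```python
import threading


class Inventory:
    def __init__(self, stock: int) -> None:
        self.stock = stock
        self._lock = threading.Lock()

    def reserve_unsafe(self) -> bool:
        # Check-then-act gap: two threads can both pass the check
        # before either one decrements.
        if self.stock > 0:
            self.stock -= 1
            return True
        return False

    def reserve_safe(self) -> bool:
        # The check and the write happen under one lock, so no other
        # thread can interleave between them.
        with self._lock:
            if self.stock > 0:
                self.stock -= 1
                return True
            return False


def run_reservations(inv: Inventory, attempts: int) -> int:
    """Fire `attempts` concurrent safe reservations; return how many succeeded."""
    results: list[bool] = []
    results_lock = threading.Lock()

    def worker() -> None:
        ok = inv.reserve_safe()
        with results_lock:
            results.append(ok)

    threads = [threading.Thread(target=worker) for _ in range(attempts)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)
```

A finding against `reserve_unsafe` would name the interleaving explicitly: thread A passes the `stock > 0` check, thread B passes the same check, then both decrement.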

2. null / undefined

Audit for likely null/undefined dereference. List call sites where the
input could plausibly be null/undefined and isn't checked. For each:
file:line, the upstream path that yields null, severity.

[paste]

3. off-by-one

Hunt off-by-one errors: loop bounds, array slicing, pagination offsets,
date arithmetic, inclusive-vs-exclusive ranges. For each: file:line, the
input size where it breaks, fix.

[paste]
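A typical pagination finding might look like the sketch below. The function names are made up for illustration; the point is the "input size where it breaks", here any total that is not a multiple of the page size.

```python
def page_count_buggy(total_items: int, per_page: int) -> int:
    # Floor division drops the final partial page.
    return total_items // per_page


def page_count(total_items: int, per_page: int) -> int:
    # Ceiling division using integers only.
    return (total_items + per_page - 1) // per_page


def page_slice(items: list, page: int, per_page: int) -> list:
    # Pages are 1-based; the slice end is exclusive.
    start = (page - 1) * per_page
    return items[start:start + per_page]
```

With 101 items and 50 per page, the buggy version reports 2 pages and the last item is never shown.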

4. Error handling audit

Audit error handling: swallowed errors, generic catch-all blocks, errors
logged but not propagated, missing context on rethrow. List each
suspicious site plus suggested logging or propagation.

[paste]

5. Resource leaks

Audit for resource leaks: open files, DB connections, event listeners,
subscriptions, timers, goroutines. Flag every open-without-close pattern
and every early return that skips cleanup. For each: file:line, the path
that leaks.

[paste]
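The "early return that skips cleanup" pattern is easy to show in a few lines. This is a sketch with a stand-in `Connection` class (hypothetical, not a real driver API) so the leak is observable without a database.

```python
class Connection:
    """Stand-in for a DB connection that records whether it was closed."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True

    def __enter__(self) -> "Connection":
        return self

    def __exit__(self, *exc_info) -> None:
        self.close()


def first_row_leaky(conn: Connection, rows: list):
    if not rows:
        return None  # early return: conn.close() below is never reached
    row = rows[0]
    conn.close()
    return row


def first_row(conn: Connection, rows: list):
    # The context manager closes the connection on every exit path.
    with conn:
        if not rows:
            return None
        return rows[0]
```

The leaking path here is the empty-result case, which is exactly the sort of path the prompt asks the model to name.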

6. Timezone bugs

Audit for timezone bugs: implicit local time, naive datetime, conversions
during DST transitions, storing local instead of UTC, day-boundary math.
List each plus how it fails and on which dates.

[paste]
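As an illustration of the day-boundary case, the sketch below derives a user's calendar date from a UTC timestamp. The fixed UTC-8 offset and the function names are assumptions chosen to keep the example self-contained; code that must handle DST would need real zone data (for example `zoneinfo`) instead of a fixed offset.

```python
from datetime import date, datetime, timedelta, timezone

# Assumed fixed offset for illustration only; it does not model DST.
USER_TZ = timezone(timedelta(hours=-8))


def local_date_buggy(utc_ts: datetime) -> date:
    # Reads the date straight off the UTC timestamp.
    return utc_ts.date()


def local_date(utc_ts: datetime, tz: timezone = USER_TZ) -> date:
    # Convert to the user's zone first, then take the date.
    return utc_ts.astimezone(tz).date()
```

For a timestamp of 2026-03-01 01:30 UTC, the buggy version files the event under March 1 while the user is still on February 28, so daily reports and "today" filters disagree near midnight.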

7. State machine inconsistencies

Below is a state-machine-like flow. List impossible states, unreachable
transitions, and missing guards. Then suggest one cleaner state model with
explicit allowed transitions.

[paste]
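The "cleaner state model with explicit allowed transitions" that the prompt asks for often ends up as a small lookup table. The order states below are hypothetical; only the shape matters.

```python
from enum import Enum


class OrderState(Enum):
    PENDING = "pending"
    PAID = "paid"
    SHIPPED = "shipped"
    CANCELLED = "cancelled"


# Every legal transition is listed; anything absent is rejected.
ALLOWED = {
    OrderState.PENDING: {OrderState.PAID, OrderState.CANCELLED},
    OrderState.PAID: {OrderState.SHIPPED, OrderState.CANCELLED},
    OrderState.SHIPPED: set(),
    OrderState.CANCELLED: set(),
}


def transition(current: OrderState, target: OrderState) -> OrderState:
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target
```

Keeping the table in one place makes missing guards visible: a transition the business expects but the table lacks shows up as a one-line diff.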

8. Boundary inputs

For each function below, list boundary inputs (empty, single element, max,
negative, zero, special chars, unicode, very large) where behavior is
unclear. Suggest one test per case.

[paste]

9. Floating point / money arithmetic

Audit this code for floating-point and money-arithmetic bugs: 0.1 + 0.2
accumulation drift, currency rounding at the wrong layer, mixing cents and
dollars, division-before-multiplication losing precision, tax or discount
applied in an inconsistent order. For each: file:line, the input that
produces a wrong total, suggested fix (Decimal type, integer cents, etc.).

[paste]

Tip: For invoice or order code, add one more line: Also flag any place where rounding happens twice in the same calculation chain.
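The drift and the single-rounding fix can be reproduced with the standard library alone. This is a sketch under assumptions: the helper names are invented, and half-up rounding to cents is one common policy, not a universal rule.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def total_float(prices: list[float]) -> float:
    # Binary floats cannot represent 0.1 or 0.2 exactly, so the sum drifts.
    return sum(prices)


def total_decimal(prices: list[str]) -> Decimal:
    # Parse from strings so no binary-float error enters the sum.
    return sum((Decimal(p) for p in prices), Decimal("0"))


def with_tax(subtotal: Decimal, rate: Decimal) -> Decimal:
    # Round exactly once, at the end of the calculation chain.
    return (subtotal * (1 + rate)).quantize(CENT, rounding=ROUND_HALF_UP)
```

Rounding twice in one chain changes results: 2.445 rounded to cents and then to tenths gives 2.5, while rounding directly to tenths gives 2.4.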

10. Idempotency / retries

Audit for retry-safety bugs: external API calls without idempotency keys,
DB writes that double-fire on retry, webhook handlers that aren't
idempotent, message consumers without dedupe. For each: file:line, what
double-fires, suggested key/window/dedupe strategy.

[paste]
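A dedupe strategy of the kind the prompt asks the model to suggest might be sketched like this. The `PaymentHandler` class and its in-memory dictionary are hypothetical; a real service would need a durable store with an atomic insert-if-absent, which this example does not attempt.

```python
class PaymentHandler:
    """Sketch of a webhook handler that dedupes on an idempotency key."""

    def __init__(self) -> None:
        self.charged_cents = 0
        # In-memory stand-in for a durable key -> result store.
        self._results: dict[str, str] = {}

    def handle(self, idempotency_key: str, amount_cents: int) -> str:
        if idempotency_key in self._results:
            # Retry of a delivery already processed: return the stored
            # result instead of charging again.
            return self._results[idempotency_key]
        self.charged_cents += amount_cents
        result = f"charged:{amount_cents}"
        self._results[idempotency_key] = result
        return result
```

Delivering the same event twice returns the same result and charges once, which is the property a retry-safety finding should be tested against.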

11. Cache consistency

Audit for cache bugs: writes that update the DB but not the cache, cache
keys missing tenant/user scoping, stale reads after writes, TTLs longer
than the data's natural change rate, cache-stampede risk. For each:
file:line, the stale-read scenario, fix sketch.

[paste]
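Two of the listed bugs, unscoped keys and stale reads after writes, fit in one short sketch. Everything here is illustrative: the key layout, the `ProfileStore` class, and the plain dictionaries standing in for a database and a cache are assumptions.

```python
def cache_key_buggy(resource: str, resource_id: int) -> str:
    # No tenant in the key: two tenants with the same id share an entry.
    return f"{resource}:{resource_id}"


def cache_key(tenant_id: str, resource: str, resource_id: int) -> str:
    return f"tenant:{tenant_id}:{resource}:{resource_id}"


class ProfileStore:
    """Dict-backed stand-ins for a database and a cache."""

    def __init__(self) -> None:
        self.db: dict[tuple[str, int], str] = {}
        self.cache: dict[str, str] = {}

    def read(self, tenant_id: str, user_id: int) -> str:
        key = cache_key(tenant_id, "profile", user_id)
        if key not in self.cache:
            self.cache[key] = self.db[(tenant_id, user_id)]
        return self.cache[key]

    def write(self, tenant_id: str, user_id: int, name: str) -> None:
        self.db[(tenant_id, user_id)] = name
        # Invalidate on write so the next read cannot return the old value.
        self.cache.pop(cache_key(tenant_id, "profile", user_id), None)
```

Removing the `pop` in `write` reproduces the stale-read scenario: read, write, read again, and the second read still returns the old name.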

12. Unicode / encoding

Audit for string and encoding bugs: byte-length vs character-length
confusion, lowercasing non-ASCII, slugs that drop emoji or CJK,
surrogate-pair truncation, NFC-vs-NFD normalization mismatches across
DB and UI, header or URL decoding inconsistencies. For each: file:line,
an input that breaks, fix.

[paste]
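Two of these bugs can be reproduced with the standard library: a normalization mismatch and a byte-budget truncation. The helper names are hypothetical, and the truncation shown respects code point boundaries only; it can still split a multi-code-point grapheme such as an emoji sequence.

```python
import unicodedata


def same_text(a: str, b: str) -> bool:
    # Compare after normalizing both sides to the same form (NFC here).
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def truncate_utf8(text: str, max_bytes: int) -> str:
    # Cut on a byte budget, then drop any incomplete trailing sequence
    # instead of emitting a broken character.
    encoded = text.encode("utf-8")[:max_bytes]
    return encoded.decode("utf-8", errors="ignore")
```

The string "é" can be stored as one code point (U+00E9) or as "e" plus a combining accent (U+0301); the two render identically but compare unequal until normalized.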

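13. Audit findings → minimal failing tests

Run this one last. It turns each audit finding into a runnable reproduction, and that step is what actually proves the bug is real.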

Take each finding from the bug audit above and write the minimal failing
test that reproduces it in [framework]. Each test: one assertion,
deterministic input, no mocks unless strictly needed. Mark which tests
fail today vs which need infra (DB / queue / timezone faking).

Findings: [paste]

Replaceable variable: [framework] (vitest, jest, pytest, go test, etc.).
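What such a test might look like is sketched below, using Python's built-in `unittest` as the framework (an assumption; the prompt leaves the framework open). The `apply_discount` function is an invented piece of code under audit, and `expectedFailure` is used here as one way to mark a test that fails today.

```python
import unittest


def apply_discount(total_cents: int, percent: int) -> int:
    # Hypothetical code under audit: the integer division runs first,
    # so percent // 100 is 0 for any percent below 100.
    return total_cents - total_cents * (percent // 100)


class DiscountFinding(unittest.TestCase):
    # Marked as failing today; remove the decorator once the bug is fixed.
    @unittest.expectedFailure
    def test_ten_percent_discount_reduces_total(self):
        # One assertion, deterministic input, no mocks.
        self.assertEqual(apply_discount(1000, 10), 900)
```

Once the fix lands, the test starts passing, the runner reports an unexpected success, and the decorator can be removed so the test guards against regression.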

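Common pitfalls

Mixing several categories into one "find all bugs" prompt.
Findings without file:line.
Trusting the model's confidence without verifying it.
Asking for the fix in the same prompt as the audit, which muddies the diagnosis.
Never turning findings into tests, so the audit becomes a read-only document nobody acts on.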

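How to push results further

Run one family per pass. Mixing families dilutes the findings.
Ask for the trigger scenario, not a label. "There is a race here" is easy to hallucinate; "when A completes after B but before C" is something you can verify.
Pair with Prompt #13. Only a test proves a bug is real. Several 2026 industry studies of LLM bug detection report false-positive rates that are far from negligible, so treat every finding as a hypothesis until a failing test confirms it.
Pre-filter with Grep. In a large repo, have Claude Code or Cursor Grep for risky patterns (catch (e) {}, setTimeout, Date(, == on money values) and audit only the hits.
Add a confidence threshold: Only report findings you would bet $50 on. Noise drops noticeably.
Re-run the same prompt after fixing. If a finding reappears at a new file:line, the problem is systemic, not a one-off.
Keep an ignore list. Write known false positives into the prompt so they stop resurfacing on every scan.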
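The pre-filter and ignore-list ideas can be combined in a small script when no agent is available. This is a rough sketch: the regexes are approximations written for this example, the ignore list is keyed by line number and pattern name as an assumption, and the "== on money values" check is left out because it needs type information a regex cannot see.

```python
import re
from collections.abc import Set

# Pattern names follow the list above; the regexes are rough guesses.
RISKY_PATTERNS = {
    "empty catch": re.compile(r"catch\s*\(\s*\w*\s*\)\s*\{\s*\}"),
    "setTimeout": re.compile(r"\bsetTimeout\s*\("),
    "Date(": re.compile(r"\bDate\s*\("),
}


def prefilter(
    source: str, ignore: Set[tuple[int, str]] = frozenset()
) -> list[tuple[int, str]]:
    """Return (line_number, pattern_name) hits, skipping known false positives."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, pattern in RISKY_PATTERNS.items():
            if pattern.search(line) and (lineno, name) not in ignore:
                hits.append((lineno, name))
    return hits
```

Only the lines that hit would then be pasted into the audit prompt. Line-number keys go stale when the file changes, so a real ignore list would more likely key on something stable such as a function name.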

FAQ

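What is the difference between a bug audit and a code review? A review looks at a diff and asks whether the change is safe; an audit looks at a whole module or repo and goes deep on one bug family. They complement each other and neither replaces the other, so do both.
How often should I run a bug audit? Before a major release, when you inherit a codebase, on the affected module after an incident, and once a quarter on production-critical paths.
Why does AI sometimes miss an obvious bug? Usually because it cannot see the call sites or type definitions. Either widen the context you paste in, or run the audit inside Claude Code or Cursor and let the agent Read the surrounding files itself.
Can I trust the severity the model assigns? Treat it as a starting point, then re-rank by business impact. The model does not know which paths carry revenue or PII.
What about false positives? Expect real ones: independent 2026 research on LLM static bug detection confirms the rate is not low. Missing a real production bug usually costs more than re-checking a false positive, so accept the asymmetry and confirm with tests.
Can AI fix the bugs it finds while it is at it? Yes, but in a separate pass. Mixing diagnosis and repair makes both worse. Cursor's BugBot, for example, splits the two steps: it first flags issues on the PR, then dispatches a Cloud Agent to propose a fix for you to review.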