Does AI review replace human review?

No. Treat it as roughly 80% of a mechanical pass: AI catches null derefs, missing error handling, and obvious injection, while humans catch product intent and context drift. Pair both; don't substitute.

Can I run this on PRs from teammates?

Yes, but disclose it. Drop a note like "Ran an AI pass for bugs + security — flagged X and Y" so it doesn't read as stealth automation.

Why does AI keep flagging style even when I say "skip style"?

Constraint drift on long outputs. Add a closing line: "If a finding is about style or formatting, drop it from the output entirely."

How long should review prompts be?

The prompt itself: 60-150 words. The pasted diff: under ~1,500 lines per pass. Larger diffs go in chunks by directory, or hand the agent the git diff so it isn't truncated.

What if AI suggests a fix I disagree with?

Don't apply it. Ask AI to defend the suggestion in the same thread; if the defense is weak, you've just confirmed the original code was fine.

Should I ever let AI auto-apply review fixes?

Only on lint-class issues (naming, unused imports). Never on logic or security fixes — auto-apply hides bugs you never consciously accepted.

Prompt Library

Code Review Prompts: Beyond "Looks Good to Me"

13 copy-ready AI code review prompts that surface real bugs, security holes, perf issues, and test gaps — plus which 2026 review bots to pair them with.

Published: May 17, 2026 Updated: Jun 05, 2026 Author: AI Productivity Guide Team 🌐 查看中文版本

Generic “review this code” gives generic feedback. These 13 prompts target the specific failure modes a senior reviewer would catch, and each pins exactly one review lens so the signal does not get diluted.

TL;DR

One lens per prompt: bugs OR security OR perf, never all three at once. Mixing is the #1 reason AI review reads like noise.
Demand file:line on every finding plus a Critical/High/Med/Low severity. No line reference usually means the finding is hallucinated.
In Claude Code or Cursor, let the agent read the diff from git instead of pasting — fuller context, fewer truncations.
Pair these prompts with a review bot (GitHub Copilot code review, Claude Code Review, or Cursor Bugbot as of June 2026) for the mechanical pass, and keep humans for intent and product context.

Who this is for

Engineers doing self-review before a PR, tech leads who triage 10+ PRs a week, indie devs without a peer reviewer, and anyone onboarding to an unfamiliar codebase.

When not to use these prompts

Skip them for tiny one-line diffs or generated files (lockfiles, snapshot updates, autogenerated clients) — the overhead outweighs the signal. And do not run them on an AI-generated patch you have not read yet. Review your own draft first; otherwise you are asking one model to grade another while you stay blind to both.

Prompt anatomy: the six elements

Every code-review prompt below carries the same six parts. Drop one and quality falls off a cliff:

Element	What it does	Example
Reviewer role	Sets the lens and depth	”senior backend reviewer”, “SRE”, “staff frontend”
Scope filter	One review lens only	bugs OR security OR perf
Evidence requirement	Makes findings verifiable	every item needs `file:line`
Severity scale	Forces ranking, not a flat list	Critical / High / Med / Low
Confidence threshold	Cuts false positives	”only flag when confident”
Output shape	Keeps it scannable	numbered list or markdown table

13 copy-ready prompt templates

Placeholders in { } below are inside code blocks on purpose — paste the prompt as-is and swap the placeholder for your real diff or text.

1. Bug-focused review

Below is a diff. Review for likely bugs only. For each: file:line, severity (high/med/low), what's wrong, suggested fix. Ignore style. Be conservative — only flag if you're confident.

{paste diff}

2. Security-focused review

Review this code for security issues: injection, auth, secrets, input validation, unsafe deserialization, race conditions. For each: file:line, attacker scenario, fix.

{paste}

3. Performance review

Review for performance issues only: N+1 queries, sync calls in hot paths, redundant work, missing memoization, unbounded growth. Show before/after for the top 3 issues.

{paste}

4. Readability + naming

Review readability only: unclear names, overlong functions, deep nesting, mixed levels of abstraction. Suggest specific renames. Don't propose rewrites.

{paste}

5. Test-coverage review

Below is a change with its tests. Identify: cases not covered, brittle test patterns, tests that test implementation rather than behavior. Suggest 3 missing test cases.

{paste}

6. API design review

Review this public API surface for: parameter clarity, error handling consistency, breaking-change risk, idiomatic style for {language}. Suggest one alternative shape if applicable.

{paste}

7. Diff context check

For this diff, list any change where the author may not have read enough surrounding context (e.g., touched a function but didn't update callers). For each: what's likely missing.

{paste}

8. “PR description” sanity check

I'm writing a PR description. Here's my draft + the diff. Does the description accurately cover the change? List discrepancies.

Description: {paste}
Diff: {paste}

9. Migration / schema-change review

This diff includes a DB migration. Review for: missing backfill, irreversible ALTERs, lock duration on large tables, missing index for new FKs, downstream code still referencing the old shape. For each: file:line, risk, suggested mitigation. Assume the table has ~10M rows in production.

{paste}

Variables to swap: the migration file + any model/ORM changes. The “~10M rows” line forces realistic lock-duration thinking instead of a textbook answer.

10. Concurrency-aware review

Review this diff for concurrency issues only: shared mutable state, missing locks/mutexes, async race windows, awaits inside transactions, double-fires from React effects or handlers. For each: file:line, the interleaving that fails, suggested fix. Skip everything else.

{paste}

11. Reviewer-to-junior translator

I will paste a senior reviewer comment. Rewrite it for a junior engineer who just joined: explain the underlying principle in 2 sentences, then quote one before/after code snippet showing the fix. Keep the original sharpness but remove jargon.

Reviewer comment: {paste}
Original code: {paste}

12. “Should this be split?” review

Below is a PR diff with N files changed. Decide whether it should be split into smaller PRs. Output: (1) verdict (ship / split into 2 / split into 3+), (2) the lines I should cut the split along, (3) any change that should be its own commit even within one PR.

{paste}

13. Reviewer behavior audit (self-check)

Run this on your own past review comments to spot blind spots in how you review.

Below are 20 review comments I left this month. Cluster them: (1) what kind of issues I catch best, (2) what I systematically miss (security? perf? test design?), (3) tone patterns that may feel harsh, (4) 3 specific changes I should make to my review style.

{paste comments}

Pair the prompts with a 2026 review bot

The prompts above are for interactive review in a chat window or your IDE. For the always-on pass that runs on every PR, the three serious options as of June 2026:

Tool	How it runs	Model	Pricing (June 2026)	Notes
GitHub Copilot code review	Inline PR comments, agentic since Mar 2026	Copilot models	Included in paid Copilot plans; consumes Actions minutes + 13x premium-request multiplier since Jun 1, 2026	Low/Medium effort levels; can hand fixes to the coding agent
Claude Code Review	Inline PR comments via `claude-code-action`; managed service launched Mar 2026	Claude Opus 4.7 / Sonnet 4.6	Self-hosted Action on any plan (you pay API); managed service ~$15-25/PR, Team/Enterprise only	Won’t approve, block, or auto-fix in managed mode; full prompt control if self-hosted
Cursor Bugbot	Inline GitHub PR comments; Autofix left beta Feb 2026	Bugbot models	$40/user/mo add-on; 200 PR/mo cap on Pro, unlimited on Teams	Bugs-only by design; GitHub-only (no GitLab/Bitbucket)

A practical split: run a bot for the mechanical first pass on every PR, then use the prompts here for the high-stakes diffs (migrations, auth, concurrency) where you want to steer the lens yourself. For local CLI review, Claude Code’s /review reads the working diff directly so you skip the paste step entirely.

Common mistakes

“Review this code” with no lens — you get a shallow, everything-at-once dump.
Skipping AI’s “low confidence” items without checking. Those are exactly where the subtle bugs hide.
Letting AI propose full rewrites in review mode. Review finds problems; refactor fixes them. Keep the modes separate.
No severity scale, so every finding looks equally urgent and you fix the trivial ones first.
Findings with no file:line. You cannot verify them, and the whole review loses its evidence bar.

How to push results further

Pin one lens per prompt — bugs OR security OR perf. Mixing dilutes signal.
Demand file:line on every finding. If the model cannot produce it, the finding is probably hallucinated.
Add a severity scale (Critical / High / Med / Low) so AI ranks rather than enumerates.
In Claude Code or Cursor, let the agent read the diff from git rather than pasting — full context, fewer truncations.
Run a second pass with the opposite reviewer persona (“now critique your own findings — which are weakest?”) to filter false positives before you act on them.
For PRs over 500 lines, ask AI to list the riskiest files first, then review only those. This cuts noise more than any single prompt tweak.
Save the prompt + response in the PR thread so future reviewers see what AI already checked and don’t duplicate the work.

FAQ

Does AI review replace human review? No. Treat it as roughly 80% of a mechanical pass: AI catches null derefs, missing error handling, and obvious injection, while humans catch product intent and context drift. Pair both; don’t substitute.
Can I run this on PRs from teammates? Yes, but disclose it. Drop a note like “Ran an AI pass for bugs + security — flagged X and Y” so it doesn’t read as stealth automation.
Why does AI keep flagging style even when I say “skip style”? Constraint drift on long outputs. Add a closing line: “If a finding is about style or formatting, drop it from the output entirely.”
How long should review prompts be? The prompt itself: 60-150 words. The pasted diff: under ~1,500 lines per pass. Larger diffs go in chunks by directory, or hand the agent the git diff so it isn’t truncated.
What if AI suggests a fix I disagree with? Don’t apply it. Ask AI to defend the suggestion in the same thread; if the defense is weak, you’ve just confirmed the original code was fine.
Should I ever let AI auto-apply review fixes? Only on lint-class issues (naming, unused imports). Never on logic or security fixes — auto-apply hides bugs you never consciously accepted.

Tags: #Prompt #AI coding #Code review