How to Review AI Diffs Efficiently (2026 Workflow)

A 200-line AI diff that compiles isn't safe. Here's the exact reading order, git commands, and tooling a senior engineer uses to catch silent renames, lost branches, and fake test fixes.

Published: May 17, 2026 Updated: Jun 04, 2026 Author: AI Productivity Guide Team

TL;DR

A 200-line AI diff that compiles, passes CI, and reads cleanly is the highest-risk artifact in a modern codebase, because every surface signal says “merge me.” Review AI patches in a fixed order: deletions first, renames second, new logic last, then run the test the agent claims to fix from main and from the branch. Budget about 1 minute per 30 lines of meaningful diff. Tool-assisted review (GitHub Copilot code review, Cursor’s diff panel, asking Claude Code to critique its own work) is a useful second pass, never the only pass.

Why AI diffs need a different reading order

When a human writes a 200-line patch, the bugs cluster around the lines they typed by hand. When an agent writes it, the bugs cluster around the lines it deleted or moved without understanding — and those are exactly the lines your eye skips. An agent will happily drop an if (err) return guard because “the happy path was cleaner,” revert time-zone handling it never saw a test for, or move an auth check one layer down where it looks fine but no longer covers the public route.

The fix is to invert your instinct. Most reviewers read top to bottom and get hooked on the new code’s intent. For AI patches you read the least fun parts first, while you still have attention to spend.

This guide is for anyone reviewing AI-generated PRs — a teammate’s, an agent’s, or your own from an hour ago. Self-review is where most embarrassing AI changes ship, because the human assumes they still remember what they asked for.

Before you start

Pull the branch locally. Don’t review on the GitHub web UI alone: you need git log, git blame, and the test suite to check any claim the PR makes.
Demand a real description. “I asked Claude to fix the bug” is not a description. Push back. A good AI PR states the failing behavior, the root cause, and the fix.
Have the repro in front of you. If there’s no failing test or reproduction the patch claims to fix, the first question is what the agent was actually solving.
Turn off diff-folding in your editor or GitHub view. Collapsed regions are where subtle deletions hide.

The reading order, step by step

Read deletions first. Find the file with the most red. Deletions are how agents quietly drop branches, error handling, or whole feature flags they didn’t understand.
Read renames second. git log --follow and git diff -M surface real renames. Agents sometimes “rename” by deleting and recreating a file, which loses blame history and can bury a behavior change inside a commit labeled “rename.”
Read new logic last. New code is the most enjoyable to read and the easiest to nod through. Save it for when you’re warmed up and skeptical.
Run the claimed-fixed test from both sides. Check out main, confirm the test fails. Check out the branch, confirm it passes. Agents sometimes “fix” a test by rewriting the assertion to match the broken output.
Read the commit log. Run git log <branch> --not main to see every commit on the branch. A commit titled “fix typo” that touches business logic is a panic-fix worth a second look.
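
The "run it from both sides" step can be scripted. The sketch below is one possible shape, not a standard tool: the function names classify_fix and run_both_sides are made up for illustration, and it assumes the base branch is origin/main and that the test command can run from a fresh worktree (dependencies may need installing there first).

```shell
# classify_fix MAIN_STATUS BRANCH_STATUS
# Interprets the exit codes of the same test command run on main and on the branch.
classify_fix() {
  main_status=$1
  branch_status=$2
  if [ "$main_status" -ne 0 ] && [ "$branch_status" -eq 0 ]; then
    echo "fails on main, passes on branch: consistent with a real fix"
  elif [ "$main_status" -eq 0 ] && [ "$branch_status" -eq 0 ]; then
    echo "passes on main too: the test does not demonstrate the bug"
  else
    echo "fails on branch: the patch does not fix the test"
  fi
}

# run_both_sides TEST_COMMAND...
# Runs the command in a throwaway worktree of origin/main (an assumption),
# then in the current checkout, and reports how the two results compare.
run_both_sides() {
  tmp=$(mktemp -d) || return 1
  git worktree add --detach "$tmp/main" origin/main >/dev/null 2>&1 || return 1
  ( cd "$tmp/main" && "$@" ) >/dev/null 2>&1
  main_status=$?
  "$@" >/dev/null 2>&1
  branch_status=$?
  git worktree remove --force "$tmp/main"
  rm -rf "$tmp"
  classify_fix "$main_status" "$branch_status"
}

# Usage (hypothetical test command):
#   run_both_sides npm test -- src/price.test.js
```

A separate worktree keeps your branch checkout untouched, so uncommitted review notes or local edits survive the comparison.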

A 60-second sanity script

Drop this in your shell to skim an AI patch fast:

git fetch origin
git diff --stat origin/main...HEAD | sort -k3 -n -r | head    # biggest files first
git log origin/main..HEAD --oneline                            # every commit on the branch
git diff origin/main...HEAD -- '*.test.*' '*spec*'             # test changes only
git diff origin/main...HEAD | grep -E '^-' | grep -v '^--- ' | head -40   # removed lines only, file headers dropped

If that last command shows a deleted try, catch, if (err), or return, stop and read the surrounding context before anything else.
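
To avoid scanning 40 removed lines by eye, the keyword check can be folded into a filter. This is a rough sketch: removed_guards is a hypothetical helper name, and the keyword list is just the four patterns named above, so extend it for your language.

```shell
# removed_guards: reads a unified diff on stdin and prints removed lines
# that mention try, catch, return, or an `if (err` check.
removed_guards() {
  grep -E '^-' |
    grep -v -E '^--- ' |   # drop "--- a/path" file headers
    grep -E '(^|[^[:alnum:]_])(try|catch|return|if[[:space:]]*\([[:space:]]*err)([^[:alnum:]_]|$)'
}

# Usage:
#   git diff origin/main...HEAD | removed_guards
```

No output means none of those keywords appear on removed lines. It does not mean the deletions are safe.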

For diffs an agent dressed up as a “reformat,” force the real change to the surface:

git diff -w origin/main...HEAD              # ignore whitespace; what actually changed?
git diff --word-diff origin/main...HEAD     # word-level diff inside a line
git diff --ws-error-highlight=all origin/main...HEAD   # highlight whitespace errors on removed and context lines too

git diff -w is the single fastest way to tell a genuine logic change from a 300-line “I reindented everything” commit. See the official git-diff options for the full list.
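
On a multi-file patch it helps to know which files are pure reformatting so you can skip them. One way, sketched here under assumptions, is to compare git diff --numstat with and without -w. The helper name whitespace_only_files and the temp-file paths are illustrative only.

```shell
# whitespace_only_files FULL_NUMSTAT W_NUMSTAT
# Prints paths that have changed lines in the full diff but none once
# whitespace is ignored. Both arguments are files holding `--numstat` output.
whitespace_only_files() {
  awk -F '\t' -v wfile="$2" '
    BEGIN {
      # Load the -w numstat: path -> added + deleted line count.
      while ((getline line < wfile) > 0) {
        split(line, f, "\t")
        kept[f[3]] = f[1] + f[2]
      }
    }
    # Binary files show "-" counts, which add up to 0 and are skipped.
    ($1 + $2) > 0 && !(kept[$3] > 0) { print $3 }
  ' "$1"
}

# Usage:
#   git diff --numstat origin/main...HEAD    > /tmp/full.numstat
#   git diff -w --numstat origin/main...HEAD > /tmp/w.numstat
#   whitespace_only_files /tmp/full.numstat /tmp/w.numstat
```

The check treats a file as whitespace-only whether the -w output lists it with zero counts or omits it entirely, so it does not depend on which of those your git version does.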

What AI gets wrong that humans usually don’t

Silent removal of if (err) return blocks because “the happy path was cleaner.”
Off-by-one shipped as a clarity edit: < quietly becomes <=.
Time-zone handling reverted to the system default because the agent never saw the TZ test.
Auth checks moved one layer down where they look correct but no longer guard the public route.
Network calls in tests mocked out where they originally hit real (intended) endpoints.
“Refactored” config where ordering is load-bearing — Express middleware, Webpack loaders, Vite plugins.
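
The off-by-one case is easy to miss in a line diff and easy to spot in a word diff. As a minimal sketch, the filter below scans git diff --word-diff output for a comparison operator swapped for another one. operator_flips is a hypothetical name, and it only catches operators written with spaces around them, since word diff splits on whitespace by default.

```shell
# operator_flips: reads `git diff --word-diff` output on stdin and prints
# lines where one comparison operator was replaced by another,
# e.g. "[-<-]{+<=+}".
operator_flips() {
  op='(<=|>=|==|!=|<|>)'
  grep -E "[[]-${op}-[]][{][+]${op}[+][}]"
}

# Usage:
#   git diff --word-diff origin/main...HEAD | operator_flips
```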

Using AI review tools as a second pass (as of June 2026)

Let a tool do a first sweep, then do the human pass above. Treat tool output as a checklist of places to look, not a verdict.

| Tool | Where it runs | What it’s good at | Cost note (June 2026) |
| --- | --- | --- | --- |
| GitHub Copilot code review | On the PR, in GitHub | Bugs, security smells, style; Low tier is fast, Medium routes to a higher-reasoning model for complex/cross-service changes | Billed as AI Credits; on private repos each review consumes Actions minutes from your plan as of June 1, 2026 (public repos stay free) |
| Cursor | In the editor, before you commit | Side-by-side diff panel for multi-file agent edits; you approve each change inline before it lands | Pro $20/mo (~$16 annual); runs Sonnet 4.6, Opus 4.7, GPT-5.5, Gemini 3.1 Pro |
| Claude Code | In the terminal/IDE | Ask it to critique its own diff and explain each deletion; strong at finding the guard it just removed | Bundled with Claude Pro $20/mo ($17 annual) |

Two rules for AI reviewers:

A second opinion, never the only opinion. The model that wrote the bug is the model least likely to flag it. GitHub’s agentic review will even hand its suggestions back to a coding agent to open a fix PR automatically — handy, but you still own the merge.
Read the test diff with extra suspicion when the AI wrote the tests too. Confirm at least one test asserts a value the agent could not have inferred from your prompt. A test that only checks “it returns something” proves nothing.
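
A quick heuristic for the second rule is to list added test lines that only assert existence or truthiness. The sketch below assumes Jest-style matchers and Python's `is not None`. Those patterns and the name weak_assertions are assumptions, so adjust them to your test framework.

```shell
# weak_assertions: reads a unified diff on stdin and prints added lines whose
# assertion only checks that a value exists or is truthy.
# The matcher list is an assumption (Jest-style and Python); extend as needed.
weak_assertions() {
  grep -E '^\+' |
    grep -v -E '^\+\+\+ ' |   # drop "+++ b/path" file headers
    grep -E 'toBeDefined\(\)|toBeTruthy\(\)|not\.toBeNull\(\)|not\.toBeUndefined\(\)|is not None'
}

# Usage:
#   git diff origin/main...HEAD -- '*.test.*' '*spec*' | weak_assertions
```

A hit is not automatically a bad test, but if every new assertion in the diff shows up here, the tests prove very little about the fix.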

If you’re standing up agentic review on a new repo, start with the editor-level pass in Cursor and the terminal pass in your Claude Code workflow before you trust anything to auto-merge.

The full workflow

deletions → renames → new code → run the claimed-fixed test from main and branch
  → read git log for surprise commits → tool sweep → approve or send back

A 200-line patch should take 5–10 minutes. If it takes longer, the PR is too big — ask for it to be split. And for AI PRs, send it back rather than approving with comments: the agent will fix every nit in one round for free, so there’s no reason to merge with known issues.
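
The time budget is simple arithmetic, so it can be computed from the diff itself. This sketch applies the 1-minute-per-30-lines rule to git diff --numstat output. review_minutes is a hypothetical helper, and counting added plus deleted lines as "meaningful diff" is an assumption; exclude generated files with a pathspec if they inflate the number.

```shell
# review_minutes: reads `git diff --numstat` output on stdin and prints the
# estimated review time in whole minutes at 30 changed lines per minute.
review_minutes() {
  awk -F '\t' '
    $1 != "-" { lines += $1 + $2 }          # "-" marks a binary file; skip it
    END { print int((lines + 29) / 30) }    # round up to the next minute
  '
}

# Usage (the excluded lockfile name is only an example):
#   git diff --numstat origin/main...HEAD -- . ':(exclude)package-lock.json' | review_minutes
```

For a 200-line patch this prints 7, which sits inside the 5–10 minute range above.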

FAQ

Should I trust green CI? Treat it as necessary, not sufficient. CI catches the regressions you already wrote tests for; AI failure modes usually live outside that coverage.
How long should review take? Roughly 1 minute per 30 lines of meaningful diff. Anything substantially bigger should be split into reviewable chunks.
What if the AI wrote the tests too? Read the test diff with doubled suspicion. Confirm at least one assertion checks a value the agent could not have inferred from the prompt.
Should I ask the AI to review its own diff? Useful as a second opinion, never as the only one. It frequently misses the exact thing it just got wrong.
Does GitHub Copilot code review cost me anything now? As of June 1, 2026 it’s billed as AI Credits, and on private repositories each review draws from your plan’s GitHub Actions minutes; public-repo reviews remain free. See [GitHub’s pricing page](https://github.com/features/copilot/plans).
What about giant generated files (lockfiles, schemas)? Collapse them, but spot-check that the version numbers actually changed match what the agent claimed it bumped.
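
For the lockfile spot-check, a small filter can reduce a collapsed diff to the version bumps. This sketch assumes an npm package-lock.json in the layout where packages are keyed as "node_modules/<name>". The helper name lockfile_version_changes is made up, and other lockfile formats would need different patterns.

```shell
# lockfile_version_changes: reads a unified diff of package-lock.json on stdin
# and prints "<sign> <package> <version>" for every removed or added version.
# Assumes the "node_modules/<name>" key appears in the diff context just above.
lockfile_version_changes() {
  awk '
    /"node_modules\/[^"]*": *[{]/ {
      pkg = $0
      sub(/^.*"node_modules\//, "", pkg)   # drop everything up to the package name
      sub(/".*$/, "", pkg)                 # drop the closing quote and the rest
    }
    /^[-+] *"version":/ {
      ver = $0
      sign = substr(ver, 1, 1)             # "-" for the old version, "+" for the new
      sub(/^.*"version": *"/, "", ver)
      sub(/".*$/, "", ver)
      print sign, pkg, ver
    }
  '
}

# Usage:
#   git diff origin/main...HEAD -- package-lock.json | lockfile_version_changes
```

Compare the printed package names against what the PR description says was bumped. Anything extra deserves a question.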

Tags: #AI coding #Tutorial
