What this covers
Most PRs fail human review on the same three things: missing tests, broken conventions, and accidental side effects. Codex catches all three in roughly the time it takes you to go get coffee — but only if you brief it with your team’s actual conventions, not generic “best practices.” This guide is the brief-template, the review prompt that produces actionable findings (not vibes), and the human-handoff pattern that keeps Codex as a tireless pre-reviewer rather than a rubber stamp.
Who this is for
Engineering teams doing PR-based work, solo developers without a reviewer, and tech leads who keep getting PRs that “compile but feel wrong.” Especially useful in fast-moving codebases where conventions have drifted but nobody has time to write them down.
When to reach for it
Before any PR you want reviewed by a human. Run Codex first; the human reviews the remaining issues. Skip for tiny PRs (under 20 lines) where the overhead exceeds the value, or for production hotfixes where speed beats thoroughness.
Before you start
- Write or update an AGENTS.md or CONTRIBUTING.md that lists your team’s actual conventions: naming, error handling, testing rules, “never do this” patterns. Without this, Codex defaults to internet-average advice.
- Make sure your repo’s CI is passing on the base branch. Codex inherits broken-baseline noise.
- Decide what counts as “blocking” vs “nice to have.” Codex will not weight findings; you do.
Step by step
- Open the PR or branch in Codex. Provide the review prompt with explicit categories:
Review this diff against the conventions in AGENTS.md.
Report findings in three sections:
1. Blocking — bugs, security issues, broken conventions, missing critical tests.
2. Important — performance regressions, missed edge cases, inconsistent style.
3. Nice-to-have — readability tweaks, naming, doc gaps.
For each finding, cite the file and line. Do not list anything you cannot point to.
- Let Codex review. Output should be a categorized list with citations. If it returns a wall of prose, the prompt was too vague — re-prompt with the categories.
- Triage the findings yourself. Disagree with anything that conflicts with your taste or context. Codex is wrong sometimes; the categories make disagreements explicit.
- For each accepted “blocking” item, fix it before the human review. For “important,” fix or annotate. For “nice-to-have,” batch into a follow-up.
- When new tests are needed, ask Codex to propose them: “For the change in auth.ts, propose 3 unit tests covering invalid token, expired token, and missing claims.” Implement what is reasonable.
- Submit to a human reviewer with a note: “Pre-reviewed by Codex; outstanding follow-ups noted in PR comments.” Saves the human time and signals that you read it too.
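The triage rules in the steps above can be written down as a small lookup, which makes it harder to quietly skip a blocking item. This is a minimal sketch, not something Codex produces: the Finding shape, the accepted flag, and the function names are all hypothetical and assume you have copied the findings into a structured list by hand or with your own script.

```typescript
// Hypothetical shape for a triaged finding; Codex does not emit this structure itself.
type Category = "blocking" | "important" | "nice-to-have";

interface Finding {
  category: Category;
  file: string;
  line: number;
  summary: string;
  accepted: boolean; // set by you during triage, not by Codex
}

type Action = "fix-before-review" | "fix-or-annotate" | "follow-up-batch";

// Map each category to the handling described in the steps above.
const ACTION_BY_CATEGORY: Record<Category, Action> = {
  blocking: "fix-before-review",
  important: "fix-or-annotate",
  "nice-to-have": "follow-up-batch",
};

// Group accepted findings by the action they require; rejected findings are dropped.
function planActions(findings: Finding[]): Record<Action, Finding[]> {
  const plan: Record<Action, Finding[]> = {
    "fix-before-review": [],
    "fix-or-annotate": [],
    "follow-up-batch": [],
  };
  for (const finding of findings) {
    if (!finding.accepted) continue;
    plan[ACTION_BY_CATEGORY[finding.category]].push(finding);
  }
  return plan;
}

// True when no accepted blocking item is still waiting for a fix.
function readyForHumanReview(openFindings: Finding[]): boolean {
  return planActions(openFindings)["fix-before-review"].length === 0;
}
```

Call readyForHumanReview with only the findings that are still open. Once every accepted blocking item is fixed and removed from that list, it returns true and the PR can go to a human.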
First-run exercise
- Take a PR that just got merged. Run Codex on it with the prompt above.
- Compare its findings to what the human reviewer actually said. The overlap tells you what Codex catches reliably.
- Note misses on both sides. Codex misses subtle business-logic bugs; humans miss tedious convention drift. The combo wins.
- For your next real PR, run Codex first and adjust how aggressively you trust each category.
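If you want the comparison in this exercise to be more than an impression, one option is to record where each reviewer pointed and compute the overlap. The sketch below assumes you have transcribed both sets of comments into file and line pairs yourself; the Citation and Comparison types and the compareReviews function are illustrative names, and matching on the exact line is a simplifying assumption.

```typescript
// Hypothetical minimal record of where a reviewer (Codex or human) pointed.
interface Citation {
  file: string;
  line: number;
}

interface Comparison {
  both: string[]; // flagged by Codex and by the human
  codexOnly: string[]; // flagged by Codex, not mentioned by the human
  humanOnly: string[]; // flagged by the human, missed by Codex
}

// Key a citation as "file:line" so comments from two reviewers can be matched.
function citationKey(c: Citation): string {
  return `${c.file}:${c.line}`;
}

function compareReviews(codex: Citation[], human: Citation[]): Comparison {
  const codexKeys = new Set(codex.map(citationKey));
  const humanKeys = new Set(human.map(citationKey));
  return {
    both: Array.from(codexKeys).filter((k) => humanKeys.has(k)),
    codexOnly: Array.from(codexKeys).filter((k) => !humanKeys.has(k)),
    humanOnly: Array.from(humanKeys).filter((k) => !codexKeys.has(k)),
  };
}
```

The humanOnly list is where to look for business-logic misses, and codexOnly is where convention drift tends to show up. Two comments about the same problem on neighboring lines will not match here, so skim both lists before drawing conclusions.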
Quality check
- Did every finding cite a specific file and line? “Concerns about error handling” without a citation is not a finding.
- Are blocking items actually blocking? Recategorize any that are taste or style.
- Did Codex miss anything obvious? If yes, the conventions file is missing that rule. Add it.
- Was the human reviewer’s time actually saved? If they still spent an hour, the prompt or the conventions file needs work.
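The first check in this list is mechanical enough to script. Here is a rough sketch that flags findings with no file-and-line citation. It assumes citations look like src/auth.ts:42 (a path with an extension, a colon, a line number); that format is a guess, so adjust the hypothetical CITATION_PATTERN to match what your Codex output actually contains.

```typescript
// Assumed citation format: path with an extension, a colon, then a line number,
// for example "src/auth.ts:42". This is an assumption, not a documented Codex format.
const CITATION_PATTERN = /[\w./-]+\.\w+:\d+/;

// True when the finding text contains something that looks like file:line.
function hasCitation(findingText: string): boolean {
  return CITATION_PATTERN.test(findingText);
}

// Return the findings that should be sent back or dropped for lacking a citation.
function uncitedFindings(findings: string[]): string[] {
  return findings.filter((text) => !hasCitation(text));
}
```

Anything uncitedFindings returns is a candidate for the "not a finding" pile, or for a re-prompt asking Codex to point at the exact line.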
How to reuse this workflow
- Save the review prompt as a snippet. Same prompt for every PR.
- Maintain AGENTS.md like documentation: any time a reviewer says “we do not do that here,” add the rule.
- Track which finding categories you ignore most often: those are taste-not-convention and should be removed from the prompt.
- Re-run the workflow on a recently merged PR each quarter to verify Codex still catches what you want.
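Tracking what you ignore only works if you write it down. As one possible approach, log each triaged finding with its category and whether you ignored it, then compute an ignore rate per category. The TriageRecord shape and the ignoreRates function are hypothetical, and this sketch assumes you keep the log yourself.

```typescript
type Category = "blocking" | "important" | "nice-to-have";

// Hypothetical log entry you record after triaging each finding.
interface TriageRecord {
  category: Category;
  ignored: boolean;
}

// Fraction of findings ignored per category, between 0 and 1.
// Categories with no recorded findings are omitted from the result.
function ignoreRates(records: TriageRecord[]): Partial<Record<Category, number>> {
  const totals: Partial<Record<Category, { seen: number; ignored: number }>> = {};
  for (const record of records) {
    const tally = totals[record.category] ?? { seen: 0, ignored: 0 };
    tally.seen += 1;
    if (record.ignored) tally.ignored += 1;
    totals[record.category] = tally;
  }

  const rates: Partial<Record<Category, number>> = {};
  for (const category of Object.keys(totals) as Category[]) {
    const tally = totals[category];
    if (tally) rates[category] = tally.ignored / tally.seen;
  }
  return rates;
}
```

A category whose rate stays high across several PRs is a signal to tighten or trim that part of the prompt. Where to draw the line is a judgment call for your team.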
Recommended workflow
Diff ready → Codex review with categorized prompt → triage findings → fix blocking, annotate important, batch nice-to-have → human PR submission with Codex notes attached.
Common mistakes
- Treating Codex review as the final word. It misses business-logic bugs that only a human with context catches.
- Skipping human review entirely. Codex is the pre-pass, not the decision-maker.
- No conventions file. Without it, Codex applies generic advice that may contradict your team’s choices.
- Letting Codex propose AND merge fixes without human review. Findings yes, autonomous merges no.
- Not categorizing findings. A flat list of 40 issues paralyzes the human reviewer.
- Ignoring “nice-to-have” items forever. Batch them quarterly into a cleanup PR or they become tech debt.
FAQ
- How long does a Codex review take?: 2-10 minutes for typical PRs, depending on diff size and test count.
- Can Codex review a draft PR?: Yes, and often should. Earlier feedback is cheaper than later feedback.
- Does it replace linting and type checks?: No. Lint and type checks catch syntax-level and type-level issues; Codex catches convention and logic issues. Run both.
- What about security review?: Codex catches common security patterns but is not a substitute for a dedicated security review on sensitive code paths.