Codex Code Review Feels Too Shallow? Fix It

Q: Is there a built-in Codex review command, or do I have to prompt it?

There's a built-in path. In the CLI, run `/review` inside a session; headless, run `codex review --base main`; on GitHub, comment `@codex review`. All read whole files and report prioritized findings without modifying your working tree. Prompting is for *focusing* the built-in reviewer, not replacing it.

Q: How do I make Codex enforce file:line anchors and severity on every review?

Put them in `AGENTS.md` under `## Review guidelines` ("Anchor every finding to file:line", "Flag missing tests as P1"). Codex applies the closest `AGENTS.md` to each changed file, so the rule sticks across CLI, GitHub, and automatic reviews without re-typing it.

Q: Can I run reviews on a stronger model than my interactive session?

Yes. Set a review-specific model in `~/.codex/config.toml`:

Q: Codex flagged a real issue — can it fix it?

On GitHub, comment `@codex fix it` and it launches a task that applies the fix and updates the PR. Still spot-check the result before merging.

Codex review returns generic bullets like "consider error handling." Fix it with the built-in /review command, AGENTS.md review guidelines, and file:line-anchored prompts.

Published: May 17, 2026 Updated: Jun 15, 2026 Author: AI Productivity Guide Team 🌐 查看中文版本

You asked Codex to review a PR. The output: eight bullets that read like a code-review template — “consider error handling,” “add tests for edge cases,” “is this thread-safe?”, “watch for null inputs.” You can’t act on any of it, and you can’t tell whether Codex even opened the changed files.

Fastest fix: stop typing “review this PR.” Use the built-in /review command (in the Codex CLI) or comment @codex review on the PR, and put a ## Review guidelines section in your repo’s AGENTS.md so every review enforces file:line anchors and severity labels. Generic prompt gets generic review; a focused prompt anchored to file:line gets specific answers. The rest of this page covers which bucket you’re in and the exact prompts that produce reviews worth reading.

Use the built-in review path first (as of June 2026)

Most “shallow review” complaints come from people pasting a diff into a chat box. Codex has shipped a dedicated reviewer since early 2026 that reads whole files (not just the diff hunk) and reports prioritized findings without touching your working tree. Use it before you hand-roll a prompt:

In the Codex CLI, run /review inside a session. It examines your working tree, focuses on behavior changes and missing tests, and reports findings. Follow up with /diff to inspect the exact file changes it referenced.
Headless / pre-PR, run codex review --base main to review your branch against the merge base, or add a focus: codex review --base main "Review for race conditions and idempotency in the payments path".
On GitHub, comment @codex review on a pull request (or enable automatic review of every PR in the GitHub integration settings). To scope it: @codex review for security regressions, missing tests, and risky behavior changes. After it flags issues, @codex fix it launches a task that applies fixes and updates the PR.

If the built-in reviewer is still shallow, the cause is almost always the prompt or missing repo guidance — keep reading.

Common causes

Ordered by hit rate, highest first.

1. The prompt was “review this PR” with no criteria

Generic prompt produces generic review. Codex defaults to a summary template because nothing in the prompt tells it what to look for in this specific code.

How to spot it: Re-read the prompt. If it doesn’t name a dimension (security / perf / correctness / API contract) or a specific concern, you got a template.

2. No `AGENTS.md` review guidelines in the repo

Codex applies guidance from the closest AGENTS.md to each changed file. With no ## Review guidelines section, every review starts from zero — it doesn’t know your team flags missing tests as P1 or that the payments package needs extra scrutiny.

How to spot it: git ls-files | grep AGENTS.md returns nothing, or the file has no review section. The same generic bullets appear across unrelated PRs.

3. Codex commented only on the diff, not whole files

A 3-line change can be correct or catastrophic depending on the rest of the function. When the review only references lines visible in the diff hunk, it never read the surrounding context.

How to spot it: Review comments only reference lines that are in the diff, never the calling function or the file’s invariants.

4. Diff was too large; Codex sampled

A 1,500-line PR strains context. Codex reads the first few files in detail, then summarizes the rest. Comments are dense on early files, vague on later ones.

How to spot it: Comment density per file is uneven — the first 3 files get specific feedback, the last 7 get one generic comment each.

5. Codex fell back to “safe reviewer voice”

Without anchoring, Codex hedges: “consider”, “you might want to”, “watch out for”. These never claim a definite issue, so they’re un-actionable but un-rebuttable.

How to spot it: Count uses of “consider”, “might”, “could potentially”. A high count means hedging mode, not real review.

6. No threat model / domain context

Codex doesn’t know the PR is on a payments path, so it reviews like generic CRUD. Domain-specific concerns (idempotency, race conditions, fraud surface) never surface because nothing told Codex this is high-stakes code.

How to spot it: The review skips risks that are obvious to a domain expert.

7. The prompt asked for everything at once

“Review for correctness, security, performance, style, and accessibility” gets a thin pass at each. Codex tries to cover every dimension and goes deep on none.

How to spot it: The review touches many dimensions but goes deep on none.

Which bucket are you in?

Symptom in the review	Most likely cause	Jump to
No `file:line` anchors anywhere	Vague prompt (#1)	Step 1
Same generic bullets across every PR	No `AGENTS.md` guidelines (#2)	Step 2
Comments only on diff lines, ignore calling code	Diff-only read (#3)	Use `/review` (reads whole files)
First files specific, rest vague	Diff too large (#4)	Split the PR / Step 2
Lots of “consider” / “might”	Hedging voice (#5)	Step 3
Misses obvious domain risks	No context (#6)	Step 4
Touches everything, deep on nothing	One mega-prompt (#7)	Step 3

Shortest path to fix

Ordered by ROI. Step 1 alone produces most of the depth gain.

Step 1: Replace “review this PR” with focused questions

Use this template (paste it after /review custom instructions, or in your @codex review comment):

Review this PR for [specific concern].

For each issue:
- file:line (mandatory — no generic comments)
- One-sentence problem
- One-sentence proposed fix
- Severity (P0 ship-blocker / P1 before merge / P2 follow-up)

Do not comment on style, formatting, or anything ESLint already catches.
Do not write "consider X" — either flag X as a P-rated issue or omit it.

The “do not consider X” line is what removes the hedge-mode escape hatch.

Step 2: Put review guidelines in `AGENTS.md`

This makes every review (CLI, GitHub, automatic) start from your standards instead of a blank template. Add a top-level AGENTS.md with a ## Review guidelines section — Codex applies the guidance from the closest AGENTS.md to each changed file:

## Review guidelines

- Anchor every finding to file:line; drop anything you can't anchor.
- Flag missing tests for changed logic as P1.
- Flag changes to public function signatures or behavior as P1.
- In services/payments/**, treat non-idempotent writes and unguarded
  concurrent updates as P0.
- Do not raise style or formatting comments — ESLint owns those.

Drop a nested AGENTS.md (or AGENTS.override.md) inside a high-risk package — e.g. services/payments/AGENTS.md — and its rules take precedence for files in that subtree because the closer file wins.

Step 3: Ask one question per review

Don’t bundle dimensions. Run separate passes:

Pass 1: "Review for correctness. Are there cases where this code produces a wrong result for valid input?"

Pass 2: "Review for null/undefined handling. List every place the new code dereferences a value without a null check."

Pass 3: "Review for API contract changes. Does this PR change any public function signature or behavior?"

Each pass returns deeper, file-anchored feedback than one combined pass would.

Step 4: Provide context up front

Context: This PR is on a payments path. We process ~10K transactions/day.
We care about:
- Idempotency (same request twice must not double-charge)
- Race conditions (two parallel webhooks updating the same order)
- Audit trail (every state change must log who / when / from-state)

Review the diff against these concerns specifically. Ignore other dimensions.

Domain context elevates a generic review to a domain review.

Step 5: Demand adversarial framing

Try to break this code:
- What's the worst input that could reach `processPayment(req)`?
- What's the worst concurrent timing (race condition) that could occur?
- What state could violate the function's pre/post conditions?

For each, paste the input plus expected vs. actual behavior.

Adversarial questions push Codex out of “looks plausible” mode into “find a failure” mode.

Step 6: Spot-check, then reject un-anchored findings

Don’t trust the review wholesale. Pick 2 findings and verify them:

Review claimed: "billing.ts:42 has a race condition between read and update."
Open billing.ts:42, read the surrounding code, decide for yourself.

If the spot-check holds, the rest is more credible. If it fails (“there’s nothing on line 42”), the review is hallucinating — re-prompt. And if Codex returns “consider error handling” with no line reference, push back:

"Consider error handling" is not actionable. For each concern give me:
- Specific file:line
- The exact line(s) of code with the problem
- What goes wrong with what input

Re-do the review with these constraints. Drop any concern you can't anchor.

This trains the session — subsequent reviews stay specific.

How to confirm it’s fixed

A genuinely useful review should pass all four checks:

Every finding has a file:line you can click to. No anchor, no finding.
Every finding has a severity (P0/P1/P2) so you know what blocks merge.
The spot-check holds — open 2 cited lines and the issue is really there.
It mentions your domain risks (the ones in your AGENTS.md or context block), not generic CRUD advice.

If a re-run after Step 1–2 still produces zero anchors, the diff is too large (cause #4) — split the PR or review one file at a time with codex review --base main scoped to that path.

Prevention

Keep a ## Review guidelines section in AGENTS.md, with stricter nested rules for high-risk packages.
Maintain a “review prompts” directory: one prompt per area (security, perf, correctness).
Always provide domain context (payments / auth / internal tooling) at the start.
Require file:line + severity for every comment; reject the rest.
Run multiple narrow review passes instead of one wide pass.
Spot-check 2 comments per review to calibrate trust.
AI review complements, not replaces, human review on important PRs.

FAQ

Is there a built-in Codex review command, or do I have to prompt it? There’s a built-in path. In the CLI, run /review inside a session; headless, run codex review --base main; on GitHub, comment @codex review. All read whole files and report prioritized findings without modifying your working tree. Prompting is for focusing the built-in reviewer, not replacing it.

How do I make Codex enforce file:line anchors and severity on every review? Put them in AGENTS.md under ## Review guidelines (“Anchor every finding to file:line”, “Flag missing tests as P1”). Codex applies the closest AGENTS.md to each changed file, so the rule sticks across CLI, GitHub, and automatic reviews without re-typing it.

Why does Codex only comment on lines inside the diff? That happens when it reads the diff hunk but not the surrounding file. The /review command and codex review read whole changed files by design; if you’re pasting a raw diff into a chat box instead, switch to those.

My PR is 1,500 lines and the review goes vague halfway through. What do I do? That’s context pressure (cause #4). Split the PR, or review per path: codex review --base main "focus on src/billing/**". Smaller surface area gets uniformly specific feedback.

Can I run reviews on a stronger model than my interactive session? Yes. Set a review-specific model in ~/.codex/config.toml:

[model]
review_model = "gpt-5.5-codex"

That keeps interactive work on a faster model while reviews use heavier reasoning.

Codex flagged a real issue — can it fix it? On GitHub, comment @codex fix it and it launches a task that applies the fix and updates the PR. Still spot-check the result before merging.

External references:

Tags: #Codex #Coding agent #Troubleshooting #Debug #Shallow review

Use the built-in review path first (as of June 2026)

Common causes

1. The prompt was “review this PR” with no criteria

2. No AGENTS.md review guidelines in the repo

3. Codex commented only on the diff, not whole files

4. Diff was too large; Codex sampled

5. Codex fell back to “safe reviewer voice”

6. No threat model / domain context

7. The prompt asked for everything at once

Which bucket are you in?

Shortest path to fix

Step 1: Replace “review this PR” with focused questions

Step 2: Put review guidelines in AGENTS.md

Step 3: Ask one question per review

Step 4: Provide context up front

Step 5: Demand adversarial framing

Step 6: Spot-check, then reject un-anchored findings

How to confirm it’s fixed

Prevention

FAQ

Related

Related Articles

Codex Committed to the Wrong Branch (or Straight to main)

Codex Stalls on a Merge Conflict or Resolves It the Wrong Way

Codex Added a Package but the Lockfile Did Not Change

Codex Fix Passes Every Test but Breaks at Runtime

Codex Creates a Duplicate TypeScript Interface for One That Already Exists

Codex Rewrote Git History You Did Not Want Touched (amend / rebase / force-push)

2. No `AGENTS.md` review guidelines in the repo

Step 2: Put review guidelines in `AGENTS.md`