
Integrating AI into your PR review workflow without making review worse

Published 2026-03-25 by Owner

A PR review tool that adds AI-generated comments sounds like a productivity win. In practice, the first month after a team turns one on usually produces:

  • A flood of style nitpicks the team disagrees with
  • Generic “consider extracting this into a function” comments on every diff
  • Real issues mixed with noise at a ratio that makes the real issues hard to see
  • A team that learns to dismiss AI comments as a category, including the ones that matter

Wired carefully, AI in PR review is genuinely useful. Wired wrong, it’s worse than not having it. This is the wiring that’s worked for the teams I’ve seen succeed.

The two failure modes

The teams that bounced off AI PR review usually fell into one of these:

The “summary spam” failure. The AI generates a summary of every PR, posted as a comment on PR open. The summary describes what the PR does, in prose. For a 50-line PR, this is 200 words of “you can read this from the diff.” For a 5-line PR, it’s longer than the change. The reviewer’s habit becomes “scroll past the AI block to find the human comments.”

The “drive-by reviewer” failure. The AI reads every diff and posts inline comments suggesting changes. Most are stylistic (“consider using const here”) or generic (“might want to add error handling”). Some are wrong (suggesting Pythonic patterns in JavaScript files). The actual reviewers become a second pass that ignores the AI’s comments because they’re mostly noise.

Both failures share a structure: the AI is asked to add value that’s marginal on average and frequently negative, and multiplied across every PR, that adds up to a net cost.

What AI PR review is good at

The categories where AI-assisted review consistently adds value:

Catching what a busy reviewer misses. A reviewer scanning a 400-line diff will skim the test files, miss the typo in the variable name, and not notice that the new code path doesn’t have a test. AI doesn’t get tired. It catches these.

Surfacing the diff TL;DR for context-poor reviewers. When the reviewer doesn’t have full context — they’re reviewing a PR for someone on a different sub-team — a structured “this PR changes X, Y, Z and the most likely concern is W” can help the human review better.

Pre-review self-check. Before opening the PR, the author runs an AI review on their own diff and fixes the obvious issues. This is the highest-value use case and the one most teams miss.

Catching specific anti-patterns. Things like “this query doesn’t use the available index” or “this React effect has a missing dependency” — the AI is reliably good at these and humans frequently miss them.

The pattern: AI catches the systematic, doesn’t catch the contextual. Use it for the systematic, leave the contextual to humans.

A wiring that works

After watching several teams iterate, the setup that produces sustained value:

1. AI runs pre-review, not as a reviewer

The single biggest change is switching from “AI reviews after PR is opened” to “AI reviews before PR is opened, by request.” The author runs an AI check on their own changes, addresses the comments that matter, and opens a cleaner PR.

# A typical setup: a script the author runs before opening the PR
git diff main..HEAD | claude-review --rules .review-rules.md

The output is a list of suggestions. The author decides which to apply. The PR that gets opened is already past the AI’s pickier comments.

This shifts the AI from “additional reviewer commenting on the team’s PRs” to “tool the author uses.” It eliminates the spam-comment problem because the comments never reach the team channel.
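
To make this a one-command habit, a small wrapper script helps. The sketch below assumes the hypothetical claude-review CLI from the snippet above and a .review-rules.md at the repo root; substitute whatever review command your team actually uses.

#!/usr/bin/env bash
# pre-review.sh: AI self-check on the current branch before opening the PR.
# claude-review is the hypothetical CLI from the snippet above; swap in your own tool.
set -euo pipefail

BASE_BRANCH="${1:-main}"
RULES_FILE="$(git rev-parse --show-toplevel)/.review-rules.md"

# Three-dot diff: only what this branch adds relative to the merge base,
# i.e. what the PR will actually contain.
git diff "${BASE_BRANCH}...HEAD" | claude-review --rules "${RULES_FILE}"

A git alias (git config alias.pre-review '!bash scripts/pre-review.sh') turns this into a single git pre-review before every PR.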

2. AI on PR-open should focus on summary, not nitpicks

If you do want AI commentary on opened PRs, scope it tightly:

  • A short summary if (and only if) the PR description is missing
  • Specific high-value checks: “no test for this new path,” “this query may not use the index”
  • No style comments — handle those in the linter
  • No generic suggestions like “consider X”

Most PR review tools that ship with AI default to “comment on everything.” This is wrong by default for most teams. Configure aggressively.
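
As a sketch of what that scoping can look like in CI, assuming the GitHub CLI is authenticated and reusing the hypothetical claude-review command (its --summarize flag is a stand-in for whatever your tool provides): post a short AI summary only when the author left the description empty, and nothing else.

#!/usr/bin/env bash
# Post an AI summary only when the PR description is missing; otherwise stay silent.
# Assumes gh is authenticated in CI; claude-review and its flags are hypothetical.
set -euo pipefail

PR_NUMBER="$1"
BODY="$(gh pr view "${PR_NUMBER}" --json body --jq .body)"

if [ -n "${BODY}" ]; then
  echo "PR already has a description; skipping AI summary."
  exit 0
fi

SUMMARY="$(gh pr diff "${PR_NUMBER}" | claude-review --summarize --rules .review-rules.md)"

# The 🤖 prefix keeps the bot comment visually distinct and easy to filter.
gh pr comment "${PR_NUMBER}" --body "🤖 AI summary (no description provided):
${SUMMARY}"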

3. Keep the reviewer in the human loop

A specific anti-pattern: AI comments tagged in a way that makes them easy to mistake for human comments. This devalues human comments by association.

Make AI comments visually distinct (a 🤖 prefix, a different label, a specific bot account) and let reviewers filter or hide them when they want to focus on human review.

4. Calibrate to your team’s standards

The AI’s comments should be tuned to your team’s standards, not generic best practices. This means:

  • A .review-rules.md (or equivalent) that captures what your team cares about
  • Configuring the AI to skip categories your team has decided to handle elsewhere
  • Periodic review of the AI’s comment patterns — when the team is dismissing 80% of a category, that category should be turned off

This is configuration work. It pays for itself within a couple of weeks through higher signal-to-noise.

What to put in the rules file

A useful .review-rules.md is short. The categories that matter most:

# What this review tool should check

## High-value checks

- New code paths without tests
- Untyped error returns (we use typed errors throughout)
- Database queries that may not use the available indexes
- React effects with stale closure issues
- Missing or incorrect cleanup in async operations

## Skip these

- Style nitpicks (handled by prettier and eslint)
- Generic "consider extracting this" suggestions
- "Add a comment here" suggestions
- Suggestions to use a different library
- Documentation suggestions on non-public-API code

## Specific patterns to flag

- New environment variable used without being added to .env.example
- New API endpoint without rate limiting
- New event being emitted without a typed schema
- Migration without a corresponding rollback

About 30 lines, every line specific. The “Skip these” section is what saves the team’s attention.

The tools available

A non-exhaustive map of options as of mid-2026:

Copilot Chat in PR review. GitHub’s built-in AI for PR review. Better at “summarize this PR” than at “find issues.” Default-on for many teams; defaults are noisy.

Cursor’s cursor review command. Local CLI that reviews a diff. Good for the pre-PR self-check use case. Can be configured to read project-specific rules.

Claude as a custom GitHub Action. Several teams I know wire Claude API calls to PR-open events with custom prompts. More flexibility, more setup.
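
A minimal sketch of that wiring, run as a workflow step on PR open: it assumes ANTHROPIC_API_KEY and PR_NUMBER are provided by the workflow and uses the Anthropic Messages API; the model name and prompt are placeholders to adapt, and large diffs would also need truncation.

#!/usr/bin/env bash
# Send the PR diff plus the team's rules file to the Anthropic Messages API.
# ANTHROPIC_API_KEY and PR_NUMBER are assumed to come from the workflow;
# the model name and prompt are placeholders.
set -euo pipefail

DIFF="$(gh pr diff "${PR_NUMBER}")"
RULES="$(cat .review-rules.md)"

# Build the request body with jq so the diff is safely JSON-escaped.
REQUEST="$(jq -n --arg rules "$RULES" --arg diff "$DIFF" '{
  model: "claude-sonnet-4-5",
  max_tokens: 1024,
  messages: [{
    role: "user",
    content: ("Review this diff against the rules below. Flag only the high-value checks and skip everything in the skip list.\n\nRules:\n" + $rules + "\n\nDiff:\n" + $diff)
  }]
}')"

curl -s https://api.anthropic.com/v1/messages \
  -H "x-api-key: ${ANTHROPIC_API_KEY}" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d "${REQUEST}" | jq -r '.content[0].text'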

Cody from Sourcegraph. Strong codebase awareness, useful for “does this change break callers in other repos.”

Codium / Qodo. Specialized PR review tools with their own opinions about what to surface.

The choice between these matters less than the discipline of how you configure them. The same tool can be net-positive or net-negative depending on the rules and team adoption.

The team-level metric to watch

The signal that AI PR review is working: time-to-review goes down without quality going down.

The signal it’s not working: reviewers start posting “approving without reading the bot comments” or you notice issues landing in production that the AI flagged but everyone ignored.

If you adopt AI PR review, set up a quarterly review of:

  • How often is the AI flagging something? (Frequency)
  • How often is it right? (Precision)
  • How often does it catch something a human would have missed? (Value)
  • How often does the team act on its comments? (Adoption)

If all four are healthy, the tool is working. If any is degrading, recalibrate before the team gives up on the whole category.
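
If the bot posts from a dedicated account, a rough version of the first two numbers can be pulled with the GitHub CLI. In the sketch below, the repo, bot login, quarter start date, and the use of 👍/👎 reactions as a precision proxy are all assumptions to adapt to your team’s conventions.

#!/usr/bin/env bash
# Crude quarterly numbers: how many review comments the bot posted (frequency)
# and how they were received (a rough precision/adoption proxy via reactions).
set -euo pipefail

REPO="your-org/your-repo"            # placeholder
BOT_LOGIN="review-bot"               # placeholder bot account
SINCE="2026-01-01T00:00:00Z"         # placeholder quarter start

gh api --paginate "repos/${REPO}/pulls/comments?per_page=100&since=${SINCE}" \
  | jq -s --arg bot "$BOT_LOGIN" '
      add
      | map(select(.user.login == $bot))
      | {
          total_bot_comments: length,
          upvoted:   (map(select(.reactions["+1"] > 0)) | length),
          downvoted: (map(select(.reactions["-1"] > 0)) | length)
        }'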

What I actually do

For my own work, the workflow:

  1. Before opening a PR, I run a local AI review on the diff. It flags 2-5 things on a typical PR; I act on 1-2.
  2. The PR gets opened with AI-on-PR commentary disabled, so human reviewers see only human comments.
  3. The reviewer can opt in to running an AI review for a second opinion on their review, but it’s not posted as comments; they read it themselves (sketched below).

This produces PRs that are cleaner on first open and reviews that stay human-led. The AI is in the loop without being in the channel.
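
The reviewer-side second opinion in step 3 is a one-liner, assuming the same hypothetical CLI; the PR number is a placeholder:

# Read locally, never posted as comments.
gh pr diff 1234 | claude-review --rules .review-rules.md | less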

For teams larger than 10 engineers, this discipline is harder to maintain manually — at scale, you’ll want some automation around the pre-review step. Make sure that automation produces output the author sees first, not the team.
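
One way to keep that automation author-first, assuming GitHub Actions: run the review in CI but write it to the step summary instead of posting a comment, so it surfaces on the workflow run the author checks rather than in the team’s review thread.

# Inside a GitHub Actions step; PR_NUMBER is assumed to come from the workflow.
# Writing to $GITHUB_STEP_SUMMARY keeps the output off the PR conversation entirely.
{
  echo "## AI pre-review (advisory, author-facing)"
  gh pr diff "${PR_NUMBER}" | claude-review --rules .review-rules.md
} >> "$GITHUB_STEP_SUMMARY"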

The whole game is keeping AI’s noise out of human attention while keeping its signal in the workflow. That’s the wiring that works.