GitHub Copilot for PR review: a workflow that actually catches things
Published 2026-04-25 by Owner
GitHub rolled out Copilot’s automatic PR review in late 2025. The promise: every PR gets an AI review summary before a human looks at it. The reality: 70% of the comments I’ve seen from it either restate what the diff already says or flag non-issues.
That’s not useful enough to leave on by default for a serious team. But there’s a slimmed-down version of the workflow that does add value. Here’s what I’ve ended up running.
The default doesn’t work
Out of the box, Copilot’s PR review acts as a generalized code reviewer. It reads the diff, comments on each file, suggests improvements. The problem is that it has no context for your codebase’s conventions, no awareness of past decisions, and no judgment about what matters.
The typical comment:
Consider using const instead of let for variables that aren’t reassigned.
True. Also: caught by ESLint’s prefer-const rule, which we’ve had on for two years. If our lint passes, this comment is noise.
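For context, this is all the linter needs to own that class of comment. A minimal flat-config sketch; a real eslint.config.mjs will have more in it:

// eslint.config.mjs, trimmed to the one relevant rule
export default [
  {
    rules: {
      // Flags every let that is never reassigned, which is exactly
      // what the Copilot comment restates.
      "prefer-const": "error",
    },
  },
];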
The dangerous comment:
The error handling here doesn’t account for the case where userService.findById() returns null.
Sounds plausible. Often false: in our codebase, findById throws on not-found rather than returning null. The model doesn’t know this. Reviewers who trust the comment add a defensive null check that’s dead code, and over time the codebase fills with dead defensive code.
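Here’s a sketch of that mismatch. The names mirror the example above; NotFoundError and lookup are stand-ins for whatever your service actually uses:

// What our codebase actually does: not-found is an exception, not a null.
class NotFoundError extends Error {}

interface User {
  id: string;
  email: string;
}

// Stand-in for the real database query.
async function lookup(id: string): Promise<User | undefined> {
  return undefined;
}

const userService = {
  async findById(id: string): Promise<User> {
    const user = await lookup(id);
    if (!user) throw new NotFoundError(`user ${id}`);
    return user;
  },
};

// What a reviewer adds after trusting the Copilot comment:
async function handler(id: string) {
  const user = await userService.findById(id);
  if (!user) {
    // Dead code: findById throws on not-found and never returns null,
    // but the check reads as if null were a real case.
    return null;
  }
  return user.email;
}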
Pass 1: lint and type-check first
Configure your branch protection so PRs can’t merge until CI is green, and have Copilot’s review run only after those checks pass. The point: Copilot should not be telling you about issues that automated tools already catch.
# Required status checks, set via the branch protection UI or API
required_status_checks:
  contexts:
    - typecheck
    - lint
    - test
After this gate, the failures Copilot can flag are higher-value because they’re not about formatting, unused variables, or missing types.
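If you’d rather codify the gate than click through the settings UI, the same protection can be set through GitHub’s REST API. A sketch with @octokit/rest; owner, repo, and branch are placeholders:

import { Octokit } from "@octokit/rest";

const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });

// Require the three CI checks to pass before a PR can merge.
// The nulls are required by the endpoint and mean "not configured here".
await octokit.rest.repos.updateBranchProtection({
  owner: "your-org",   // placeholder
  repo: "your-repo",   // placeholder
  branch: "main",
  required_status_checks: {
    strict: true, // branch must be up to date before merging
    contexts: ["typecheck", "lint", "test"],
  },
  enforce_admins: true,
  required_pull_request_reviews: null,
  restrictions: null,
});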
Pass 2: scoped Copilot review with custom instructions
Use Copilot’s PR instructions feature (in repository settings → Copilot → custom instructions). Add:
When reviewing PRs in this repository:
1. Do not comment on style issues if our linter could catch them
2. Focus on: API contract changes, error-path correctness, security implications, race conditions, performance regressions in hot paths
3. Skip: variable naming opinions, suggestions to add comments, "consider extracting this into a function" suggestions
4. If the PR description says "no behavior change" and you see a behavior change, flag it as your highest priority
5. Reference specific lines in your comments; vague comments without line references will be ignored
6. Do not produce a summary at the top of the PR; only comment on specific issues
This narrows what Copilot actually surfaces. With this configuration on a project I work on, the comment volume dropped from ~12 per PR to ~3, and the signal-to-noise ratio went from roughly 1:5 to 2:1.
Pass 3: human review with Copilot as a tool
The human reviewer treats Copilot’s comments as one input, not the review. The pattern:
1. Read the PR description first
2. Look at the diff in your IDE (not GitHub’s web UI)
3. Run the changed code against your mental model
4. Then check Copilot’s comments for things you missed
Step 4 catches roughly one real bug per ten PRs, in my experience. Most of the time Copilot is wrong or noisy, but occasionally it spots a missing branch in error handling that I’d missed. That’s the value: a second pair of eyes that’s wrong often enough that you can’t trust it, but right often enough that ignoring it leaves bugs in.
What to disable
A few Copilot review features I turn off:
Auto-summary at the top of the PR. It’s almost always a restatement of the diff. The PR author wrote a description; that’s the summary.
Automatic test suggestions. When Copilot suggests adding tests, the suggestions are usually too generic — “test the happy path, test the error path.” If your team needs reminding to test, the problem isn’t lack of suggestions.
Documentation comment suggestions. “Consider adding a JSDoc comment to this function.” If we wanted it documented, we would have documented it. Comments added to satisfy a tool’s suggestion don’t survive long in any codebase I’ve worked on.
What to enable
Suspicious change detection. Copilot can flag changes that look unintended — a config value changed in a way that doesn’t match the PR description, a comment removed without explanation, a test deleted. These are the highest-value catches.
Cross-file consistency. When a change in one file should logically have a corresponding change in another (e.g., adding a field to a type but not updating the migration), Copilot flags this reasonably well. Your linter doesn’t do this.
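An illustrative sketch of the kind of mismatch this catches. The file names and schema are invented, and the migration is inlined as a string only so the example is self-contained:

// types/user.ts: the PR adds a field to the domain type.
export interface User {
  id: string;
  email: string;
  lastLoginAt: Date | null; // new in this PR
}

// migrations/0042_create_users.sql, inlined here for illustration.
// The PR never added the matching column, so writes of lastLoginAt
// fail at runtime. No linter or type checker sees across this boundary.
export const createUsersMigration = `
  CREATE TABLE users (
    id    TEXT PRIMARY KEY,
    email TEXT NOT NULL
    -- missing: last_login_at TIMESTAMPTZ
  );
`;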
Security pattern detection. Hardcoded credentials, SQL string interpolation, missing input validation on API routes. These have low false positive rates and high impact.
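For the SQL case, the flagged-versus-fixed pair looks roughly like this (assuming node-postgres; any client with parameterized queries works the same way):

import { Pool } from "pg";

const pool = new Pool(); // connection settings come from the environment

// Flagged: user input interpolated straight into the query string,
// the classic SQL injection shape.
export async function findUserUnsafe(email: string) {
  return pool.query(`SELECT * FROM users WHERE email = '${email}'`);
}

// The fix: a parameterized query, so the driver handles escaping.
export async function findUserSafe(email: string) {
  return pool.query("SELECT * FROM users WHERE email = $1", [email]);
}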
A measurement after three months
I’ve been running this workflow on a team of 6 engineers for three months. Tracked outcomes:
- PRs with at least one Copilot comment that led to a real fix: 18%
- PRs where Copilot’s comments were entirely noise: 47%
- PRs where Copilot’s comments were noise but the human reviewer found a real bug: 31%
- PRs that merged with no issues found by Copilot or the human reviewer: the remaining 4%
That 18% number is the value. It’s not enough to replace human review. It’s enough to justify the tool — about one in five PRs gets a useful catch from it, on top of human review.
The trap is treating that 18% as 100% and skipping human review. That’s where teams using Copilot review badly end up shipping more bugs, not fewer.