Several teams I’ve worked with have hit the same wall in the past 18 months. Engineers got faster at producing code thanks to AI assistance. Throughput at the individual level went up 30-100%. Throughput at the team level went up 10-20%, then plateaued. The bottleneck moved from “engineers writing code” to “engineers reviewing each other’s code.”
This isn’t a hypothetical concern. It’s the observed pattern when teams adopt AI tools without thinking about how the team scales.
The math, roughly
Pre-AI, a senior engineer might produce 200-300 lines of working code per day in a stable codebase. A typical PR is 200-500 lines. Each PR gets ~30-60 minutes of reviewer time across 1-2 reviewers.
With AI, the same engineer might produce 400-600 lines per day. PRs are now bigger or more frequent. Reviewer time per PR is the same — humans review at human speed.
The team’s review capacity is roughly fixed. If four engineers each ship 50% more code, review demand rises 50% while capacity doesn’t, so the queue grows. And if reviewers also ship more code (which they do), they have even less time to review.
The result: PRs sit in review longer, reviewer fatigue grows, and review quality drops as reviewers rubber-stamp PRs to clear the queue.
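To make the arithmetic concrete, here is a back-of-the-envelope model in Python. Every number in it (four engineers, roughly 45 minutes of careful review per ~350-line PR, about an hour of review time per engineer per day) is an illustrative assumption, not a measurement from any particular team.

```python
# Back-of-the-envelope model: review demand vs. review capacity.
# All constants are illustrative assumptions, not measured values.

ENGINEERS = 4
REVIEW_MIN_PER_LINE = 45 / 350          # ~45 min of careful review per ~350-line PR
REVIEW_CAPACITY_MIN = ENGINEERS * 60    # each engineer can spare ~1 hour/day for review

def review_demand(lines_per_engineer_per_day: float) -> float:
    """Minutes of careful review the team's daily output requires."""
    return ENGINEERS * lines_per_engineer_per_day * REVIEW_MIN_PER_LINE

for label, lines in [("pre-AI", 250), ("with AI", 500)]:
    demand = review_demand(lines)
    verdict = "queue grows" if demand > REVIEW_CAPACITY_MIN else "queue clears"
    print(f"{label}: demand {demand:.0f} min/day vs capacity {REVIEW_CAPACITY_MIN} min/day ({verdict})")
```

Under these assumptions the team sits comfortably inside its review capacity before AI (roughly 130 minutes of demand against 240 of capacity) and just past it afterward (roughly 260 against 240). The tipping point is quiet, which is why the queue growth tends to surprise people.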
Why AI doesn’t help review at the same rate
AI speeds up writing code far more than it speeds up reviewing it. That asymmetry is the core of the problem.
A reviewer doing their job carefully needs to:
- Understand what the PR is trying to do
- Check that the PR does it correctly
- Check that the PR doesn’t break anything else
- Check that the PR fits the codebase’s patterns
- Check that the PR is testable and tested
- Check that the PR is maintainable by the next person
AI tools (Cursor’s BugBot, Copilot’s PR review, GitHub Actions with AI plugins) help with the second and third items mostly. They miss the others. Real reviewers still have to do the rest, and the AI assistance shaves maybe 20-30% of their time, not 50%.
So the review time per PR drops modestly while the PR rate increases substantially. The queue grows.
What gets sacrificed
Teams under review pressure cope by sacrificing review quality. The patterns I see:
Reviewers stop reading carefully. They look at the diff for obvious issues, glance at the description, approve. The careful “understand what this is doing” step gets skipped. Bugs ship.
Senior engineers become review bottlenecks. PRs get routed to whoever’s available. When juniors review junior code, the deep issues (architectural problems, subtle bugs) get missed. Senior reviewers become the only path for serious work, and they’re overloaded.
Standards drift downward. When reviewers don’t have time to push back, “merge it and we’ll fix it later” becomes the default. The “later” doesn’t happen. Codebase quality declines incrementally.
PR descriptions get sloppy. When the author knows the reviewer is rubber-stamping, the description quality drops. Context-free PRs make future review even harder.
These compound. A team that lets review quality slip in 2024 has a worse codebase in 2026, which makes future PRs harder to review, which puts more pressure on review, which… etc.
What I’ve seen work
Teams I’ve seen handle this well are doing some combination of:
Smaller PRs. When a PR is 200 lines, it’s reviewable. When it’s 1500 lines, it’s reviewed superficially. Cap PR size by policy. AI makes large PRs easy to write; the team needs a counter-pressure.
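One way to give that policy teeth is a CI gate on diff size. A minimal sketch, assuming the PR branch is checked out in CI and that origin/main is the merge target; the 800-line cap is an arbitrary example, not a recommendation.

```python
#!/usr/bin/env python3
"""Fail the build when a PR's diff exceeds a size budget (sketch)."""
import subprocess
import sys

MAX_CHANGED_LINES = 800          # arbitrary example cap
BASE_BRANCH = "origin/main"      # assumed merge target

def changed_lines(base: str = BASE_BRANCH) -> int:
    """Count added plus deleted lines between the base branch and HEAD."""
    out = subprocess.run(
        ["git", "diff", "--numstat", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    total = 0
    for line in out.splitlines():
        added, deleted, _path = line.split("\t", 2)
        if added != "-":         # binary files report "-" for both counts
            total += int(added) + int(deleted)
    return total

if __name__ == "__main__":
    n = changed_lines()
    if n > MAX_CHANGED_LINES:
        print(f"PR changes {n} lines; the cap is {MAX_CHANGED_LINES}. Please split it up.")
        sys.exit(1)
    print(f"PR size OK: {n} changed lines.")
```

A gate like this usually needs an escape hatch (a label or an explicit override) for changes that genuinely can't be split, such as generated code or large renames.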
More automated checks. Anything an automated check can catch (formatting, type errors, lint rules, simple security patterns) shouldn’t go to a human reviewer. Beef up the CI pipeline. Add checks for things you previously caught in review. The reviewer’s time should be on what only humans can catch.
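As a concrete example of that shift, take a rule reviewers keep enforcing by hand, say stray debug print() calls in library code (a hypothetical rule; any recurring nit works the same way), and turn it into a check that runs before a human ever opens the diff. A rough sketch:

```python
#!/usr/bin/env python3
"""CI check for a rule that used to be caught in review (sketch)."""
import pathlib
import re
import sys

SRC_DIR = pathlib.Path("src")            # placeholder for your library code
PATTERN = re.compile(r"^\s*print\(")     # hypothetical rule: no bare print() calls

violations = []
for path in SRC_DIR.rglob("*.py"):
    for lineno, line in enumerate(path.read_text().splitlines(), start=1):
        if PATTERN.match(line):
            violations.append(f"{path}:{lineno}: bare print() in library code")

if violations:
    print("\n".join(violations))
    sys.exit(1)
print("No review-rule violations found.")
```

Every rule moved into CI this way is a few minutes of reviewer attention reclaimed on every PR.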
Pair-style review for important changes. For architecturally important PRs, two engineers sit together, walk through the code, discuss. This is slower per PR but produces better review and shares context. The compromise: not every PR, just the ones that matter most.
Pre-review by AI. Use BugBot, Copilot Review, or similar to do a first pass before human review. The AI catches some issues; the human reviewer focuses on the rest. The AI’s value is in offloading the low-effort catches, not in replacing review.
PR templates that surface intent. A good template forces the author to articulate what the PR is doing and why. Reviewers spend less time decoding intent and more on substance. AI can write code; it doesn’t articulate intent the way a human-written PR description does.
What hasn’t worked
A few approaches I’ve seen tried and abandoned:
“Just trust the AI.” Some teams concluded that AI-generated code is good enough that it doesn’t need review. Six months later, those teams had codebases full of plausible-looking-but-wrong code. They reverted to requiring review.
Heavier review process. Adding more required reviewers, more required checks, more steps. Slows down everything without addressing the throughput mismatch. Reviewers still review at the same speed; you’ve just added more queues.
Hire more reviewers. This works, but at a cost. Senior engineers don’t grow on trees. The team that needs more reviewers is usually the one whose engineers are good but stretched. Hiring takes 6+ months; the problem is now.
Letting AI auto-merge. A few teams have experimented with auto-merging PRs that pass certain AI-driven gates. The results have been mixed; the gate quality matters enormously, and most teams’ gates aren’t tight enough.
A specific pattern that’s been working
The pattern I’ve seen in two organizations recently:
Default: AI generates a PR, a senior engineer reviews it and stages it for merge, and that senior is responsible for any issues that slip through.
The author isn’t off the hook — they wrote the code, even if AI helped. The senior reviewer is responsible because their approval moves the work forward. This creates an incentive for the senior to actually review carefully (their reputation is on the line) rather than rubber-stamp.
It works because it acknowledges the scaling problem. Pre-AI, senior reviewers were a check on junior code. Now they’re a check on AI code, which is similarly prone to plausible-looking mistakes. The seniors are doing the cognitive work AI can’t do, and the team’s structure recognizes that.
The flip side: it concentrates review work on senior engineers. They can’t ship as much code as before because review is now their primary job. Some don’t like that; the ones who accept it see the team’s overall throughput climb, because the bottleneck is being managed rather than denied.
What teams should be measuring
Most teams measure individual engineer throughput. AI made that go up. The wrong conclusion is “AI is working great.”
Better measurements:
PR cycle time. From PR open to merge. If this is going up, your bottleneck is review.
Review depth. Comments per PR, revision rounds, time from open to first review. If comments and revision rounds are dropping while approvals arrive almost instantly, you may be sacrificing quality for throughput.
Bugs reported per merge. If your AI-assisted output has more bugs than your pre-AI output, the throughput gain is illusory.
Reviewer load. Hours per reviewer per week spent reviewing. If this is climbing, you’re heading toward burnout.
These are not new metrics. Most teams I see don’t track them carefully. The ones that do see the problem before it becomes a crisis.
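Of the four, PR cycle time is the easiest to start tracking automatically. A minimal sketch against the GitHub REST API; the owner and repo names and the GITHUB_TOKEN environment variable are placeholders, and it assumes the requests library is installed.

```python
"""Rough median PR cycle time (open to merge) via the GitHub REST API (sketch)."""
import os
from datetime import datetime

import requests

OWNER, REPO = "your-org", "your-repo"    # placeholders
API = f"https://api.github.com/repos/{OWNER}/{REPO}/pulls"
HEADERS = {"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"}

def merged_prs(pages: int = 3):
    """Yield recently closed PRs that were actually merged."""
    for page in range(1, pages + 1):
        resp = requests.get(
            API,
            params={"state": "closed", "per_page": 100, "page": page},
            headers=HEADERS,
            timeout=30,
        )
        resp.raise_for_status()
        for pr in resp.json():
            if pr.get("merged_at"):
                yield pr

def cycle_time_hours(pr: dict) -> float:
    """Hours from PR open to merge."""
    opened = datetime.fromisoformat(pr["created_at"].replace("Z", "+00:00"))
    merged = datetime.fromisoformat(pr["merged_at"].replace("Z", "+00:00"))
    return (merged - opened).total_seconds() / 3600

times = sorted(cycle_time_hours(pr) for pr in merged_prs())
if times:
    print(f"{len(times)} merged PRs, rough median cycle time: {times[len(times) // 2]:.1f} h")
```

Plot it week by week; a rising trend is the earliest visible sign that review, not writing, is the constraint.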
The bigger pattern
The pattern is generalizable: any tool that accelerates production faster than review creates a quality crisis when adopted naively. AI is the current example. Outsourced contracting was a previous example. So was offshore development. So was, in a different sense, hyper-aggressive feature development without QA.
The lesson is that throughput isn’t free. Capacity comes from a system, and a tool that accelerates one part of the system creates imbalances unless the rest of the system adapts.
For AI specifically, the adaptation is mostly about review. The teams that figure this out before they have a quality crisis are going to be in much better shape than the teams that figure it out after.