A pattern I see in a lot of teams’ AI workflows: ask the agent to do a substantial piece of work, let it run, review the resulting diff at the end. The intuition is that batch review is more efficient — fewer interruptions, the work done in parallel with other thinking.
In practice, batch review of AI output is consistently worse than incremental review. I’ve watched this play out in my own work and on a few other teams. The reasons aren’t obvious until you look at where the cost actually lives.
The case for batch review
The intuitive argument:
- Reviewing one big diff is more focused than 10 small ones
- The AI works in the background while you do other things
- Fewer context switches, fewer interruptions
- The model can produce a coherent multi-part change that doesn’t make sense in pieces
Each of these is plausible. None of them, in my experience, hold up.
What actually happens with batch review
I tracked my own batch-review sessions for a month. The pattern, repeated:
The AI runs. Maybe 15 minutes for a substantial feature.
I review the resulting diff. It’s 200-400 lines across 6-10 files.
I notice the first issue around line 50. Some pattern that doesn’t match our codebase. I make a note: “fix this.”
I notice the second issue around line 80. A different misunderstanding — the AI used the wrong helper function. Note: “fix this.”
I notice a third issue in the test file. The tests don’t actually test the new behavior; they test that the new code runs.
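To make that third issue concrete, here's the flavor of it (a hypothetical sketch; the helper and both tests are invented, not taken from the actual diff):

```python
# Hypothetical sketch of the "tests that only prove the code runs" problem.

def fetch_with_retry(url, retries, fetch):
    """Invented helper standing in for the code the AI just added."""
    last_err = None
    for _ in range(retries):
        try:
            return fetch(url)
        except ConnectionError as err:
            last_err = err
    raise last_err


# The kind of test the AI wrote: it exercises the code but asserts nothing
# about the behavior the change was supposed to add.
def test_fetch_with_retry_runs():
    fetch_with_retry("https://example.com", retries=1, fetch=lambda url: "ok")


# The test the change actually needed: it asserts that retrying happens.
def test_fetch_with_retry_retries_until_success():
    attempts = []

    def flaky_fetch(url):
        attempts.append(url)
        if len(attempts) < 3:
            raise ConnectionError("transient failure")
        return "ok"

    assert fetch_with_retry("https://example.com", retries=3, fetch=flaky_fetch) == "ok"
    assert len(attempts) == 3  # it retried until success, not just "didn't crash"
```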
By line 200, I’ve collected 4-5 issues. I’m spending mental energy holding them all in my head while continuing to read.
I get to the end. Now I have 5 issues to send back to the AI as a fix request.
The AI fixes them. But the fixes produce new issues — the fix for issue 2 breaks an assumption that issue 4's fix relied on. Now I have a new diff to review.
Repeat. Maybe 2-3 iterations to get to a clean state.
Total time: an hour or more. The “batch” approach has become an iterative approach with bigger batches and longer cycle times. The compounding cost of issues that interact with each other’s fixes is the part that batch review hides.
What incremental review looks like
The alternative pattern:
Ask the AI to do step 1. A 30-line change. Maybe 1 minute of execution.
Review immediately. 90 seconds to read 30 lines and decide it’s right.
Either accept or reject and re-prompt. If accepted, continue. If rejected, the cost of re-prompting is small because the context is fresh and the disagreement is local.
Step 2. Another 30 lines, another 90 seconds of review.
Step 3. Continue until done.
For the same feature, this might be 5-7 increments instead of one batch. Each increment is small enough to review accurately. Issues are caught when they’re easy to fix because they’re isolated.
The total time, for me on representative work: about 35-45 minutes vs. the hour-plus of the batch approach. And the resulting code is consistently cleaner, because the issues that would have compounded in batch never compound when caught early.
Why incremental wins
Three structural reasons:
Issue cost compounds when issues interact. A diff with 4 unrelated issues is 4 things to fix. A diff where issue 2’s fix changes the constraint that issue 4 depended on is more than 4 things to fix — the fixes interact, and you need extra rounds of review to catch the interactions. Incremental review keeps issues independent because each step is small enough that it has at most 1-2 issues.
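A hypothetical flavor of what “the fixes interact” means (every name below is invented for illustration): fix 2 changes a helper's signature, and fix 4 was written against the old one.

```python
# Hypothetical illustration of two fixes that collide.

# Fix for issue 2: the review asked for an explicit keyword-only mode instead
# of a bare boolean, so the signature changes from parse(data, strict)
# to parse(data, *, mode).
def parse(data, *, mode="lenient"):
    return {"fields": data.split(","), "mode": mode}


# Fix for issue 4: written in the same round against the old signature,
# so once issue 2's fix lands, this call raises TypeError.
def load_config(raw):
    return parse(raw, True)  # positional boolean from the old signature; now broken
```

Each fix looked fine on its own; it takes another full pass over the combined diff to notice the collision.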
Review accuracy degrades with diff size. I’m a worse reviewer of a 400-line diff than of four 100-line diffs reviewed separately. Attention runs out. Pattern recognition fades. By line 350, I’m skimming, not reviewing. Incremental keeps every diff in my “reviewing carefully” zone.
Re-prompt cost is lower with fresh context. When you re-prompt the AI mid-batch, you’re competing with the AI’s invested context — it’s already committed to a particular approach, and asking it to change is partial-rewrite territory. When you re-prompt at a step boundary, the slate is cleaner.
The exception: well-bounded mechanical tasks
There’s one category where batch is genuinely better: tasks where the AI’s output is mostly right by construction.
“Write the test file with parallel tests for all five public methods.” This is mechanical. The AI either does it right (90% of the time) or wrong in obvious ways (10% of the time). Reviewing the result in batch is fine because there’s nothing subtle to catch — either the structure is correct or it isn’t.
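For a concrete sense of what I mean by mechanical, here's a hedged sketch (the `Inventory` class and its methods are invented; the point is the shape of the output, not the specifics):

```python
import pytest


# Hypothetical class under test, defined inline so the sketch is self-contained.
class Inventory:
    def __init__(self):
        self._items = {}

    def add_item(self, name, qty):
        self._items[name] = self._items.get(name, 0) + qty

    def remove_item(self, name, qty):
        self._items[name] = max(self._items.get(name, 0) - qty, 0)

    def count(self, name):
        return self._items.get(name, 0)

    def clear(self):
        self._items = {}

    def snapshot(self):
        return dict(self._items)


# The "mechanical" output: five structurally parallel tests, one per public method.
@pytest.fixture
def inv():
    return Inventory()


def test_add_item(inv):
    inv.add_item("widget", 3)
    assert inv.count("widget") == 3


def test_remove_item(inv):
    inv.add_item("widget", 3)
    inv.remove_item("widget", 1)
    assert inv.count("widget") == 2


def test_count_missing_item(inv):
    assert inv.count("missing") == 0


def test_clear(inv):
    inv.add_item("widget", 3)
    inv.clear()
    assert inv.count("widget") == 0


def test_snapshot(inv):
    inv.add_item("widget", 3)
    assert inv.snapshot() == {"widget": 3}
```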
For mechanical work, batch saves time. For anything that involves judgment, semantic decisions, or interaction with existing code, batch is the worse pattern.
What this implies for tool choice
The tools that make incremental review natural:
- Aider’s per-edit auto-commits — every Aider response is one increment, atomic
- Cursor’s Cmd+K — small, targeted, reviewable inline
- Cline’s plan-mode + step approval — explicitly incremental
The tools that push toward batch review:
- Cursor’s Composer with multi-file diffs — can be large
- Background agents that run for tens of minutes — produce big batches
- Any “give me a full feature” workflow
Both modes are available in most tools. The question is which mode you default to. Defaulting to incremental, with batch reserved for the genuinely mechanical cases, produces better results for me consistently.
What I tell people now
When teammates ask why their AI workflow feels frustrating, the first question I ask: “How big are the chunks you’re reviewing?” If the answer is “the whole feature when it’s done,” that’s usually the lever.
The shift from batch to incremental isn’t free. It feels slower per step. It feels like more interruption. Both feelings are real and both are worth ignoring, because the measured outcome — total time to a clean diff, total energy spent on review — favors incremental.
This isn’t a deep insight. It’s a pattern that’s well-known in code review for human-written code (small PRs are reviewed better than big ones) applied to AI-generated code with the same logic. The difference is that AI tooling makes batch review feel more available, even when it’s not actually better.
Keep your AI diffs small. Review them as they’re produced. Resist the temptation to let the agent run for 20 minutes and then sort through the result. Letting it run feels productive, but it produces work you’ll spend more time fixing than you saved by parallelizing.