
Pair programming with AI: a structured approach that beats free-form chat

Published 2026-04-02 by Owner

The “AI as pair programmer” framing is everywhere in the marketing for these tools. The actual experience of using them — open chat panel, type prompt, accept or reject suggestion — bears little resemblance to pair programming as practiced by people. Real pairing has structure: roles, handoffs, periodic check-ins. AI tools, used unstructured, give you something more like a fast-but-impatient junior who guesses what you want.

This is the structured approach that turns AI assistance into something that feels closer to genuine pairing. It’s slower per turn than free-form prompting and substantially more productive per hour.

What human pairing actually does

Two people pairing usually have one of these structures:

Driver/navigator: One person types (driver), the other watches and thinks ahead (navigator). They swap regularly. The navigator catches mistakes the driver doesn’t see, suggests directions, holds the bigger picture.

Ping-pong: One person writes a failing test, the other writes the code to pass it, then writes the next test. Roles alternate every cycle.

Strong-style: The navigator dictates intent (“I want to extract this into a helper”); the driver implements it. The driver is responsible only for typing, not deciding.

These structures all share something: explicit role separation, regular handoffs, and a shared mental model maintained through conversation.

The default AI workflow has none of these. You’re both driver and navigator, the AI is neither, and the “conversation” is one-shot prompts without shared context maintenance.

A working structure: the four-role rotation

The pattern I’ve landed on, after some experimentation:

Role 1: Architect. Decides what to do. The human, always.

Role 2: Implementer. Writes the code. Can be human, can be AI, can be both.

Role 3: Reviewer. Critiques the implementation. Should be the human, ideally in a different mode of attention from the one used while implementing.

Role 4: Tester. Writes verifying tests. Can be AI for established patterns, should be human for new patterns.

The key is treating these as distinct roles, switched between consciously, rather than collapsing them all into “I prompt and the AI does whatever.”

A typical session

A real session from last week, building an authentication middleware:

Architect phase (5 minutes, no AI)

I sat with a notebook and thought through:

  • What does the middleware need to verify? (JWT signature + expiry + user existence in DB)
  • What should it produce? (Either pass with user attached to context, or fail with a typed error)
  • What edge cases matter? (Expired token, malformed token, valid signature but user deleted)
  • What’s the integration point? (Express middleware, runs before route handlers)

This is design work. AI tools tend to produce reasonable design output for simple cases and unreliable output for cases that depend on your context. Doing this myself takes 5 minutes and avoids the failure mode where I implement something the AI proposed without thinking through whether it’s right for me.
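
In code terms, those notes amount to a small contract. Here's roughly what mine looked like, translated into TypeScript; the names are placeholders rather than project code, and the specific codes are the ones I'd settled on (they show up again in the reviewer phase below):

// Architect-phase notes as types: one success shape, one failure shape,
// one specific code per edge case. Placeholder names, not project code.
import type { User } from "../types/user";

type AuthErrorCode =
  | "AUTH_TOKEN_INVALID"    // malformed token or bad signature
  | "AUTH_TOKEN_EXPIRED"    // signature fine, token past its expiry
  | "AUTH_USER_NOT_FOUND";  // token fine, user since deleted from the DB

// On success the middleware attaches the user to the request context and
// calls next(); on failure it responds 401 with one of the codes above.
type AuthOutcome =
  | { ok: true; user: User }
  | { ok: false; code: AuthErrorCode };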

Implementer phase (15 minutes, AI-driven)

With the design clear, I opened Cursor Composer with a structured prompt:

Implement an Express middleware that:
- Reads JWT from the Authorization header
- Verifies signature using JWT_SECRET env var
- Checks token isn't expired
- Looks up the user by ID from the token in the users table
- Attaches the user to req.user if all checks pass
- Returns 401 with a typed error code if any check fails

File: src/middleware/auth.ts
Use the existing User type from src/types/user.ts.
Use the existing db helper from src/lib/db.ts.
Don't write tests yet — that's the next step.

Composer produced the middleware in about 90 seconds. I reviewed the output for adherence to the design. It matched. Accepted with one small naming adjustment.

This phase is where AI is genuinely useful: turning a clear design into code, faster than typing it all by hand.
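
To make the shape concrete, here is a minimal sketch of middleware along those lines, not the actual diff from the session. It assumes the jsonwebtoken package and a hypothetical db.findUserById helper; substitute whatever your project exposes. Note the generic error responses, which come up again in the reviewer phase:

// Minimal sketch of the middleware described in the prompt above, assuming
// the jsonwebtoken package and a hypothetical db.findUserById helper.
import type { Request, Response, NextFunction } from "express";
import jwt, { type JwtPayload } from "jsonwebtoken";
import { db } from "../lib/db";

export async function authMiddleware(req: Request, res: Response, next: NextFunction) {
  const header = req.headers.authorization;
  if (!header?.startsWith("Bearer ")) {
    return res.status(401).json({ error: "Unauthorized" });
  }

  let payload: JwtPayload;
  try {
    // verify() checks the signature and rejects expired tokens by throwing.
    payload = jwt.verify(header.slice("Bearer ".length), process.env.JWT_SECRET!) as JwtPayload;
  } catch {
    return res.status(401).json({ error: "Unauthorized" });
  }

  // Token checks passed; make sure the user still exists.
  const user = await db.findUserById(payload.sub as string);
  if (!user) {
    return res.status(401).json({ error: "Unauthorized" });
  }

  (req as any).user = user; // real code would extend Express's Request type instead
  next();
}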

Reviewer phase (5 minutes, human)

I switched modes: from “implementer accepting Composer output” to “reviewer reading the diff with skepticism.” Different attention.

What I look for in this mode:

  • Edge cases mishandled. Did the AI handle the “user deleted but token still valid” case the way I designed it?
  • Subtle wrong choices. Did it use jwt.verify() (correct) or jwt.decode() (insecure)? See the snippet after this list.
  • Patterns that don’t match the codebase. Did it use our standard error format?
  • Things I’d have written differently. Stylistic alignment with the rest of the project.
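
The second check in that list deserves a concrete example, because the two calls read as interchangeable and are not (sketch assumes the jsonwebtoken package):

import jwt from "jsonwebtoken";

// decode() only parses the payload. It does not check the signature or the
// expiry, so a client can hand you whatever claims it likes.
// verify() checks the signature against the secret and throws on a tampered
// or expired token, which is what an auth check needs.
function readClaims(token: string, secret: string) {
  const unverified = jwt.decode(token);       // insecure: trusts the client
  const verified = jwt.verify(token, secret); // correct: verified or throws
  return { unverified, verified };
}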

This isn’t the same as the review I do when accepting a Composer diff in real-time. That review is “does this look reasonable?” The reviewer phase is “does this match what I designed?” — comparing the implementation against my design notes from the architect phase.

For the auth middleware, I caught one issue: the AI’s error handling returned generic “Unauthorized” for all failures, when I’d designed for specific error codes (AUTH_TOKEN_INVALID, AUTH_TOKEN_EXPIRED, AUTH_USER_NOT_FOUND). I asked Composer to fix this; it did, in 30 seconds.
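
The fix itself is small; roughly the difference between these two failure responses (a sketch, using the codes from the design notes):

import type { Response } from "express";

type AuthErrorCode = "AUTH_TOKEN_INVALID" | "AUTH_TOKEN_EXPIRED" | "AUTH_USER_NOT_FOUND";

// Before: every failure collapsed into the same opaque body,
//   res.status(401).json({ error: "Unauthorized" })
// After: each failure path passes its specific code, so callers and logs
// can tell an expired token from a deleted user.
function reject(res: Response, code: AuthErrorCode) {
  return res.status(401).json({ code });
}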

Tester phase (10 minutes, mixed)

I wrote the test cases for the new patterns by hand (the user-deleted-but-token-valid edge case) and used Cursor for the patterns we already had elsewhere (typical valid/invalid token cases).

This split matters because writing the new-pattern tests reinforces the design, and writing them by hand keeps me in the role of “designer of the test suite” rather than “consumer of AI-generated tests.”
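
For a sense of what the hand-written edge-case test looks like, here is a sketch. It assumes Jest, supertest, and the same hypothetical db.findUserById helper as above; the point is that the token is cryptographically valid while the user lookup comes back empty:

import express from "express";
import request from "supertest";
import jwt from "jsonwebtoken";
import { db } from "../lib/db";
import { authMiddleware } from "../middleware/auth";

test("valid token for a deleted user is rejected with AUTH_USER_NOT_FOUND", async () => {
  // A token that passes the signature and expiry checks...
  const token = jwt.sign({ sub: "user-123" }, process.env.JWT_SECRET!, { expiresIn: "1h" });

  // ...while the user lookup behaves as if the row was deleted.
  jest.spyOn(db, "findUserById").mockResolvedValue(null);

  const app = express();
  app.get("/protected", authMiddleware, (_req, res) => res.sendStatus(200));

  const response = await request(app)
    .get("/protected")
    .set("Authorization", `Bearer ${token}`);

  expect(response.status).toBe(401);
  expect(response.body.code).toBe("AUTH_USER_NOT_FOUND");
});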

What this saves

For a typical feature of this size — about 100 lines of code plus tests — the time breakdown:

Phase         Time     Notes
Architect     5 min    Mostly thinking, no typing
Implementer   15 min   AI-driven, fast
Reviewer      5 min    Catches one thing, usually
Tester        10 min   Mixed
Total         35 min

Without AI: my equivalent estimate is 60 minutes for the same feature. With AI but unstructured: 45 minutes, with more cleanup work in PR review.

The structured approach is faster than unstructured AI use because the architect and reviewer phases catch problems before they become PR comments. Free-form prompting feels faster per turn but has more downstream cost.

When this structure breaks

A few cases where I drop the structure:

Tiny changes. A 5-line bug fix doesn’t need four phases. The structure is overhead. Just write it.

Exploratory coding. When I’m not sure what I want yet — “play with this API to see what’s possible” — the architect phase is doing thinking I don’t have data for yet. Skip it, prototype, then bring structure back when you know what you’re building.

Pairing with a real human. If there’s a human partner, AI as fourth role is overkill. The two of you have your own pairing structure; let AI be a tool one of you reaches for, not a peer.

Pure refactoring. When the goal is “preserve behavior, change shape,” the architect phase is mostly “follow the existing patterns.” Skip to implementer with a clear constraint.

Why the structure helps

The hypothesis behind the structure: AI tools are unreliable when asked to do design work, reliable when asked to execute design work, and uniformly bad at noticing their own mistakes. The structure puts AI in the roles it’s good at and keeps humans in the roles AI fails at.

The architect phase prevents “I asked the AI to design and got something that compiles but doesn’t fit.” The reviewer phase prevents “I accepted the AI output without checking if it matched what I asked for.” The tester phase prevents “the tests test what the AI generated, not what the design specified.”

Each guard catches a different category of mistake. Together, they produce work that’s structurally sounder than free-form prompting at the cost of a few minutes of explicit role-switching per task.

What this isn’t

This isn’t a claim that structured AI use is “real pair programming.” It isn’t. Real pairing has dynamics — disagreement, surprise, the back-and-forth of two minds — that AI doesn’t reproduce. The structure described here is closer to “how to use AI to simulate the productivity benefits of pairing without the meeting overhead.” Different thing, useful in its own right.

For teams that genuinely benefit from human pairing, AI doesn’t replace it. For solo work where pairing isn’t an option, this structure is the closest practical approximation, and it’s better than free-form prompting by a measurable margin.