Spike mode vs production mode: two AI coding velocities for two purposes
Published 2026-05-11 by Owner
The first time I got genuinely fast with an AI coding tool, I made a mistake I hadn’t made since my early career: I shipped the exploration. The feature worked. The code was coherent enough to pass a quick read. Three weeks later, the codebase had absorbed a skeleton that was never designed to live there, and untangling it cost twice what the original spike had saved.
This is the classic spike problem, accelerated. AI tools make exploration fast enough that the spike is done before the mental gearshift into production-mode thinking can happen. Before AI pair-programming, a spike took long enough that there was a natural pause between “this worked in a scratch branch” and “let me write the real version.” That pause is gone. Auto-accepted suggestions and a fast model can produce 200 lines of working-but-throwaway code in under five minutes. The working part is seductive. The throwaway part is easy to forget.
The solution is to treat spike mode and production mode as explicit, separate operating modes—not as vibes on a dial.
Spike mode: auto-accept, low ceremony, throwaway by default
A spike has one job: answer the question “is this even possible?” It’s not meant to produce code. It’s meant to produce information.
Spike mode looks like this in practice:
- Auto-accept all AI suggestions without reviewing each diff
- No concern for naming, file organization, or test coverage
- No inline documentation, no error handling beyond what’s needed to get a signal
- Fast iteration loops: try something, see if it works, try the variant
```
# Spike session, Cursor or Cline in auto-accept
"Can we call the Stripe webhook API synchronously and get a confirmation in under 200ms?"
→ let it write whatever it needs to write
→ run it
→ get the answer: yes/no + rough latency
→ done
```
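The spike script itself can be as crude as a single timed call. A minimal sketch of what that throwaway code might look like; `simulateWebhookCall` is a stand-in for the real `fetch()` to a Stripe test endpoint, since the actual URL is beside the point:

```javascript
// Throwaway spike code: answer "under 200ms?" and nothing else.
// simulateWebhookCall stands in for the real network call.
const simulateWebhookCall = () =>
  new Promise((resolve) => setTimeout(() => resolve({ ok: true }), 50));

async function measureLatencyMs() {
  const t0 = performance.now();
  const res = await simulateWebhookCall();
  const elapsed = performance.now() - t0;
  // The spike's entire output is this one line.
  console.log(`answer: ${res.ok && elapsed < 200 ? "yes" : "no"} (~${elapsed.toFixed(0)}ms)`);
  return elapsed;
}

measureLatencyMs();
```

No error handling, no config, hardcoded threshold. That is correct for this mode.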
The output of a spike is not the code. It’s the answer to the question, plus any performance data or error patterns that surfaced during exploration. The code is a side effect.
This framing matters because it changes what “done” means. A spike is done when the question is answered, not when the code is clean.
Auto-accept makes spike mode substantially faster. Reviewing every diff the model produces is the right discipline in production, but in a spike it’s overhead. The model is producing throwaway code to test a hypothesis. Reviewing it is like proofreading a whiteboard sketch.
Production mode: deliberate, reviewed, tested
Production mode flips every spike assumption.
- Review each suggestion before accepting, especially edits that touch shared logic
- Naming and file placement matter from the first commit, not as cleanup at the end
- Write tests either before or immediately after the code—not “later”
- Error handling is explicit, not aspirational
```
# Production session
"Add a Stripe webhook endpoint that verifies the signature and queues the event."
→ read the generated handler before accepting
→ check the queue call matches the existing queue abstraction in the repo
→ write a test for the signature verification path specifically
→ accept and commit
```
The model is still doing most of the work, but judgment stays in the loop. Every diff is a decision. The discipline is in not skipping that step when the suggestions look plausible at first read.
Production mode is slower per line, but the velocity question is the wrong frame. The relevant metric is how much rework happens in the next two weeks. Spike mode with auto-accept generates rework. Production mode does not. Measured over a sprint, production mode is faster.
Switching modes requires an explicit signal
The failure mode is mode drift. You start a spike, get the answer, keep the session going because momentum is there, and before a clean break happens, you’ve accepted twenty more diffs that are building on the spike skeleton.
The signal that ends a spike and starts production work has to be external and deliberate. Options that work:
A git boundary. When the spike answers its question, commit it to a scratch branch with a WIP: spike message, or don’t commit at all. Then git stash the work or cut a fresh branch for the production implementation. The branch is the mode boundary.
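The git boundary can be made mechanical. A sketch with illustrative branch names, set up in a throwaway repo so it is safe to run as-is; the only load-bearing idea is that the spike branch never merges:

```shell
set -e
# Throwaway repo so the sketch is self-contained.
cd "$(mktemp -d)" && git init -q -b main
git config user.email "dev@example.com" && git config user.name "Dev"
git commit -q --allow-empty -m "init"

git checkout -q -b spike/stripe-latency        # spike happens here, auto-accept on
echo "// throwaway" > spike.js && git add spike.js
git commit -qm "WIP: spike - stripe webhook latency, do not merge"

git checkout -q main                           # the explicit mode boundary
git checkout -q -b feat/stripe-webhook         # production implementation starts clean
```

The spike branch stays parked. Nothing on it is cherry-picked into the feature branch; only the findings cross over.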
A comment header. At the top of the session (or the first message to the model), make the mode explicit:
```
// SPIKE: exploring whether streaming responses reduce perceived latency
// This code will not ship. Auto-accept everything, ignore code quality.
```
When the mode changes:
```
// PRODUCTION: implementing streaming endpoint based on spike findings
// Review each diff. Match existing patterns in src/api/. Tests required.
```
A break in the session. Closing the chat and opening a new one is an underrated reset. The model’s context resets; more importantly, the human’s context resets. The spike is over.
Without a signal, the brain treats the existing code as a sunk cost to preserve. The spike’s structure becomes load-bearing by accident.
The “I shipped my spike” mistake
Here is a concrete example of how this happens:
A team is prototyping a file parsing feature. The spike establishes that a particular parsing library handles the edge cases. The spike code is about 200 lines: a parser class with several methods, some hardcoded paths, a console.log replaced with a comment, and a test file with one happy-path case.
The spike worked. It answered the question. And then it shipped, because there was no clear signal that the spike was over and real implementation work had to begin.
Three months later that parser class is imported in eight places. The hardcoded paths are gone (someone cleaned those up), but the class has no interface contract—it was never designed to be depended on. The single test case hasn’t grown. Adding a new file format requires understanding the full class internals because there’s no abstraction layer to extend.
The debt was collected at 100% interest from the day it merged. Every new feature touching that parser paid the spike’s original shortcuts as a surcharge. The total cost of the shortcuts was not 20 lines of cleanup; it was weeks of slower development spread across months.
This is not a story about the library choice being wrong, or the developer being careless. The spike code was fine for a spike. The mistake was in the mode transition that didn’t happen.
The discipline: throw the spike away
When a spike produces useful results, the right next step is simple to state: throw the spike code away. Rewrite it from scratch in production mode, informed by what the spike taught.
This is harder than it sounds because the spike code is right there and it works. The path of least resistance is to clean it up—rename things, add tests, extract a function or two. The path of least resistance produces the “shipped my spike” outcome.
The rewrite takes about 20% of the time the spike took, and the result is meaningfully better: consistent with existing patterns, designed with interfaces rather than implementation details, tested at the right granularity. The spike compressed the exploration cost down to near zero. The rewrite is not expensive given that.
I use a simple rule: if the spike code would need more than ten minutes of cleanup to be production-ready, it gets deleted. Ten minutes is the threshold where cleanup stops being cleanup and starts being a stealth rewrite anyway—except done without the clean mental state of starting fresh.
When I delete the spike and rewrite, I also tell the model explicitly what mode we’re in and what constraints apply:
```
// Building the streaming endpoint for real.
// Spike showed: use TransformStream + flush intervals.
// Now: match the handler pattern in src/api/routes/, use the logger abstraction,
// add tests in tests/api/, proper error types.
```
The model produces substantially better output with this context than it did in the spike session. It’s working with the production context, not the spike context.
What this changes in practice
The two-mode framing changes a few concrete behaviors:
When starting any AI coding session, decide before the first prompt: spike or production? If it’s a spike, write that at the top of the prompt or as a comment header in the file. If it’s production, write that too. The act of naming it before code appears matters—it sets the mode before the momentum starts.
When a spike succeeds and the temptation to “just clean it up” appears, recognize it as the mode-transition failure it is. Open a new branch. Start fresh. The spike stays on its scratch branch or gets stashed. The production implementation starts from zero, informed by the spike’s findings.
When reviewing a colleague’s PR that smells like spike code—inconsistent abstractions, a test file that covers exactly one path, a class that wasn’t designed to be extended—ask whether there was a rewrite step between exploration and production. Often there wasn’t. “It worked in the spike” is the explanation. It’s also the problem.
A useful heuristic: if a PR includes a file that’s clearly a scaffold—not matching surrounding code style, with test coverage only on the obvious path—ask the author when the code was first written. If it traces back to a spike session, the rewrite step was skipped.
AI tools amplify both modes equally. They make spikes faster and production work faster. They do not automatically enforce the boundary between them. That discipline is entirely human.