Fast mode in Claude Code: when to trade Opus 4.7 for Opus 4.6 speed
Published 2026-05-11 by Owner
There’s a specific kind of frustration that builds during an interactive Claude Code session: you type a short question, wait four seconds for a reply, type a follow-up, wait four more seconds. The model is right on most of its answers, but you’re doing twenty turns and the latency compounds. By turn fifteen you’ve lost the thread of what you were trying to figure out.
Fast mode exists for this situation. It’s not a hidden setting or an undocumented trick — it’s a /fast toggle built into Claude Code that switches from Opus 4.7 to Opus 4.6 for the current session. Understanding when to flip it, and when to flip back, is the difference between using it well and using it as a crutch.
The confusion most people have when they first hear about /fast is treating it as a tier downgrade — as if flipping the toggle means accepting worse work to save time. That’s the wrong mental model. The right one is closer to how you already choose between tools: a power drill for a screw, a screwdriver for a stripped one. Different tradeoffs, neither inferior in absolute terms.
What /fast actually does
Typing /fast in a Claude Code session toggles the model to Claude Opus 4.6. Typing it again toggles back to Opus 4.7 (the default). The command works in both directions during a live session; your conversation context carries across the toggle.
The key thing to understand: this is not a switch to a smaller model family. Opus 4.6 is still Opus — same family, same capability tier relative to other Claude families — and it handles most coding tasks without visible quality loss. The difference is latency and throughput: 4.6 reaches the first token sooner and generates output faster. What changes is speed and reasoning depth, not the family.
The implication: fast mode is not a fallback for when you want “good enough.” It’s an alternative tradeoff point. Sometimes the right tradeoff is getting an answer in one second rather than four, even if that answer occasionally needs a follow-up correction.
Tasks where the speedup is noticeable
The sessions where /fast makes a measurable difference share a pattern: many short back-and-forth turns where each turn builds on the last.
Exploration and orientation. You’re new to a codebase and asking questions: “What does this function return?”, “Which module handles auth?”, “Is there a test for this path?” Each answer is mostly lookup and explanation, not deep inference. At Opus 4.6 speed, you can ask ten questions in the time Opus 4.7 would take to answer four. The throughput advantage is real.
Iterative scaffolding. You ask for a rough draft of a component. It’s 80% right. You ask for a change. It’s 90% right. You ask for another change. Each turn is cheap to verify and cheap to correct. Latency adds up across these cycles in a way that breaks flow. Fast mode keeps the loop tight. The thing that kills iterative scaffolding is not bad output — it’s the gap between reading output and submitting the next correction.
Known-territory edits. Rename a variable, extract a function, add a parameter to a method signature — changes where you already know what the right output looks like and you’re mostly using the model as a faster way to type. Opus 4.6 handles mechanical transformations as well as 4.7 does.
“I’m exploring” workflows. When the task is not yet defined — you’re probing possibilities, asking “what would it look like if we did X?” — fast mode matches the pace of thinking. Slow responses during open-ended exploration are particularly costly because they interrupt the speculative mindset. Each reply in an exploration phase is a prompt for the next question, not a final answer.
Boilerplate and test generation. Writing test cases for a function where you know exactly what inputs and outputs should look like, or generating a batch of similar CRUD handlers from a pattern — these are highly mechanical. The shape of the output is determined; the model is filling it in. Opus 4.6 handles this as reliably as 4.7.
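As a sketch of what that mechanical shape looks like — the `clamp` function and its cases here are hypothetical, not from any real session — a table-driven test block of the kind either model fills in reliably:

```python
def clamp(value, lo, hi):
    """Toy function under test (hypothetical example)."""
    return max(lo, min(hi, value))

# Table-driven cases: the shape of the output is fixed in advance;
# the model is just filling in rows you already know how to verify.
CASES = [
    (5, 0, 10, 5),    # in range: unchanged
    (-3, 0, 10, 0),   # below lower bound: clamped up
    (42, 0, 10, 10),  # above upper bound: clamped down
    (0, 0, 10, 0),    # boundary: equal to lo
    (10, 0, 10, 10),  # boundary: equal to hi
]

def test_clamp():
    for value, lo, hi, expected in CASES:
        assert clamp(value, lo, hi) == expected
```

Each row is a two-second verification, which is exactly the error-correction profile where the latency win dominates.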
The common thread: when the cost of a wrong answer is low (you’d catch it in two seconds of reading the output) and correction is cheap (one more turn), the latency reduction from Opus 4.6 is worth more than the marginal intelligence gain from 4.7.
A session where fast mode genuinely helped: working through a new codebase with 80 files, understanding data flow before writing anything. Fifteen consecutive lookup questions — “where does this request terminate?”, “which handler is actually called?” — all traces, no inference. On 4.7 the session would have taken 40 minutes. On 4.6 it took 15. The answers were identical.
Tasks where 4.7’s reasoning earns its latency
Not every task has a cheap error-correction path. For some problems, a wrong answer costs more than the latency you saved.
Hard debugging. Tracking down a race condition, a type error that manifests only in certain environments, a subtle logic bug that five previous attempts missed — these require sustained coherent reasoning across multiple pieces of evidence. Opus 4.7’s advantage shows up here as fewer “that explanation was plausible but wrong” moments. One solid answer beats three fast wrong ones. In a long debugging session, a wrong hypothesis from 4.6 can cost you twenty minutes of chasing a dead end. The latency you saved on the first reply is a poor trade.
Architecture decisions. Choosing a data model, evaluating whether an abstraction is worth the complexity, deciding how to split a module that’s grown too large — these require holding a lot of context and reasoning about second-order consequences. Fast mode can produce an answer. That answer may be locally coherent but globally naive. This is exactly where 4.7 earns its keep.
Code review on unfamiliar code. Reviewing a PR in a part of the codebase you don’t know well, where you’re relying on the model to surface non-obvious issues — the quality difference between 4.6 and 4.7 becomes more apparent when neither of you has seen the code before. 4.7’s review catches more.
High-cost-of-wrong tasks. Infrastructure changes, security-relevant code, anything where a missed edge case will surface as a production incident. The time you save on generation is not worth the time you spend recovering from a mistake the faster model would have flagged.
The pattern: when the cost of being wrong is high, or when the problem requires multi-step reasoning where early errors compound, stay on 4.7.
There’s a subtler version of this worth flagging. Some tasks look cheap on the surface — “just explain what this function does” — but the explanation feeds a decision with real stakes. Fast mode for orientation, 4.7 when the interpretation directly drives a consequential call.
Toggling within a session
The session context carries across /fast toggles. Everything the model has seen in the current session — your conversation history, any @-referenced files — remains visible after you switch modes. The model changes; the context doesn’t reset.
This means you can start on 4.7, switch to 4.6 once you understand the problem space, and switch back when you hit a hard question. It’s a fluid control, not a session-start setting.
A few things to know:
- The toggle is immediate. The next message after /fast uses the switched model.
- There’s no persistent visual indicator of the current mode in some Claude Code versions — if you’ve lost track, type /fast again. The model will confirm the mode changed, which tells you what it was before.
- Model preference set with /fast applies to the current session only. New sessions start on the default (4.7).
- File context added with @ before the toggle remains in the model’s context after the toggle. You do not need to re-attach files.
- If the session is long and the context window is near capacity, switching models doesn’t extend it. Context limits are per-session regardless of model.
A two-phase workflow
The pattern I use for tasks that have both an exploration phase and a quality-sensitive phase:
1. Start in fast mode. Use Opus 4.6 to produce candidates, explore the space, generate a rough draft, or enumerate approaches. The goal in this phase is coverage and speed, not precision. You want multiple options in front of you, quickly.
2. Identify the one or two outputs worth taking seriously.
3. Switch to standard mode. Run the candidate through Opus 4.7 with a “review this and find the problems” prompt. Ask it to check edge cases, verify correctness, or improve the approach.
The two-phase structure works because the tasks are genuinely different. Generating candidates benefits from speed; reviewing candidates benefits from depth. Using the same model at the same latency for both is leaving something on the table.
This also has a useful side effect on prompt quality. Framing the first pass as “generate candidates” encourages a more expansive prompt — “give me three versions” rather than “give me the solution.” The 4.7 review pass then has real options to work with, not a single path to rubber-stamp.
Concretely, for writing a complex function:
[/fast mode]
"Write a first draft of a JWT validation middleware that handles token expiry,
invalid signatures, and missing auth headers. Return 401 with a structured
error body in each case."
→ Review the draft. It's mostly right.
[toggle off /fast]
"Review this middleware for edge cases I might have missed. Pay particular
attention to timing attacks, the case where headers are present but malformed,
and what happens if the JWT library throws unexpectedly."
→ 4.7 finds the malformed-header case wasn't handled. Fix it.
The first pass was fast and got to 80%. The second pass was slower and got to 95%. Neither pass alone is as efficient as the combination.
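For concreteness, the corrected draft might land somewhere like the sketch below — a framework-free illustration, not the actual session output, using a stdlib HMAC check rather than any particular JWT library; all names are hypothetical:

```python
import base64
import hashlib
import hmac
import json
import time


def _b64decode(segment: str) -> bytes:
    # JWT segments drop base64 padding; restore it before decoding
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def check_auth(headers: dict, secret: bytes):
    """Return (status, body) for a request. Hypothetical, framework-free sketch."""
    auth = headers.get("Authorization")
    if auth is None:
        return 401, {"error": "missing_auth_header"}

    # The malformed-header case the review pass flagged:
    # header present, but not exactly "Bearer <token>"
    parts = auth.split(" ")
    if len(parts) != 2 or parts[0] != "Bearer" or not parts[1]:
        return 401, {"error": "malformed_auth_header"}

    try:
        header_b64, payload_b64, sig_b64 = parts[1].split(".")
        signing_input = f"{header_b64}.{payload_b64}".encode()
        expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
        # constant-time comparison guards against timing attacks
        if not hmac.compare_digest(expected, _b64decode(sig_b64)):
            return 401, {"error": "invalid_signature"}
        payload = json.loads(_b64decode(payload_b64))
    except Exception:
        # parse/library failures collapse to a structured 401,
        # never an unhandled 500 ("what if the JWT library throws")
        return 401, {"error": "invalid_token"}

    if payload.get("exp", 0) < time.time():
        return 401, {"error": "token_expired"}
    return 200, payload
```

The malformed-header branch and the broad except around token parsing are exactly the kind of additions a review pass produces: cheap to write, easy to miss on a fast first draft.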
What fast mode won’t fix
Fast mode reduces latency. It doesn’t change the quality of the prompt, the specificity of the context, or the clarity of what the model is being asked to produce.
This matters because the most common frustration people attribute to model choice is actually a prompt problem. “The fast model gave me shallow output” is often “I gave an underspecified prompt and the model filled in the blanks.” Opus 4.7 also fills in blanks — it just happens to fill them in ways that more often align with what you intended, because its stronger priors cover more edge cases.
If fast mode outputs are consistently disappointing on a certain class of task, the fix might not be switching back to 4.7. It might be adding one line of context — “this is a production codebase with strict type requirements and no runtime validation fallbacks” — that anchors the model’s assumptions. A well-specified prompt to 4.6 often beats a vague prompt to 4.7.
Fast mode also won’t help if the bottleneck isn’t model latency. On a slow connection both models are slow; if the speed problem is on the network side, switching models changes nothing.
Similarly, if the slow part of your session is reading and understanding the model’s output — not waiting for it — fast mode doesn’t address the constraint. The bottleneck there is your attention, not the model’s generation speed.
The default you’ll probably land on
After a few weeks of deliberate /fast use, most people converge on a default: fast mode on until there’s a reason to switch. The reasoning is that most tasks in a coding session are not the hard ones — they’re lookup, orientation, scaffolding, and iteration. Running those on 4.7 is paying for depth you’re not using.
The exception class is what you train yourself to recognize: the moment the problem gets hard, ambiguous, security-relevant, or architectural. That’s the switch trigger. Not the vague sense that “this turn feels important,” but a concrete test: I don’t know what the right answer looks like, so I can’t cheaply verify what the model produces.
That’s the signal to type /fast again.
Opus 4.6 is not a lesser model. It’s a faster model that sits at a different point on the speed/reasoning tradeoff curve. For the majority of routine coding tasks — scaffolding, exploration, mechanical edits — the speed advantage outweighs the capability difference. For the minority of tasks that require sustained reasoning or where errors are expensive, the reasoning depth of 4.7 outweighs the latency cost.
The skill is knowing which type of task is in front of you. Most experienced Claude Code users end up running fast mode during orientation and iteration, switching back for the hard parts. Starting a session by defaulting everything to 4.7 because “it’s better” misses the point; so does running everything on 4.6 because “it’s faster.”
The toggle exists so you don’t have to commit to one model for the full duration of a session. The session evolves — from orientation to implementation to debugging — and the right model for each phase isn’t the same. That’s the actual feature. The speed gain is a consequence of using it intentionally, not the goal in itself.