Choosing between Opus, Sonnet, and Haiku in Claude Code without overspending
Published 2026-05-11 by Owner
The first Claude Code bill surprises most people. Not because any single task was expensive, but because the volume added up faster than expected — and most of that cost came from using Opus for tasks that Sonnet or Haiku would have handled just as well.
The three-tier model lineup is meant to be used as a routing system. Anthropic priced them intentionally: Opus is roughly 5x Sonnet in input cost, and Sonnet is roughly 4x Haiku. Output tokens typically run around 5x the input rate, which skews the per-task cost further because code generation is output-heavy. These ratios shift as Anthropic updates pricing, but the order-of-magnitude relationships stay stable enough to reason about.
The problem is that Claude Code defaults to Opus in the configuration most people reach first, and most people don’t change it. That’s expensive.
Why per-task cost compounds faster than expected
The cost model that trips people up: Claude Code uses a conversational agentic loop. Each tool call — reading a file, running a command, checking a diff — appends to the context window. By turn 10, the model is re-reading everything from turns 1 through 9, plus the new input.
A task with 20 tool turns doesn’t cost 20x a single prompt. It costs roughly 1 + 2 + … + 20 = 210 context loads, versus one load for a single-turn task. With a long-running context, the token count for turn 20 includes all prior turns’ input and output, not just turn 20’s new input.
Concretely: if turn 1 costs $0.05 on Opus, and each subsequent turn adds 2k tokens to the cumulative context at Opus input rates, turn 20 costs closer to $0.80. The total session cost for 20 turns isn’t $1.00 — it’s somewhere in the $5–10 range for a well-formed complex task. The same session on Sonnet runs $1–2. On Haiku, under $0.50.
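A back-of-the-envelope sketch makes the compounding concrete. The per-million-token rates below are assumptions picked only to preserve the rough ratios described above, not quoted prices, and the per-turn token counts are invented for illustration:

# Rough cost model for a multi-turn agentic session. Rates and token
# sizes are illustrative assumptions, not real pricing; the model also
# ignores caching and other discounts that reduce re-read costs.
RATES = {                      # assumed (input $/M tokens, output $/M tokens)
    "opus":   (15.00, 75.00),
    "sonnet": ( 3.00, 15.00),
    "haiku":  ( 0.80,  4.00),
}

def session_cost(model, turns=20, base_context=2_000,
                 growth_per_turn=2_000, output_per_turn=250):
    """Total cost when every turn re-reads the accumulated context."""
    in_rate, out_rate = RATES[model]
    total = 0.0
    for turn in range(1, turns + 1):
        context = base_context + (turn - 1) * growth_per_turn  # tokens re-read
        total += context * in_rate / 1e6 + output_per_turn * out_rate / 1e6
    return total

for model in RATES:
    print(f"{model:>6}: ${session_cost(model):.2f} for a 20-turn session")
# Under these assumptions: Opus lands around $6.70, Sonnet around $1.30,
# Haiku around $0.40; the same order of magnitude as the ranges above.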
This is the non-obvious part for newcomers. Each individual turn looks cheap. The compounding is what bites.
The model tier choice is therefore a multiplier on a growing base, not a flat per-task rate. Staying on Opus through a long session means every later turn pays Opus rates on the entire accumulated context, not just on its own new input. If the task could have used Sonnet from turn 5 onward, switching at turn 5 saves money on all sixteen remaining turns, and most of that saving comes from the accumulated context re-reads inside each of them.
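Extending the same sketch with the same assumed rates, the saving from switching shows up directly:

# Same illustrative cost model, comparing an all-Opus session against
# one that switches to Sonnet at turn 5 once the framing is done.
RATES = {"opus": (15.00, 75.00), "sonnet": (3.00, 15.00)}  # assumed $/M tokens

def turn_cost(model, turn, base=2_000, growth=2_000, output=250):
    in_rate, out_rate = RATES[model]
    context = base + (turn - 1) * growth      # full context re-read this turn
    return context * in_rate / 1e6 + output * out_rate / 1e6

all_opus = sum(turn_cost("opus", t) for t in range(1, 21))
switched = sum(turn_cost("opus" if t <= 4 else "sonnet", t) for t in range(1, 21))
print(f"all Opus: ${all_opus:.2f}   switch to Sonnet at turn 5: ${switched:.2f}")
# Turns 5-20 carry most of the accumulated context, so they are where
# the Opus premium compounds; switching there removes most of it.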
Three task categories and which model fits each
Not every task demands the same reasoning depth. After routing tasks deliberately for a few months, the categories below have held up well.
Tasks where Opus is mandatory
Opus earns its price on tasks where reasoning under ambiguity is the core job. Specifically:
Architecture decisions with incomplete information. “We’re adding multi-tenancy to a single-tenant Rails app — where does the boundary go?” Sonnet will give you an answer. Opus will give you the tradeoffs, the failure modes, and flag which assumption is doing the most work. The difference is meaningful enough to pay for.
Multi-file debugging with non-local causes. When a bug spans three abstractions and the root cause isn’t obvious from the symptom, Opus sustains the reasoning chain across more context. Sonnet loses the thread on interactions between distant components more often than Opus does, in practice.
“What would break if…” analysis. Impact analysis for significant changes, where you need the model to track second-order effects. Sonnet handles the obvious dependencies; Opus handles the subtle ones.
Refactors where the spec itself is unclear. If the task is “clean up this module” with no explicit target state, Opus makes better judgment calls about what “clean” means in this codebase’s context. Sonnet will do something; Opus is more likely to do the right something.
The common thread: ambiguous inputs where reasoning quality changes the outcome. If Opus is going to cost 5x as much to say roughly what Sonnet would say, use Sonnet.
Tasks where Sonnet matches Opus
For well-specified work, Sonnet is the right choice the majority of the time.
Refactors with explicit targets. “Rename user_id to account_id across these 12 files” is a mechanical transformation. The model doesn’t need to reason about it; it needs to execute it reliably. Sonnet does this.
Test writing from explicit specs. If the spec says “this function should return X for inputs A, B, C and throw for D,” writing the test suite is largely mechanical. Sonnet handles it without degradation.
Code reviews against a clear checklist. Security checks, performance antipatterns, a specific style guide — bounded evaluation tasks where the criteria are given. Sonnet applies them consistently.
Implementing a feature from a detailed spec. The more complete the spec, the more Sonnet can substitute for Opus. A 500-word spec that answers all the ambiguities shifts the task from “requires judgment” to “requires execution.”
The pattern: specified inputs, mechanical or bounded output. Sonnet’s strength is reliable execution. When the thinking is already done, it keeps up with Opus at a fifth of the cost.
A useful test before reaching for Opus: write out what you’d say to a smart contractor who knows nothing about your project. If you can write it out completely — the goal, the constraints, the relevant files — then the task is specified enough for Sonnet. If you can’t write it out without making judgment calls you’d need to discuss with them, that’s Opus territory. The discipline of writing the spec is also the filter for model selection.
Tasks where Haiku is enough
Haiku gets underused because people assume it’s only for toy tasks. That’s not accurate.
Single-file edits with clear instructions. “Add a null check before this assignment.” “Extract this block into a function.” “Add a log statement here.” These are small, targeted, unambiguous changes that don’t require understanding the broader system.
Format conversions. JSON to YAML. SQL schema to TypeScript types. OpenAPI spec to client stub. The transformation is mechanical and the output is verifiable.
Simple completions and boilerplate. A standard CRUD endpoint when you’ve shown it the existing pattern. A new test file following an existing fixture. Fill-in-the-blank tasks where the template is already in context.
Summarization within the current file. “Write a docstring for this function.” The function is visible; summarizing it doesn’t require multi-file reasoning.
Haiku’s ceiling is real: it degrades faster on complex multi-step tasks. But inside that ceiling, it’s fast and cheap enough that a wrong first attempt plus a retry still costs less than a more expensive model that gets it right the first time.
One underrated use: Haiku as a verification pass. After Sonnet generates a larger change, asking Haiku to scan for obvious errors — missing imports, undefined variables, obvious off-by-ones — is cheap because it’s a bounded review with a clear checklist. It won’t catch subtle logic errors; that’s still Sonnet’s job. But it catches the mechanical mistakes quickly and cheaply before you run tests.
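A minimal sketch of that pass, assuming the CLI’s non-interactive -p (print) mode, that it accepts piped content on stdin, and that the Haiku model identifier below matches something your account actually exposes (claude-haiku-4-5 is a guess; check your model picker):

import subprocess

# Collect the staged diff that Sonnet just produced.
diff = subprocess.run(
    ["git", "diff", "--staged"], capture_output=True, text=True
).stdout

# A bounded checklist keeps the review mechanical, which is Haiku territory.
checklist = (
    "Review this diff for mechanical mistakes only: missing imports, "
    "undefined variables, obvious off-by-one errors. "
    "List findings as bullets, or reply 'clean' if there are none."
)

# Assumed invocation: non-interactive print mode with the diff on stdin.
review = subprocess.run(
    ["claude", "-p", checklist, "--model", "claude-haiku-4-5"],
    input=diff, capture_output=True, text=True,
)
print(review.stdout)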
A routing heuristic to apply mid-session
The decision doesn’t have to be made once per project. Claude Code lets you switch models mid-conversation. The trigger conditions:
Is the task ambiguous about WHAT to do (not HOW)?
→ Opus
Is the task clear on what to do but spans multiple files with non-obvious interactions?
→ Opus or Sonnet (lean Opus if interactions are subtle)
Is the task a specified transformation, bounded review, or explicit implementation?
→ Sonnet
Is the task a single-file, single-purpose, unambiguous edit?
→ Haiku first; escalate if Haiku output needs fixes
Will this task involve more than ~15 tool turns?
→ Consider whether Opus is earning its compounding cost
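Written out as a rough helper, purely for illustration (the argument names are this article’s categories, not anything Claude Code exposes):

def pick_model(ambiguous_goal: bool,
               multi_file_subtle_interactions: bool,
               single_file_unambiguous_edit: bool,
               expected_tool_turns: int = 5) -> str:
    """Map the trigger conditions above to a default tier."""
    if ambiguous_goal:                        # unclear WHAT to do, not just HOW
        return "opus"
    if single_file_unambiguous_edit:          # escalate if the output needs fixes
        return "haiku"
    if multi_file_subtle_interactions:
        # Lean Opus for subtle cross-file interactions, but past ~15 tool
        # turns ask whether Opus is still earning its compounding cost.
        return "opus" if expected_tool_turns <= 15 else "sonnet"
    return "sonnet"                           # specified, bounded, or explicit work

# e.g. pick_model(False, True, False, expected_tool_turns=20) -> "sonnet"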
A rule of thumb for session-level routing: start with Sonnet as the default. Upgrade to Opus when you hit a decision point with genuine ambiguity. Drop to Haiku for mechanical follow-ups once the hard thinking is done.
In practice, a well-structured session often looks like: Opus for the first 2–3 turns to frame the architecture, Sonnet to implement it across 10–20 turns, Haiku for the tail of single-file cleanup.
Configuration
Claude Code 1.x takes the model selection from the --model flag at launch, from the in-session model picker, or from a per-project settings file. There’s no per-turn routing config; you switch it yourself. That’s the current limitation.
# Start a session with Sonnet as the default
claude --model claude-sonnet-4-5
# Or set it in project config to avoid forgetting
# .claude/settings.json
{
  "model": "claude-sonnet-4-5"
}
Setting a project default in .claude/settings.json prevents the “I forgot I was on Opus” problem. Upgrading from Sonnet to Opus for a specific task takes one UI click; it’s easier to upgrade deliberately than to remember to downgrade.
One other thing worth watching: context window growth. If a session has been running long and the context is large, even a “cheap” turn on Haiku becomes expensive because Haiku is still reading the full accumulated context. At that point, the right move is often to start a fresh session with a summary of what was decided, rather than appending indefinitely to a bloated context. Fresh sessions are cheaper than aging ones regardless of model tier.
The earned insight
The model selection decision feels like an output-quality decision, but it’s really a decision about how much you already know. Opus’s advantage is handling uncertainty: when you don’t know exactly what you need, Opus figures it out better. When you do know, you’re paying for reasoning capacity you’re not using.
The fastest path to cutting Claude Code bills in half isn’t a specific technique. It’s calibrating what “knowing exactly what you need” looks like in practice and routing accordingly. Most tasks that feel ambiguous are actually well-specified once you’ve written out the requirements clearly — which means the act of writing the prompt clearly often shifts the task from “needs Opus” to “Sonnet is fine.”
That’s a feedback loop worth building. Forcing yourself to write a clearer spec both lets you drop to a cheaper tier and produces better output, because the model is working from better inputs. The same discipline that makes Sonnet a reliable substitute for Opus on specified tasks also makes you better at specifying tasks.
The bill goes down. The output quality holds or improves. The mechanism is just being honest about what “ambiguous” actually means for each task.
As Claude Code matures, per-turn costs will likely keep falling and model capability gaps will narrow. But the routing logic above stays durable because it’s based on task structure, not current pricing. Understanding which tasks require genuine reasoning versus reliable execution is useful regardless of what the models cost — and building that judgment now means you’ll route well when the next generation of models reshuffles the tier list again.