Tinker AI
2025-12-08

Anthropic released Claude 3.7 Opus this week. The model targets hard tasks where extended reasoning helps. For AI coding tools, this is a new option for the most complex work.

The numbers

Coding benchmarks (released by Anthropic):

  • SWE-Bench Verified: 64% (highest among production models)
  • HumanEval: 96%
  • LiveCodeBench: 52%
  • Aider’s polyglot benchmark: 69%

These are strong. The SWE-Bench number is notable; it’s the most realistic of the benchmarks, and 64% is a meaningful jump from prior bests.

Pricing

Opus 3.7 pricing:

  • Input: $15/M tokens
  • Output: $75/M tokens

About 5x the price of Sonnet. For tasks that benefit from Opus’s depth, the cost is justified. For routine tasks, Sonnet is the right choice.
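To make the multiple concrete, a quick per-request comparison. This is a sketch: the Opus prices are from the list above, while the Sonnet prices ($3/$15 per million tokens) are an assumption consistent with the roughly 5x ratio.

```python
# Rough per-request cost comparison between Opus 3.7 and Sonnet.
# Opus prices are from the post; Sonnet prices are assumed (~1/5 of Opus).
PRICES = {
    "opus":   {"input": 15.00, "output": 75.00},  # $ per million tokens
    "sonnet": {"input": 3.00,  "output": 15.00},  # assumed
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single request for the given model."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# A typical coding-agent turn: 20k tokens in, 2k tokens out.
opus_turn = request_cost("opus", 20_000, 2_000)      # 0.30 + 0.15 = $0.45
sonnet_turn = request_cost("sonnet", 20_000, 2_000)  # 0.06 + 0.03 = $0.09
```

At that turn size, Opus costs about $0.45 against Sonnet's $0.09, five times more per turn before any difference in turn count.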

Where this fits

For AI coding tools, Opus 3.7 makes sense for:

  • Cline plan mode on complex tasks
  • Aider architect mode on hard refactors
  • Cursor chat for architectural questions
  • Claude Code for multi-step debugging

Tools should route appropriately: Opus for the few hard tasks, Sonnet for most work. The cost is high enough that defaulting to Opus would be wasteful.
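That routing policy is easy to sketch. Everything here is illustrative: the task labels are mine, and the model ids follow the aliases mentioned later in this post rather than any tool's confirmed API.

```python
# Hypothetical tool-side router: send only explicitly hard task kinds to Opus,
# default everything else to Sonnet. Labels and model ids are illustrative.
HARD_TASKS = {"plan", "architect_refactor", "architecture_question", "multi_step_debug"}

def pick_model(task_kind: str) -> str:
    """Return a model id for a task kind; Opus only for the few hard kinds."""
    if task_kind in HARD_TASKS:
        return "claude-3-7-opus-latest"
    return "claude-3-7-sonnet-latest"
```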

A specific test

I gave Opus 3.7 a few representative hard tasks:

Task 1: Multi-step debug of a race condition. Sonnet 3.7 found the issue in 4 turns. Opus found it in 2 turns. The cost was higher per turn but lower in total.

Task 2: Architectural refactor across 8 files. Sonnet’s first plan missed 1 file. Opus’s first plan was complete. Saved an iteration.

Task 3: Generic CRUD endpoint. Both produced fine output. Opus was overkill; Sonnet was the right choice.

Pattern: Opus shines on tasks that need careful reasoning. For tasks where Sonnet is sufficient, the price difference favors Sonnet.

The reasoning angle

Claude’s “extended thinking” capability is more pronounced on Opus 3.7. The model can spend more time reasoning before producing output. This:

  • Increases latency (10-30 seconds typical for hard tasks)
  • Improves quality on tasks that benefit
  • Adds tokens (and cost)

For interactive use, the latency is noticeable. For autonomous loops, the higher quality reduces iteration count, which is the more meaningful effect.
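For BYOK setups, extended thinking is a request-level switch in Anthropic's Messages API: a thinking block with a token budget. A sketch of the payload; the model id follows this post's naming, and the budget and max_tokens values are illustrative.

```python
# Build a Messages API request payload with extended thinking enabled.
# The thinking budget caps reasoning tokens, which add latency and cost.
def build_request(prompt: str, thinking_budget: int = 8_000) -> dict:
    return {
        "model": "claude-3-7-opus-latest",      # id per the post; check your provider's model list
        "max_tokens": thinking_budget + 4_000,  # must exceed the thinking budget
        "thinking": {"type": "enabled", "budget_tokens": thinking_budget},
        "messages": [{"role": "user", "content": prompt}],
    }
```

Because the budget caps reasoning tokens, it bounds both the extra latency and the extra cost; max_tokens has to leave room above it for the visible answer.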

Tool adoption

Tools using Anthropic’s API can immediately use Opus 3.7. Specifically:

  • Aider: Add architect-model: claude-3-7-opus-latest to config
  • Cline: Switch to Opus 3.7 in the model picker
  • Claude Code: Use the model picker
  • Cursor: Available in the chat panel model selection
  • Copilot Business: Available via the model picker (may need org admin enablement)

For BYOK users, the upgrade is immediate. For subscription tools, availability depends on the tool's implementation.

When to use Opus

A practical rule:

  • For work you'd estimate at 30+ minutes done manually: try Opus
  • For work you'd estimate at 5 minutes: use Sonnet
  • For work you'd estimate at 30 seconds: use Haiku

The cost difference is justified by the task value. Cheap models for cheap tasks; expensive models for expensive tasks.
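As a function, with one assumption: the post gives the 30-minute, 5-minute, and 30-second anchor points but not the exact cutoffs between tiers, so the 2-minute boundary below is my interpolation.

```python
def model_for_estimate(manual_minutes: float) -> str:
    """Map a rough manual-effort estimate to a model tier.

    30+ minutes -> Opus, ~5 minutes -> Sonnet, ~30 seconds -> Haiku;
    the 2-minute Sonnet/Haiku cutoff is interpolated, not from the post.
    """
    if manual_minutes >= 30:
        return "opus"
    if manual_minutes >= 2:
        return "sonnet"
    return "haiku"
```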

For most engineers’ workflows, Opus is occasional. The bulk stays on Sonnet.

What this signals

Anthropic is keeping up an aggressive release cadence:

  • Sonnet 3.5 → Sonnet 3.7 → Opus 3.7 in less than a year
  • Each release with measurable improvements
  • Pricing strategies adapting (Opus expensive but justified for some tasks)

The pace puts pressure on competitors. OpenAI's o-series and Google's Gemini Pro thinking models are, in part, responses to it.

For users, the pace is positive. Capabilities improve faster than expected. The competitive dynamics keep prices reasonable.

Worth using?

For most engineers, occasionally. Default to Sonnet; reach for Opus on hard tasks. The price difference is real but justified for the right tasks.

For BYOK users, set up routing in your tools so Opus is available when needed. Don’t default to it.

For subscription tools, check whether Opus is included or billed as an extra. Vendors are adjusting pricing for premium models, so the answer depends on your specific plan.

Closing

A meaningful model release. Strong on benchmarks; expensive but justified for the right work; available via the standard channels.

For users on Anthropic, this is a new tool in the toolkit. Use it where it adds value; fall back to Sonnet for the bulk.

For users on other providers, the indirect effect: continued competitive pressure on capabilities. Your provider’s next release will likely include similar improvements.

The model wars continue. Users benefit from the pace.