Anthropic announced an extension to their prompt caching feature: the default TTL goes from 5 minutes to 1 hour. For users of AI coding tools that use long contexts (large codebases pinned in chat, autonomous agent loops with accumulated history), this is a meaningful cost reduction.
What prompt caching is
When you send a request to Claude with a long prefix (system prompt + codebase + conversation history), Anthropic can cache the prefix internally. The next request that uses the same prefix reads from cache instead of recomputing.
Cached tokens cost ~10% of normal input tokens. For coding tools where the context (system prompt + project files + earlier turns) is reused across many turns, the savings compound.
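For developers calling the API directly, caching is requested per content block via a `cache_control` marker. A minimal sketch with the Python SDK (the model id, file name, and prompt text here are placeholders, not anything a specific tool ships):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# LONG_PREFIX stands in for the stable context a coding tool would pin:
# system prompt + project files, identical byte-for-byte across turns.
LONG_PREFIX = open("project_context.txt").read()

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # example model id
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": LONG_PREFIX,
            "cache_control": {"type": "ephemeral"},  # mark the prefix cacheable
        }
    ],
    messages=[{"role": "user", "content": "Explain the auth flow in this repo."}],
)
print(response.content[0].text)
```

The first request pays to write the cache; subsequent requests with the same prefix read it back at the discounted rate.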
Why TTL matters
Caching only helps if the cache is still alive when you use it again. The TTL — time to live — determines how long the cache persists.
5-minute TTL means: if you take a break (lunch, meeting, deep thinking), the cache expires. Your next message rebuilds context from scratch at full price.
1-hour TTL means: most natural breaks during a working session don’t blow the cache. Continuing a conversation after a 20-minute walk doesn’t cost extra.
For interactive coding workflows, the difference is significant. Most coding sessions have natural pauses, and with a 5-minute TTL those pauses were constantly invalidating the cache.
Cost impact for AI tools
Rough numbers for typical workflows:
Cline / Cursor chat with pinned codebase context (~50k tokens):
- Without caching: 50k tokens × ~$3/M = $0.15 per turn
- With 5-minute TTL caching: ~$0.05 per turn (cache hit ~70% of turns)
- With 1-hour TTL caching: ~$0.02 per turn (cache hit ~95% of turns)
For a 30-turn session, that’s:
- No caching: $4.50
- 5-min TTL: $1.50
- 1-hour TTL: $0.60
The 1-hour TTL is roughly 7.5x cheaper than no caching and 2.5x cheaper than the previous default.
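A sketch of the arithmetic behind these figures. The rates and hit rates are the assumptions stated above; cache-write premiums are ignored for simplicity, and the article rounds per-turn costs before multiplying, so session totals differ slightly:

```python
# Back-of-envelope cost model for re-sending a pinned prefix each turn.
PREFIX_TOKENS = 50_000
INPUT_PRICE = 3.00 / 1_000_000        # dollars per input token (~$3/M)
CACHE_READ_PRICE = 0.1 * INPUT_PRICE  # cached reads at ~10% of base

def per_turn_cost(hit_rate: float) -> float:
    """Expected cost of the prefix on one turn, given a cache hit rate."""
    hit = PREFIX_TOKENS * CACHE_READ_PRICE
    miss = PREFIX_TOKENS * INPUT_PRICE
    return hit_rate * hit + (1 - hit_rate) * miss

for label, rate in [("no caching", 0.0), ("5-min TTL", 0.70), ("1-hour TTL", 0.95)]:
    turn = per_turn_cost(rate)
    print(f"{label:>10}: ${turn:.3f}/turn  ${30 * turn:.2f} per 30-turn session")
```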
Aider with large repo map:
Similar dynamics. Most Aider sessions reuse the same repo map across many turns, so caching helps; a longer TTL helps more.
Claude Code with long-running session:
Long agent loops accumulate context. TTL extension means the accumulated context stays cached for longer. Significant savings on multi-hour sessions.
What stays the same
A few things this release doesn’t change:
Cache requires explicit markers. You opt in by placing cache-control breakpoints on the content blocks you want cached. The tools handle this automatically; users don't see it.
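If an integration wants to pin the longer TTL explicitly rather than rely on the default, the API has exposed a `ttl` field on `cache_control`, originally gated behind a beta header. A sketch, worth checking against the current docs (model id and prompt reuse the placeholder names from the earlier sketch):

```python
# Requesting the 1-hour TTL explicitly. The beta header shown here is the
# one Anthropic used when extended TTL first shipped; it may no longer be
# required if the longer TTL is now the default.
response = client.beta.messages.create(
    model="claude-sonnet-4-20250514",  # example model id
    max_tokens=1024,
    betas=["extended-cache-ttl-2025-04-11"],
    system=[
        {
            "type": "text",
            "text": LONG_PREFIX,
            "cache_control": {"type": "ephemeral", "ttl": "1h"},  # 1-hour cache
        }
    ],
    messages=[{"role": "user", "content": "Continue the review."}],
)
```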
Cache invalidates on prefix changes. A small change to your system prompt or pinned files invalidates the cache from that point on, and the next request pays full price to rebuild it. A new session starts from scratch the same way.
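In practice this means tools keep the prefix byte-stable and only ever append to it. A sketch of that append-only pattern, reusing the placeholder names from the earlier sketches:

```python
# Append-only conversation loop: the prefix (system block + earlier turns)
# stays byte-identical, so every request after the first can hit the cache.
CACHED_SYSTEM = [
    {
        "type": "text",
        "text": LONG_PREFIX,  # same stable context as before
        "cache_control": {"type": "ephemeral"},
    }
]
history = []

def send(client, user_text: str):
    history.append({"role": "user", "content": user_text})
    resp = client.messages.create(
        model="claude-sonnet-4-20250514",  # example model id
        max_tokens=1024,
        system=CACHED_SYSTEM,  # never edited mid-session; edits bust the cache
        messages=history,      # grows by appending, never by rewriting
    )
    history.append({"role": "assistant", "content": resp.content})
    return resp
```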
Cache is per-API-key. If multiple people on a team share infrastructure, each API key gets its own cache; nothing is shared between them.
Tools’ response
Tools using Anthropic’s caching:
Aider: Already uses caching where supported. The TTL extension propagates automatically.
Cline: Uses caching for system prompts and pinned context. Should benefit from the longer TTL.
Cursor: Their backend handles caching. Should pass through the longer TTL benefit to users.
Windsurf: Similar to Cursor’s setup.
Copilot: Multi-provider; their handling of Anthropic caching is opaque from the outside.
For users on tools that use Anthropic models, the savings should show up in your bill (BYOK) or be absorbed by the tool vendor (subscription). Either way, the underlying cost goes down.
What this signals about Anthropic’s strategy
Two interpretations:
Cost optimization for users. Anthropic is making their service more competitive on cost for the use cases that matter (long-context coding workflows). Lowering effective costs by 2-3x is meaningful.
Competitive pressure. Other providers (Google with Gemini, OpenAI’s caching, etc.) are pushing pricing down. Anthropic is responding to keep the cost calculation favorable.
Both can be true. The user benefit is real either way.
Is there a downside?
A few minor considerations:
Stale cache risk. A 1-hour TTL means stale cache stays around longer. If something has changed about how the model interprets a prefix mid-cache, you could get inconsistent results. This is theoretical; I haven’t seen it happen in practice.
Storage costs for Anthropic. Longer TTL means Anthropic stores more cached data for longer. They're presumably absorbing that cost for now, but there's no guarantee the pricing stays this generous.
Less incentive to optimize. With cheaper caching, users may not bother trimming context size, and bloated prompts may proliferate simply because they're cheap enough to tolerate.
These are minor. Net effect is positive for users.
What I’d recommend
For users on subscription tools: nothing to do. The benefit propagates through your tool.
For users on BYOK setups: nothing to change. Your existing config benefits automatically.
For developers building applications using Anthropic’s API directly: look at your caching strategy. The longer TTL means cache hits are more achievable. If you weren’t using caching, this is the time to add it.
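A practical first step is measuring your actual hit rate. The API's usage object breaks out cache activity per request; assuming a `response` like the one in the earlier sketches:

```python
# Log cache effectiveness per request. The usage object distinguishes
# fresh input tokens, cache writes, and cache reads.
usage = response.usage
print(
    f"fresh input: {usage.input_tokens}, "
    f"cache writes: {usage.cache_creation_input_tokens}, "
    f"cache reads: {usage.cache_read_input_tokens}"
)
# A healthy long-context session should show large cache-read counts
# on every turn after the first.
```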
A broader take
Pricing changes like this are easy to under-appreciate. The AI tooling space evolves rapidly on capability; pricing changes often pass without fanfare. But pricing improvements compound for users with high usage. A 2-3x cost reduction on long sessions, sustained over a year of work, is a real economic benefit.
For users heavily invested in AI coding tools, paying attention to these incremental improvements adds up. Six months ago, my Cline costs were ~$120/month. Today, with various caching improvements and model price drops, the same workflow runs around $60/month. The capability hasn’t dropped; the cost has.
The trajectory is welcome. Whether it continues depends on competitive dynamics, but the recent direction is “more capability for less money.”
What I’d watch
A few things to track:
- Whether competitors (OpenAI, Google) match or exceed the TTL extension
- Whether tools take advantage of longer TTLs in their context management
- Whether the AI coding tool subscription prices drop in response to underlying cost reductions
The third is interesting. If model costs drop significantly, the subscriptions priced on assumptions of higher costs become relatively more profitable for vendors. Some of the savings may flow to users; some may be retained as margin. The market dynamics will play out.
For now, the change is welcome. Use AI tools, pay less, get more done. That’s the right direction for the ecosystem.