
MCP tools and Claude Code's tool-use loop: how they actually compose

Published 2026-05-11 by Owner

Every Claude Code session ships with a fixed set of native tools: Read, Write, Edit, Bash, Grep, and a handful of others. MCP servers add more tools on top of those. From the model’s perspective, all of them are just tools — there is no internal category or flag that separates “native” from “MCP.” The model sees a flat list of JSON schemas and picks from them by matching tool names and descriptions to the current task.
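That flat list can be pictured as plain data. A sketch with invented schemas (not Claude Code’s actual wire format), showing that nothing in an entry marks its origin:

```python
# Illustrative tool list: the names and descriptions are made up for this
# sketch, but the point is structural -- every entry has the same shape.
native_tools = [
    {"name": "Read", "description": "Read a file from the local filesystem."},
    {"name": "Bash", "description": "Run a shell command and return its output."},
]
mcp_tools = [
    {"name": "github_get_issue", "description": "Fetch a GitHub issue by number."},
]

# One flat list; no field says "native" or "MCP" -- the model only ever
# sees names and descriptions.
tool_list = native_tools + mcp_tools

for tool in tool_list:
    assert set(tool) == {"name", "description"}
```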

That sameness is both the strength of the design and the source of its main operational failure mode.

Native vs MCP: same call shape, different lifecycle

When Claude Code initializes a session, native tools are always present. They load before the first turn and persist for the session’s lifetime. MCP tools have a different lifecycle: Claude Code discovers them by connecting to running MCP servers at startup, reading the server’s tools/list response, and injecting those schemas into the tool list alongside the native ones.

The practical consequence is that native tools are guaranteed but MCP tools depend on what servers are running and what they expose. If a GitHub MCP server is not running when a session starts, the github tool family is simply absent — no error, just no tool. The model cannot call what it cannot see in its list.
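A minimal sketch of that startup discovery, assuming one client object per configured server with a tools_list() method (the class and method names here are invented for illustration):

```python
# Sketch of startup tool discovery. A server that is down contributes
# nothing: no error reaches the model, its schemas simply never join the list.
def build_tool_list(native_tools, mcp_clients):
    tools = list(native_tools)  # native tools are always present
    for client in mcp_clients:
        try:
            tools.extend(client.tools_list())  # the server's tools/list response
        except ConnectionError:
            continue  # absent server -> absent tools, silently
    return tools

class FakeServer:
    """Stand-in for an MCP client connection."""
    def __init__(self, tools, up=True):
        self._tools, self._up = tools, up
    def tools_list(self):
        if not self._up:
            raise ConnectionError("server not running")
        return self._tools

native = [{"name": "Read"}, {"name": "Bash"}]
github = FakeServer([{"name": "github_get_issue"}], up=False)  # not running
slack = FakeServer([{"name": "slack_post_message"}])

tools = build_tool_list(native, [github, slack])
# github_get_issue is absent with no error; slack_post_message made it in
```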

There’s a second variation: deferred tools. Some Claude Code configurations (including the agent SDK running this very session) keep a subset of MCP tool schemas out of the initial prompt entirely. Instead they expose a single ToolSearch tool whose job is to fetch schemas on demand. The model calls ToolSearch with a query, gets back the schema for the tool it actually needs, and then calls that tool. The deferred tools never appeared in the original list — they enter the context only when requested. This is purely a tool-list-management technique, invisible to the task semantics.

The deferred pattern requires the model to know, at some level, that a tool might exist even though it is not visible. In practice this works because ToolSearch is itself always present and the system prompt describes what categories of tools can be loaded through it. The model asks for tools by intent (“I need a browser tool”) rather than by exact name, and ToolSearch resolves that to a concrete schema. Once the schema lands in context, the tool is callable for the rest of the session without another ToolSearch round-trip.
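The round-trip can be sketched as follows. The keyword-overlap scoring below is a crude stand-in for whatever retrieval the real ToolSearch uses; the deferred tool names are invented:

```python
# Schemas kept out of the initial context, loadable on demand.
DEFERRED = {
    "browser_navigate": "Open a URL in a headless browser and return the page.",
    "jira_create_issue": "File a new issue in a Jira project.",
}

# Only ToolSearch starts out in the active list.
active_tools = {"ToolSearch": "Find and load a tool schema by describing what you need."}

def tool_search(query):
    """Resolve an intent ('I need a browser tool') to a concrete schema."""
    words = set(query.lower().replace(".", "").split())
    best, best_score = None, 0
    for name, desc in DEFERRED.items():
        terms = set(name.split("_")) | set(desc.lower().rstrip(".").split())
        score = len(words & terms)
        if score > best_score:
            best, best_score = name, score
    if best:
        active_tools[best] = DEFERRED[best]  # schema enters the context
    return best

loaded = tool_search("I need a browser tool")
# browser_navigate is now callable for the rest of the session,
# with no further ToolSearch round-trip.
```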

How the model picks between tools

The model does not reason about provenance. It does not think “this is an MCP tool, so it must go through the server.” It reads the combined tool list and picks the best match for the task using the same mechanism it always does: tool name and description similarity to the current intent.

This means naming matters enormously. A GitHub MCP server that names its search tool search will collide semantically with any other search tool that happens to be present. Well-designed MCP servers namespace their tools explicitly — github_search_code, github_create_pr, github_get_issue — so the model can distinguish them from each other and from native tools without ambiguity.

It also means there is no magic routing. If two tools could plausibly satisfy a request, the model may pick the wrong one. Descriptions and names are the entire signal. Native tools like Bash and Read have been tuned over many Claude Code versions to have clear, specific descriptions. A custom MCP tool with a vague description like “does file operations” will lose to a native tool with a concrete one almost every time.

There is one important asymmetry: when native and MCP tools both seem applicable, the model tends to favor whichever tool’s description is more specific to the current tokens in context. An MCP tool described as “read a file from the local filesystem by absolute path” will compete directly with the native Read tool, and the outcome depends purely on which description better matches the model’s current state. Anthropic controls the native tool descriptions and keeps them precise; MCP server authors often do not. This is the most common source of tool-selection confusion in practice.
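A toy version of that contest, with plain word overlap standing in for how the model weighs descriptions, shows the vague description losing (the MCP tool name here is hypothetical):

```python
# Two candidate tools: a precisely described native tool and a vaguely
# described custom MCP tool. Selection here is naive word overlap.
tools = {
    "Read": "Read a file from the local filesystem by absolute path.",
    "myfs_tool": "Does file operations.",
}

def pick_tool(task):
    words = set(task.lower().split())
    return max(
        tools,
        key=lambda name: len(words & set(tools[name].lower().rstrip(".").split())),
    )

choice = pick_tool("read the file at an absolute path on the local filesystem")
# The concrete description wins; "does file operations" barely overlaps.
```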

The orchestration cost of a large tool list

Here is the failure mode that bites most MCP-heavy setups: every turn, the model processes the full tool list as part of its context. The schemas are not summarized or compressed — they are present in full, consuming tokens and attention.

A lean Claude Code session with only native tools has roughly 15-20 tool schemas in context. Add a Slack MCP server and you might add 8 more. Add a GitHub MCP, a Linear MCP, a Postgres MCP, and a filesystem MCP, and you can easily cross 60-80 tools. At 100+ tools, two things happen.

First, the token overhead becomes significant. Each tool schema typically runs 200-600 tokens, so 100 tools means 20k-60k tokens of tool definitions before the conversation even begins. That is meaningful context pressure on a 200k-token model.
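The arithmetic, using the estimates above:

```python
# Back-of-envelope context tax for a 100-tool session.
tokens_per_schema = (200, 600)   # typical range per tool schema
n_tools = 100
low, high = (n_tools * t for t in tokens_per_schema)

context_window = 200_000
share = (low / context_window, high / context_window)
# 20,000-60,000 tokens of definitions: 10-30% of a 200k window is
# spent before the first user message.
```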

Second, and more practically, the model’s attention on each individual tool degrades. Studies on long-context transformers consistently show that items buried in the middle of long inputs receive less reliable attention than items near the edges. A tool added in position 73 of 100 is less reliably called than a tool in position 12 of 20. When the model hallucinates a call to a tool that does not exist, or calls the right tool with the wrong arguments, an overloaded tool list is the most common culprit.

Deferred loading directly addresses this. By keeping rarely-used MCP schemas out of the initial list and letting ToolSearch inject them only when needed, the active tool list stays small. The model reliably uses the 20 tools it actually needs for the current task rather than trying to keep 90 tools in mind simultaneously.

Working example: GitHub MCP + native tools in sequence

Here is a workflow that combines both layers with clear separation of responsibility.

The goal: review a pull request, apply suggested changes locally, run the test suite, and report back.

Step 1 — Fetch the PR diff via the GitHub MCP:

tool: github_get_pull_request
args: { owner: "acme", repo: "api-server", pull_number: 847 }

The response contains the PR description, the diff, and the list of changed files. This is data the native tools cannot reach directly: reproducing it through Bash would mean scripting GitHub’s API (or the gh CLI) by hand and parsing the output yourself, where the MCP tool returns it as structured data in one call.

Step 2 — Read the affected files with the native Read tool:

tool: Read
args: { file_path: "/repo/src/auth/middleware.ts" }

The GitHub MCP returned a diff, but the diff is a delta — it shows what changed, not the full file context. Read loads the current local state. Combining both gives a complete picture: what the PR says changed and what the file actually looks like now.

Step 3 — Apply changes with the native Edit tool:

tool: Edit
args: {
  file_path: "/repo/src/auth/middleware.ts",
  old_string: "...",
  new_string: "..."
}

The GitHub MCP has no local file access; it cannot write to the working tree. The Edit tool has no GitHub access; it cannot read PR metadata. The right tool for each half of the task is different, and the model uses both in sequence.

Step 4 — Run the test suite with Bash:

tool: Bash
args: { command: "cd /repo && bun run test --reporter=verbose 2>&1 | tail -40" }

The output comes back as text in context. The model reads it, decides if the tests pass, and either continues or reports the failure.

Step 5 — Leave a review comment via the GitHub MCP:

tool: github_create_review_comment
args: { pull_number: 847, body: "Tests pass locally. One issue on line 47..." }

The native tools have finished their work; control passes back to the MCP layer to close the loop with GitHub.

This sequence illustrates the natural division: MCP tools handle external service I/O (things with auth, API calls, structured remote data), native tools handle local filesystem and process operations. Mixing them in sequence produces workflows that neither layer could complete alone.
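The five steps can be laid out as data, tagged by layer (the tags are illustrative, not part of any protocol):

```python
# The PR-review workflow above, one tuple per tool call:
# (layer, tool name, abridged arguments).
steps = [
    ("mcp",    "github_get_pull_request",      {"owner": "acme", "repo": "api-server", "pull_number": 847}),
    ("native", "Read",                         {"file_path": "/repo/src/auth/middleware.ts"}),
    ("native", "Edit",                         {"file_path": "/repo/src/auth/middleware.ts"}),
    ("native", "Bash",                         {"command": "cd /repo && bun run test"}),
    ("mcp",    "github_create_review_comment", {"pull_number": 847}),
]

# External service I/O bookends the sequence; local filesystem and
# process work sits in the middle. Neither layer covers all five steps.
layers = [layer for layer, _, _ in steps]
```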

Practical implications for MCP setup

The most common misconfiguration is loading too many MCP servers by default. A coding agent session probably does not need a Slack MCP, a calendar MCP, and a Google Docs MCP active simultaneously with a GitHub MCP and a code-search MCP. Each loaded server taxes every turn.

A few structural choices help:

Use per-project MCP configs. Claude Code supports .claude/settings.json at the project level. A backend API repo probably wants a Postgres MCP and a GitHub MCP. A documentation site probably wants neither. Setting MCP servers per-project rather than globally keeps each session’s tool list lean.
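A sketch of such a per-project config. The exact file location, key layout, and server packages vary by Claude Code version and by server, so treat everything below as a placeholder shape rather than a copy-paste config:

```json
{
  "mcpServers": {
    "postgres": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-postgres", "postgresql://localhost/app"]
    },
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"]
    }
  }
}
```

The backend API repo gets exactly these two servers; a documentation site’s config would simply omit the block.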

Name MCP tools with full namespacing. If writing an MCP server, prefix every tool name with the service: slack_post_message, not post_message. The model’s tool selection improves and collision risk drops.
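One way to make the prefix impossible to forget is to apply it at registration time. A sketch assuming a hand-rolled register() helper rather than any particular MCP SDK:

```python
# Namespace every tool name at registration so bare names like
# "post_message" can never reach the tool list.
SERVICE = "slack"

def register(name, description, registry):
    qualified = f"{SERVICE}_{name}"  # slack_post_message, not post_message
    registry[qualified] = description
    return qualified

registry = {}
register("post_message", "Post a message to a Slack channel by channel ID.", registry)
register("list_channels", "List the Slack channels the bot can see.", registry)
```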

Prefer deferred loading for auxiliary tools. Tools that are used only occasionally — a Jira MCP for filing bugs, a metrics MCP for spot checks — are good candidates for deferred loading. The overhead of a ToolSearch call is one additional turn; the benefit is removing 10-30 schemas from the base context of every other turn.

Treat description quality as a first-class concern. The tool description is the model’s only signal for when to call the tool. A description that reads “useful for various GitHub operations” is much worse than “fetch the diff, metadata, and file list for a GitHub pull request by owner, repo, and PR number.” The second version tells the model exactly when to call the tool and what to expect back.

Audit what each MCP server actually exposes. MCP servers sometimes expose more tools than you expect. A database MCP might expose separate tools for read queries, write queries, schema inspection, and transaction management — four tools where you might have budgeted for one. Before adding a server to a project config, check what it exposes (for example via the /mcp view in a session, or by inspecting the server’s tools/list response directly) so you know exactly what schemas will land in the tool list.

Where this is going

The current MCP specification is still relatively new, and tool-list management is an area of active work. There are proposals for server-side tool filtering (letting Claude Code ask an MCP server for only the tools relevant to the current task), streaming tool schemas (loading schemas lazily as they become relevant), and tool versioning (so cached schemas stay valid across server restarts). None of these are widely deployed yet.

The deferred-loading pattern used in Claude Code’s agent SDK is essentially a manual version of what those proposals would automate. The fact that it works well in practice is evidence that the underlying idea — keep the active tool list small and load schemas on demand — is sound. Expect future MCP versions to codify it rather than require each runtime to implement its own ToolSearch mechanism.

For now, the mental model is: every MCP tool added to a session is a tax on every turn. Sometimes the tax is clearly worth it — a GitHub MCP in a session that’s largely about reviewing code is providing its value constantly. Other times the math does not work out. Understanding that the model treats all tools identically, and that the tool list is a shared resource across the whole session, makes those tradeoffs legible.