
Claude Code subagents: when to dispatch and when to keep the main session

Published 2026-05-11

The main session context in Claude Code is a resource. Every grep output, every file read, every back-and-forth reasoning step consumes it. Once a session gets dense enough, the model starts losing track of earlier details — not catastrophically, just subtly: it forgets a constraint you mentioned four exchanges back, or it re-reads a file it already processed. For short tasks this doesn’t matter. For a two-hour investigation across a large codebase, it matters a lot.

Subagents are Claude Code’s answer to this. A subagent is an independent Claude instance spun up with its own fresh context window. It receives a task prompt, executes it using whatever tools it has access to, and returns a single result to the main session. There is no shared memory between the subagent and the main session — the main session sees only the finished output. The subagent’s internal reasoning, the files it read, the tool calls it made: all of that stays inside the subagent’s context and is discarded when it finishes.

That architecture has specific implications for how subagents are useful and where they fail.

The context protection use case

The most immediate reason to reach for a subagent is protecting the main session from noise.

Say you’re mid-session on a refactor, you have a clear mental model of the code under discussion, and you need to answer a tangential question: which files call this function? Without subagents, you’d run that search in the main session. If the function has 40 callers scattered across 12 files, the main session now contains 40+ lines of grep output, file paths, and maybe several read calls to check call sites. That content is largely irrelevant to the refactor — but it’s in context now, and it dilutes everything that matters.

With a subagent, the query runs in isolation:

Task: Find all callers of the function `resolveAffiliateUrl` in src/.
Return: A summary list — file path and line number only, no surrounding code.

The subagent does the search. It might read 15 files internally. The main session receives a compact list — say, 8 lines — and the main context stays clean.

This compounds over a long session. Every time you’d otherwise dump search output into the main context, a subagent handles it and returns a summary. After three hours of work, the main session contains useful reasoning and decision context, not accumulated grep junk.

A concrete failure mode to recognize: the session that started coherent but became unreliable after several large reads. You ask Claude to find where a type is used; it reads 20 files and returns them all. Then you ask a follow-up about the architecture; it gives an answer that ignores a constraint you discussed an hour ago. The constraint is still technically in context, but it’s buried under 300 lines of type-usage output. The session degraded not because the model forgot, but because the relevant signal got swamped by irrelevant volume. A subagent on that type-usage search would have returned a 10-line summary and kept the constraint visible.

The discipline required: write the task prompt so the subagent knows what summary format you want back. If you don’t specify, it’ll return a wall of raw output anyway, and you’ve gained nothing.
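
Reusing the earlier `resolveAffiliateUrl` example, here is the difference between an underspecified prompt and a usable one:

Vague:  Find everything related to `resolveAffiliateUrl`.

Better: Find all callers of `resolveAffiliateUrl` in src/.
        Return: file path and line number per caller, max 15 items,
        no surrounding code, no narrative.

The vague version leaves the return format to the subagent's judgment; the better version makes the result predictable and small.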

Parallel dispatch

The second reason to use subagents is throughput. The Agent tool can dispatch multiple subagents at once. They run concurrently, each with its own context window, and the main session waits for all of them to return before continuing.

This makes sense when you have independent investigations that would otherwise be sequential. A practical example from upgrading a dependency:

Instead of investigating tests, then docs, then usage sites serially — each one expanding the main context before you’ve finished — you dispatch three subagents simultaneously:

Subagent 1: Check src/tests/ for any test that imports or references
            the `affiliate` module. Return: list of files and what each tests.

Subagent 2: Check docs/ for any documentation that mentions the affiliate
            module's public API. Return: list of files and what they document.

Subagent 3: Find all production call sites for `getAffiliateUrl` outside
            of tests and docs. Return: file, line, call pattern.

Three concurrent context windows, three summaries back to the main session. The total wall-clock time is roughly the slowest of the three (not the sum of all three). And the main session receives three concise summaries rather than the merged output of all three investigations.

The parallelism benefit is real but has a ceiling. Tasks that are themselves fast don’t benefit much from parallelism. And you’re paying for three context windows instead of one, so there’s a token cost to weigh against the throughput gain.

Token cost matters here. Each subagent starts with the task prompt and runs its full tool loop before returning. A single subagent doing a thorough codebase search might use 20-50k tokens internally — none of which pollutes the main session, but all of which you’re paying for. Three concurrent subagents cost three times that. For a 5-second grep you could have done inline, the overhead dominates. The parallel dispatch pattern makes economic sense when the investigations are non-trivial — at least several file reads deep — and when the time saved by running them concurrently is meaningful (you’re blocked on slow operations, or the serial sequence would take many minutes).
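
As rough, order-of-magnitude arithmetic (illustrative numbers, not measurements):

Inline search in the main session:  ~1-2k tokens, all landing in main context
One thorough subagent:              ~20-50k tokens internally, ~10 summary lines in main context
Three parallel subagents:           ~60-150k tokens internally, ~30 summary lines in main context

The subagent options cost more in total tokens but leave the main context almost untouched; the inline option is cheapest but charges everything to the context you most need to protect.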

When not to use a subagent

There’s a failure mode that’s easy to fall into: dispatching a subagent for work that isn’t actually separable from the main session.

Interactive design or architecture work. If the work involves back-and-forth — “what do you think about option A vs B, given what we’ve discussed?” — a subagent is the wrong tool. Subagents can’t ask follow-up questions. They run their task prompt and return a result. If the task requires the model to have context from the ongoing conversation, or to reason in multiple passes with user input in between, that has to happen in the main session.

Tasks where the main context is the value. The main session has accumulated reasoning about the current problem. Sometimes that accumulated context is exactly what’s needed to evaluate a result. A subagent that summarizes test failures doesn’t know about the architectural decision you made two hours ago that explains why those failures are expected. The summary will look like a problem when it isn’t. For these cases, having the main session handle the work — with all its context — produces better reasoning even if it expands the context further.

Short tasks. Spinning up a subagent has overhead: the task prompt, the tool initialization, the result return. For a task that would take three tool calls in the main session, the subagent overhead isn’t worth it. The threshold is roughly: if the investigation would generate more than 30-40 lines of output in the main context, a subagent starts paying for itself. Below that, just do it inline.

Tasks that need partial results. Subagents return one result. If you need to inspect interim findings and redirect mid-task, that doesn’t work with a subagent. The subagent runs to completion and returns. If the early findings would change what you investigate next, you need that to happen in the main session where you can redirect.

A specific trap: dispatching a subagent to “investigate the performance problem in the query layer and suggest fixes.” The subagent will investigate and suggest. But if the right response to what it finds is “actually, ignore the query layer — this is a caching issue at the API edge,” you can’t intervene mid-run. The subagent finishes, returns its query-layer analysis, and you’ve spent tokens on the wrong investigation. For exploratory work where the findings should steer the next step, stay in the main session where you can course-correct in real time.

Subagent types and their tradeoffs

Claude Code exposes different subagent configurations with different tool access and behaviors.

Read-only / Explore mode. A subagent with only file read and search tools — no ability to write files or run commands. This is fast and safe for pure investigation tasks. It can’t accidentally modify anything, which makes it reasonable to dispatch without as much caution about what it’s allowed to touch. Good default for the “find all callers of X” category of tasks.

General-purpose subagents. A subagent with the full tool set: reads, writes, shell commands. These can do more but carry the obvious risk: a misconfigured task prompt on a write-capable subagent can make changes you didn’t intend. Use these when the investigation needs to produce a file as output (a test run log, a summary written to a scratch file), not for casual exploration.
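
A sketch of a task prompt where write access is actually needed, because the deliverable is a file (the log path here is hypothetical):

Task: Run the affiliate module's test suite and write the complete
      output to scratch/affiliate-test-run.log.
Return: pass/fail counts plus the names of any failing tests, max
      10 lines. Do not modify any source files.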

Custom / prompted subagents. A subagent launched with a specific system prompt or specialized role. This is where you’d configure a subagent to behave as, say, a “security reviewer” with a specific checklist, or a “test writer” that only produces test files. The specialization comes from the prompt, not a different model — it’s still the same underlying Claude, just with a narrower framing that tends to keep it on task. Useful when you’re dispatching the same type of investigation repeatedly and want consistent output format.
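
A minimal sketch of a custom subagent definition, assuming the markdown-with-YAML-frontmatter format Claude Code uses for project-level agents (a file such as .claude/agents/security-reviewer.md; check the current docs for exact field names):

---
name: security-reviewer
description: Reviews code changes for common security issues. Use for security-focused review tasks.
tools: Read, Grep, Glob
---
You are a security reviewer. Check only for injection risks, unsafe
deserialization, hardcoded secrets, and missing input validation.
Return one finding per line: file, line number, issue, severity.

Restricting the tools list in the frontmatter applies the same least-capability principle: a reviewer needs to read, not write.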

The general guidance: use the least-capable subagent that can do the job. Read-only for exploration, full toolset only when the task genuinely requires writes or command execution.

Writing task prompts that produce usable results

A subagent is only as useful as what it returns. The task prompt has two jobs: specify what to investigate, and specify what format the result should take.

Most task prompts written in haste nail the first and skip the second. “Check all the tests related to the affiliate module” is a valid investigation scope, but the subagent will return whatever it considers a complete answer — which might be a narrative explanation, a raw list, a table, or a mix. If you’re feeding the result into the main session’s reasoning, unpredictable format means unpredictable quality.

A better prompt structure:

Task: [what to investigate and scope boundaries]
Constraints: [what to include / exclude from the investigation]
Return format: [bullet list / table / one paragraph / file path list]
Return length: [number of lines or items maximum]

The return length constraint is particularly underused. Without it, a thorough subagent will return a thorough result — which may be 50 lines when you needed 8. Telling it “return at most 15 items, sorted by relevance” forces summarization and keeps the result proportional.
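
A filled-in instance of that structure, continuing the affiliate example:

Task: Find all tests that exercise the `affiliate` module's public API.
      Scope: src/tests/ only.
Constraints: Skip fixtures and snapshot files; include only tests
      that import the module directly.
Return format: Bullet list, one test file path per item, followed by
      one line on what it covers.
Return length: Max 15 items, sorted by relevance.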

One more thing worth being explicit about: scope boundaries. “Check all the tests” is broader than “Check tests in src/tests/ only.” Without scope constraints, a subagent may crawl further than you intended, running up token costs and potentially reading files you’d rather it didn’t touch. The read-only subagent type provides a safety floor, but explicit scope in the prompt is still clearer about intent.

Good task prompts take 30 seconds longer to write than bad ones. The payoff is results you can actually use without cleaning up.

The main session is expensive to pollute and slow to recover from overloading. Subagents are cheap context windows that let you offload work that would otherwise dirty the main session.

The question to ask before dispatching: “Would I be comfortable if this subagent’s raw tool output appeared in my main session?” If yes, do it inline — the subagent overhead isn’t buying you anything. If no — if the output would be voluminous, noisy, or tangential — dispatch it and get a summary back instead.

The parallel dispatch case is a separate calculation: “Would I naturally do these three things in sequence?” If yes and they’re independent, dispatch them in parallel. The time savings compound on longer investigative sessions where you’d otherwise spend 15 minutes doing three serial searches.

What this still doesn’t solve: the main session’s own context growth from complex reasoning. Subagents handle the noise from external searches and tool calls, but the reasoning itself still accumulates. For sessions that are genuinely long, the better answer is a clean session with a tight scope, not a subagent-heavy session that tries to offset accumulated context. Subagents are useful, but they’re not a substitute for keeping the main task focused.

That said, once you’ve established the habit of reaching for a subagent for any non-trivial investigation, the main sessions stay coherent much longer. It’s one of the higher-leverage workflow changes available in Claude Code right now.