How Claude Code decides which skill to invoke (and how to influence that)
Published 2026-05-11
Every Claude Code session with skills installed starts with the orchestrator looking at two things: the current task and the list of available skills with their descriptions. That’s the entire routing surface. The model reads the task, reads the skill descriptions, and decides which (if any) apply. There’s no routing table, no keyword index, no explicit trigger logic — just a language model doing text matching against the descriptions you gave it.
Skills are defined in .claude/skills/ or in a skills registry, and each skill has a description field that appears in the orchestrator’s prompt. The orchestrator reads that list on every turn. With a single skill installed, routing is trivial. With 30 skills installed, the orchestrator is making a multi-way classification decision on every task, and that’s where it gets interesting.
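Concretely, a skill is a directory whose SKILL.md frontmatter carries the name and the description; only that frontmatter is visible during routing, and the body below it loads after invocation. A minimal sketch (any frontmatter fields beyond these two are omitted):

.claude/skills/review-pr/SKILL.md:

---
name: review-pr
description: Use when reviewing pull requests
---
(the skill body: instructions the orchestrator only sees once it invokes the skill)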
If a skill never fires when it should, or fires when it shouldn’t, the fix almost always lives in a description change. But diagnosing which change to make requires understanding how the orchestrator is actually reasoning.
The description is the routing contract
When the orchestrator scans available skills, it’s reading each description as a natural-language specification of when that skill applies. A description like “Use this when reviewing pull requests” is a contract the model tries to honor. If the current task looks like PR review, the skill gets invoked.
The implication: description quality is the primary lever on routing reliability. The model won’t reach for a skill whose description it can’t match to the task, even when the skill’s body would obviously handle it. This means:
- A too-broad description (“Use for code tasks”) routes everything there, which breaks more specific skills
- A too-narrow description (“Use only for reviewing React component PRs authored by the security team”) misses adjacent cases the skill handles fine
- A description whose language overlaps another skill’s creates a routing tie
Routing ties are the source of most skill failures in larger installations. With 10-50 skills installed, near-collisions are common.
One thing that surprises people when they first look at how skills actually work: the orchestrator doesn’t run a separate “is this relevant?” check before invoking. It makes a single holistic decision. If it decides a skill applies, the skill runs in full. There’s no partial invocation, no “apply only the relevant part.” So a false positive — a skill that fires when it shouldn’t — has real cost: it adds a full skill execution to the conversation, potentially with tool calls and side effects. Getting the description right matters in both directions.
Why competing descriptions fight
Consider two skills with these descriptions:
review-pr: "Use when reviewing pull requests"
audit-changes: "Use when reviewing pull requests for correctness"
Both trigger on PR-related tasks. The orchestrator has to pick one. Which one wins depends on the task phrasing. “Review this PR” is ambiguous — it could be either. “Check this PR for correctness” matches audit-changes more specifically, so that one probably wins. But “do a code review” is a coin flip.
The problem compounds when the skills aren’t actually equivalent. If review-pr does a quick pass and audit-changes does a deep security and correctness audit, you want consistent routing. Ties mean you get whichever one Claude happened to rank higher on that particular phrasing, which isn’t deterministic from the user’s perspective.
The practical failure mode: users notice that “sometimes Claude does a thorough review and sometimes it doesn’t.” They think the model is inconsistent. The model is consistent — the routing is the problem.
A less obvious version: a skill that almost never fires. If its description uses language that’s always dominated by another skill’s more-specific description, it becomes effectively unreachable. The skill exists, runs fine when explicitly triggered, but the orchestrator never reaches for it autonomously.
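The fix for a pair like this is to give each skill a trigger surface the other explicitly cedes. A hypothetical rewrite of the review-pr/audit-changes pair:

review-pr: "Use for a quick pre-merge pass on pull requests: style, obvious bugs,
test coverage. For a deep correctness or security audit, use audit-changes instead."
audit-changes: "Use for a deep correctness and security audit of a pull request.
Use instead of review-pr when the task asks to verify, audit, or check correctness."

Now “do a code review” defaults to review-pr and “check this PR for correctness” lands on audit-changes, without depending on phrasing luck.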
Naming and description conventions that reduce ambiguity
The three patterns that reliably improve routing:
Scope-specific descriptions. The description should describe what the skill does that no other skill does. Not “Use for code review” but “Use for security-focused code review, specifically OWASP threat modeling and STRIDE analysis.” The specificity tells the model when to reach for this vs. a general review skill.
Explicit scope boundaries. Adding “Use for X, not for Y” in a description pre-empts ties before they form. “Use when reviewing production-deployed services, not development branches” constrains the routing surface without requiring the user to phrase tasks differently.
Skill prefix namespacing. When you have multiple skills for a team or domain (payments-review, payments-deploy, payments-audit), the shared prefix clusters them visually in the skill list and also in the model’s reasoning. The model tends to recognize “these are all payments-domain skills” and route to the right one based on the suffix. Unnamespaced skills that happen to cover similar ground don’t get this disambiguation benefit.
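What that clustering looks like in the skill list, with illustrative descriptions:

payments-review: "Use for pre-merge review of changes to the payments service."
payments-deploy: "Use for deploying the payments service to staging or production."
payments-audit: "Use for auditing payments transaction handling for correctness and compliance."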
A concrete example of scope-specific description vs. generic, for a security review skill:
# Too generic — will compete with every review skill:
"Use this skill to review code for issues."
# Scoped correctly:
"Use when conducting a security-focused review. Runs OWASP Top 10 checks,
STRIDE threat modeling, and secrets-in-code detection. Use instead of the
general review skill when the task involves authentication, authorization,
data handling, or cryptography."
The second version is longer, but only because it adds specificity — not filler. Every added phrase either narrows the trigger surface or pre-empts a collision with an adjacent skill.
What doesn’t help: making descriptions very long with rationales and context that don’t affect routing. A 300-word description with 200 words of background doesn’t route better than a 50-word description that covers the same routing surface. The model already knows what security means. Write descriptions like function docstrings: what it does, when to use it, and what distinguishes it from adjacent tools.
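Condensed into a template, with angle brackets as placeholders:

skill-name: "Use when <task type>. Runs <specific checks or actions>. Use instead
of <adjacent skill> when <distinguishing condition>."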
Order of preference when multiple skills could apply
When the orchestrator scores multiple skills as plausible candidates, the specificity of the match is the primary tiebreaker. A description that uses the same vocabulary as the current task (“security audit” matching a task that mentions “security”) beats a description that only generically covers the domain (“code quality review”).
Secondary factor: recency of mention. If the user mentioned a specific skill or workflow earlier in the conversation, that skill’s description gets elevated as context. A user who said “I want to use the payments-audit workflow” thirty turns ago has implicitly loaded that context.
There’s no documented priority ordering based on skill installation order. Empirically, the model doesn’t appear to have a meaningful lexicographic or index-based preference. Description specificity is the only reliable influence point.
The failure case to watch for: a skill with a very generic description that “wins” on most tasks because it generically matches everything, crowding out the more appropriate specific skills. This is almost always the review skill if it’s described as “Use for reviewing anything.” Narrowing it to “Use for pre-landing PR review, checking for correctness and style” fixes the crowding.
Another underappreciated factor: the phrasing used in the task matters more than the phrasing used in the skill name. A skill named security-audit doesn’t fire just because the word “security” is in its name — the orchestrator routes based on descriptions, not names. The name is for humans reading the skill list; the description is for the orchestrator. A skill named foo with a perfectly scoped description will out-route a skill named security-audit with a vague description on security tasks.
When a skill stops triggering: a diagnostic walkthrough
Last month I debugged a skill called db-migration-check that had stopped firing. It was supposed to run whenever someone asked Claude to write or review a database migration. The symptom: users would say “write a migration for adding the user_id column” and Claude would proceed directly without invoking the skill, occasionally producing migrations that violated the team’s naming conventions and safety rules.
The description at the time was:
db-migration-check: "Review database migrations for safety and correctness"
The first diagnostic step was comparing this description to the task phrasing. Users weren’t saying “review my migration” — they were saying “write a migration.” The word “review” in the description implies the migration already exists. The orchestrator correctly didn’t apply a “review” skill to a “write” task.
The second observation: there was a separate review skill installed with the description “Use when reviewing code, including schema changes and migrations.” That skill was capturing all the review-sounding tasks in the migration domain, leaving db-migration-check with effectively no surface to match against.
The fix was a description rewrite:
db-migration-check: "Use when writing or reviewing database migrations. Enforces
naming conventions (snake_case, timestamp prefix), checks for reversibility,
and validates index coverage. Use instead of the general review skill for any
migration file."
Three changes: (1) “writing or reviewing” expands the trigger to cover the write path; (2) the specific conventions listed give the model confidence this is the right skill when those concerns are relevant; (3) the explicit “use instead of the general review skill” pre-empts the tie with review.
The skill fired consistently after that change. The naming convention violations stopped.
The diagnostic process in general:
1. Compare the description vocabulary to the actual phrasing users are using
2. Identify which other skill is “winning” on the tasks where yours should fire
3. Check whether the winning skill’s description is strictly more specific or just overlapping
4. If overlapping: add specificity to your description and/or add an explicit “use instead of X” clause
5. If your skill’s domain is a subset of a broader skill: narrow the broader skill’s description to exclude your domain
Step 2 is often the hardest. There’s no built-in routing log that says “I considered these skills and chose this one.” The practical approach is to look at which skill actually ran (it will be visible in the conversation), and compare its description to the task phrasing word by word. The match is usually obvious in hindsight.
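A crude way to make that comparison repeatable is to count shared content words between the task and each description. This is a heuristic sketch, not how the orchestrator scores anything: the model reasons over full sentences, so a word-set metric misses semantic gaps like write-vs-review. It does surface vocabulary ties quickly, though. The two descriptions below are the ones from the walkthrough, in Python:

import re

def tokens(text: str) -> set[str]:
    # Lowercased content words; rstrip("s") is a deliberately crude stemmer
    # so that "migration" and "migrations" count as the same term.
    return {w.rstrip("s") for w in re.findall(r"[a-z_]+", text.lower()) if len(w) > 3}

def rank(task: str, skills: dict[str, str]) -> list[tuple[str, int]]:
    # Score each skill by how many content words its description shares
    # with the task phrasing, highest first.
    task_words = tokens(task)
    scored = [(name, len(task_words & tokens(desc))) for name, desc in skills.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

skills = {
    "db-migration-check": "Review database migrations for safety and correctness",
    "review": "Use when reviewing code, including schema changes and migrations",
}

for name, score in rank("write a migration for adding the user_id column", skills):
    print(f"{name}: {score} shared terms")

Both descriptions score a single shared term (“migration”) on the write-phrased task, which matches the observed behavior: neither gave the orchestrator a confident reason to fire.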
One thing worth knowing: a skill that’s explicitly invoked with /skill-name bypasses the routing logic entirely. The orchestrator doesn’t re-evaluate the description; it runs the named skill. This is useful both for testing (you can verify a skill works correctly even when its description is broken) and for power users who know which skill they want. The routing problem is entirely about autonomous invocation — when the user phrases a task naturally and expects the right skill to fire without explicit mention.
Keeping routing healthy as the skill set grows
Adding a new skill always risks disrupting existing routing. The new skill’s description competes with all existing descriptions. If the new skill has a generic description, it may start capturing tasks that were previously handled by specialized skills.
The check I run after adding any skill: mentally simulate three or four tasks that should trigger existing skills, and ask whether the new description would also match those tasks. If yes, the new description needs narrowing before it goes in.
For teams with 20+ skills, a periodic description audit is worth scheduling (a rough automation sketch for the overlap check follows the list):
- List all skills and their current descriptions
- For each skill, identify two or three tasks that should trigger it
- Verify those tasks wouldn’t also match two or more other skills
- Flag any skill that’s been in the list for 30+ days without appearing in conversation history — it’s either broken or redundant
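The overlap half of the audit is mechanical enough to rough out in a script. A stdlib-only sketch, assuming a one-line description: field in each SKILL.md frontmatter (multi-line YAML descriptions would need a real parser) and an arbitrary 50% threshold:

import itertools
import re
from pathlib import Path

def load_descriptions(skills_dir: str) -> dict[str, str]:
    # Naive frontmatter scrape: take the first "description:" line from
    # each SKILL.md. Good enough for an audit pass, not a YAML parser.
    descriptions = {}
    for skill_file in Path(skills_dir).glob("*/SKILL.md"):
        match = re.search(r"^description:\s*(.+)$", skill_file.read_text(), re.MULTILINE)
        if match:
            descriptions[skill_file.parent.name] = match.group(1).strip()
    return descriptions

def content_words(text: str) -> set[str]:
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 3}

def flag_overlaps(descriptions: dict[str, str], threshold: float = 0.5) -> None:
    # Flag pairs whose descriptions share a large fraction of the smaller
    # description's vocabulary: candidates for a routing tie.
    for (a, desc_a), (b, desc_b) in itertools.combinations(descriptions.items(), 2):
        words_a, words_b = content_words(desc_a), content_words(desc_b)
        if words_a and words_b:
            overlap = len(words_a & words_b) / min(len(words_a), len(words_b))
            if overlap >= threshold:
                print(f"possible routing tie: {a} <-> {b} ({overlap:.0%} shared terms)")

flag_overlaps(load_descriptions(".claude/skills"))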
The most common finding in this kind of audit: skills that were added to solve a specific one-time problem and were never scoped tightly. They sit in the list with generic descriptions, generating false positives on unrelated tasks, without anyone noticing because the false positives produce something vaguely relevant.
Where this is heading
Skill routing is a description-matching problem today because the orchestrator is a language model doing text comparison. That’s also why it’s improvable with description engineering — the same model that routes skills also understands the nuances you write into descriptions.
The practical implication: treat skill descriptions as maintained artifacts, not one-time setup. As task vocabulary shifts in a team and as new skills get added, routing collisions accumulate. A periodic review of which skills are firing and which aren’t — especially after adding new skills — catches routing drift before it becomes user-visible inconsistency.
The deeper shift is recognizing that skill descriptions are a form of configuration, not documentation. They don’t just explain what a skill does for the humans reading the list. They’re the mechanism by which the orchestrator decides what to run. Keeping them precise and non-overlapping is maintenance work with direct impact on reliability. The teams that get the most out of large skill installations treat them the same way they treat linter configuration: something that needs active upkeep, not just initial setup.