Tuning a Claude Code skill's frontmatter: getting the description right
Published 2026-05-11 by Owner
The skill isn’t being used. You wrote it, you tested it manually, it works — but Claude never reaches for it. Nine times out of ten the problem is a single field: description.
Claude Code uses skill descriptions as routing rules. When a task lands, Claude scans the available skill descriptions and decides which, if any, to invoke. A description that reads like a label (“Code review helper”) gives Claude nothing to match against. A description that reads like a trigger condition (“Use when reviewing pull requests or evaluating code diffs”) tells Claude exactly when to fire.
Getting this right is more consequential than anything else in the frontmatter. The skill's name matters for humans reading the config; the description matters for Claude actually using the skill.
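For reference, both fields live in the YAML frontmatter of the skill's SKILL.md file. A minimal sketch, assuming the standard .claude/skills/<skill-name>/SKILL.md layout:

---
name: pr-review
description: "Use when reviewing pull requests or evaluating code diffs; not for general refactoring."
---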
The two failure modes
False negatives: the skill never triggers. This is the most common problem. The description is technically accurate but too abstract to match any specific task pattern. Claude reads “A helpful skill for code review tasks” and encounters the task “review this PR diff” — the semantic connection exists, but nothing in the description explicitly says when or for what. The skill sits idle.
Symptoms: you invoke the skill manually with /skill-name and it works perfectly, but Claude never self-selects it. The skill is correct; the routing is broken.
False positives: the skill triggers on everything. Less common, but more disruptive. A description broad enough to match any programming task (“Use for coding assistance and software development”) will match nearly every conversation in a dev context. The skill invokes on questions you didn’t intend it for, adds latency, and burns context on setups meant for a narrower use case.
Symptoms: Claude opens every conversation by invoking the skill, even when you’re asking something unrelated to the skill’s purpose.
Both problems have the same root cause: the description isn’t specific enough about the triggering context.
What makes a description route correctly
Three things that help:
Explicit trigger phrases. Starting with “Use when…” or “When the user mentions…” or “Invoke for tasks involving…” tells Claude the description is a condition, not a label. The model has seen enough of these to recognize the pattern and apply it as routing logic.
Concrete task vocabulary. “pull request”, “code diff”, “failing test”, “stack trace”, “benchmark run” — words that name specific artifacts. Abstract words like “improve”, “analyze”, “help with” match too broadly or not at all.
Scope boundaries. What the skill is NOT for can be as useful as what it is for. “Use when reviewing pull requests — not for general refactoring tasks” rules out the false positive case explicitly.
Anatomy of a working description
Take this description apart and the structure is visible:
description: "Use when given a stack trace or failing test output to diagnose root cause — not for general refactoring."
- Trigger verb clause: “Use when given” — positions it as a condition, not a noun phrase
- Artifact anchor: “a stack trace or failing test output” — concrete things that appear in conversation context
- Intent: “to diagnose root cause” — narrows the task type, preventing overlap with other skills
- Exclusion: “not for general refactoring” — explicitly rules out the most likely false positive case
Not every description needs all four parts. A skill with a very specific artifact anchor (“Use when a package.json diff is pasted to audit dependency changes”) may not need an exclusion clause because the anchor is already narrow enough. But when there’s any doubt about overlap with other skills, the exclusion pays for itself in avoided false positives.
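The same four-part shape transfers to other domains. A sketch for a hypothetical migration-review skill, with each clause labeled (the comments are annotations, not part of a real skill file):

# Trigger: "Use when given"
# Artifact anchor: "a database migration file"
# Intent: "to check for destructive operations"
# Exclusion: "not for data modeling or schema design questions"
description: "Use when given a database migration file to check for destructive operations; not for data modeling or schema design questions."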
Five rewrites
These are real descriptions taken from skills in various .claude/skills/ setups, with the problems they caused and the fixed versions.
Before:
description: "Code review helper"
Problem: “Helper” is a category label, not a routing condition. Claude won’t match this against a specific task.
After:
description: "Use when reviewing pull requests or evaluating code diffs for correctness, style issues, and logic errors."
Why it works: “pull requests” and “code diffs” are exact terms Claude will see in task descriptions. “Correctness, style issues, and logic errors” scopes it further so it doesn’t fire on general coding questions.
Before:
description: "Debugging skill for finding errors in code"
Problem: Too general. “Finding errors in code” describes half of all programming tasks. This skill was invoking on refactors, test writing, and dependency updates — anything that touched code with a potential error.
After:
description: "Use when given a specific error message, stack trace, or failing test output to diagnose root cause."
Why it works: Anchors on the artifact (error message, stack trace, failing test output) rather than the activity. Claude needs an actual error present to trigger this, not just any coding task.
Before:
description: "Testing skill"
Problem: Single-word domain label. Matches nothing and everything simultaneously.
After:
description: "Use when asked to write unit tests, integration tests, or test fixtures for an existing function or module."
Why it works: Lists the test types Claude should associate with this skill, and anchors on “existing function or module” — a precondition that filters out cases where the code doesn’t exist yet.
Before:
description: "Helps with performance problems"
Problem: “Helps with” is filler. “Performance problems” is somewhat specific but still matches too broadly — every slow loop or database query qualifies.
After:
description: "Use when profiling data, benchmark output, or a specific slow query is provided and optimization suggestions are needed."
Why it works: Requires profiling data or benchmark output to be present in the conversation. Claude won’t invoke this on speculative “could this be faster?” questions, only on tasks where measurement already exists.
Before:
description: "Use this skill when working on the codebase"
Problem: This was an actual description in a skill meant for onboarding to a specific repo. “Working on the codebase” describes every session, so the skill ran constantly.
After:
description: "Use when a developer is unfamiliar with this repository and needs to understand its architecture, conventions, or how to set up a local dev environment."
Why it works: “Unfamiliar with” is a state Claude can infer from questions like “where does X live?” or “how do I run the tests?” It sets a precondition that rules out experienced contributors who don’t need the onboarding context.
The name field is not a backup
A common misconception: if the description is vague, Claude will fall back to matching on the skill name. It won’t, or at least not reliably. The name is used for explicit invocation (/skill-name) and for display in listings. The description is the routing mechanism.
A skill named pr-review with the description “Code review helper” will not reliably trigger on PR review tasks. A skill named misc-helper with the description “Use when reviewing pull requests or evaluating code diffs” will.
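Side by side, with both skills hypothetical:

# Routes reliably despite the generic name:
name: misc-helper
description: "Use when reviewing pull requests or evaluating code diffs."

# Rarely self-selects despite the on-point name:
name: pr-review
description: "Code review helper"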
Testing description changes
The fastest way to verify a description change is to use the skill’s own trigger condition as a test prompt. If the rewritten description says “Use when given a stack trace to diagnose root cause”, open a new session and paste a stack trace without mentioning the skill by name. If Claude self-selects it, the routing is working.
If Claude still doesn't invoke the skill, the description might still be too abstract, or the skill may live somewhere Claude isn't scanning. Check that the SKILL.md sits where Claude Code discovers skills: the project's .claude/skills/ directory or the personal ~/.claude/skills/ directory.
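The expected layout, for reference:

.claude/skills/
  pr-review/
    SKILL.md    # frontmatter carries the name and description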
Testing for false positives requires a different approach: use a prompt that should NOT trigger the skill. For a PR review skill, paste a question about naming a variable and check whether Claude invokes it. If it does, the description is too broad. Tighten the artifact anchor or add an exclusion clause, then retest both directions.
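A minimal probe pair for the PR review skill, with both prompts invented for illustration:

# Should trigger: names the artifact the description anchors on
"Here's the diff from my latest pull request. Can you check it for logic errors?"

# Should not trigger: a coding question with no PR artifact present
"What's a clearer name for this boolean flag?"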
A useful property: once the description is right, it usually stays right. Skills don’t drift the way code does. The edge case is when the project itself changes — a new testing framework, a different PR workflow, a renamed artifact. Those changes are worth propagating back to the description, because the routing logic is only as current as the vocabulary the description uses.
Length and the description character limit
Descriptions have a practical ceiling of around 150-200 characters before they get truncated or lose clarity. The goal is precision, not exhaustiveness. A long description that lists every possible task is harder to pattern-match than a short one anchored on the two or three most specific artifacts.
The sweet spot: one “Use when…” clause, one or two concrete artifacts, one optional scope boundary. That’s usually 80-120 characters and routes more reliably than anything longer.
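For contrast, a hypothetical overlong description next to a trimmed version:

# ~175 characters: lists everything, anchors nothing
description: "Use for reviewing pull requests, code diffs, commits, merge requests, style problems, logic errors, security issues, performance concerns, and general code quality feedback."

# ~105 characters: one clause, two artifacts, one boundary
description: "Use when reviewing pull requests or code diffs for correctness and logic errors; not for writing new code."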
The bigger picture
Skills are only useful if they’re invoked at the right time. The description field is the only mechanism Claude has for making that decision autonomously. Getting it right isn’t configuration busywork — it’s the difference between a skill that’s part of Claude’s active toolkit and one that sits in the config doing nothing.
The pattern that works consistently: write the description as a trigger condition, anchor it on specific artifacts, and test it by using those artifacts as a cold prompt. Fix it at the first sign of silent failure, not after months of wondering why the skill never fires.