
Skill or hook? Two extension points and how to choose

Published 2026-05-11 by Owner

Every Claude Code extension starts with the same question: should this happen automatically, or should Claude decide when it’s relevant? That question has a clean answer, and it comes down to where the code runs.

Skills run inside the model’s loop. The model reads the skill’s description, decides it applies, and incorporates the skill’s body as part of its instructions for that task. Hooks run in the harness — the runtime that wraps the model. Hooks fire on events and the model has no vote.

Get this wrong and you end up with automation that’s sometimes-on (when you wanted always-on) or always-on (when you wanted sometimes-on). Both are annoying in practice.

Inside the loop vs. around it

A skill is markdown. When Claude is working on a task, it reads skills’ descriptions and invokes the ones that match. The match is judgment-based — Claude decides “this looks like a code review, I should invoke the review skill.” The skill’s body then shapes how Claude approaches the task.

This makes skills good at behavior shaping. They change how Claude works when a certain kind of task comes up: the methodology it uses for debugging, the style constraints it follows during code review, the planning structure it applies before making architectural changes.

The description field of a skill is load-bearing. Claude reads it to decide whether to invoke the skill. A description like “use this skill for code review” is fine. A description like “use this skill when reviewing a PR, checking a diff, or auditing changes for correctness and security” is better — it surfaces more of the contexts where the skill applies and gives Claude more signal to match against. The body can be long; the description should be precise.
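
For illustration, here is a minimal sketch of a skill file with a sharp description, assuming the usual layout of a markdown file with YAML frontmatter under .claude/skills/ (the code-review skill, its path, and its body are hypothetical):

---
name: code-review
description: Use when reviewing a pull request, checking a diff, or auditing changes for correctness and security.
---

Check security-sensitive changes first, then test coverage, then style.
Format findings as inline suggestions on the relevant lines.

Claude matches against the description when deciding whether to invoke the skill; the body below the frontmatter only shapes its work after that decision.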

A hook is a configured event subscription in settings.json. It maps events like PreToolUse, PostToolUse, Stop, SessionStart, and UserPromptSubmit to shell commands. When the event fires, the command runs. Every time. Without exception. Claude cannot decide not to run a hook.

Hooks have two broad output modes depending on the event type. For PreToolUse, a non-zero exit code or specific output can block the tool call — the hook acts as a gate. For PostToolUse and session events, the hook output is fed back into the conversation as context Claude can read, but the original action already ran. Knowing this matters: blocking behavior requires PreToolUse; informational feedback can use PostToolUse.
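
A minimal sketch of that contract, assuming the convention that the hook command receives the pending tool call as JSON on stdin and that a specific non-zero exit code marks a blocking failure for PreToolUse (the exact exit code and the jq-based parsing are assumptions; check the hooks documentation for your version):

#!/bin/sh
# Hypothetical PreToolUse gate. The pending tool call arrives as JSON on stdin.
cmd=$(jq -r '.tool_input.command // empty')

case "$cmd" in
  *"rm -rf"*)
    echo "Blocked: destructive command" >&2
    exit 2   # assumed blocking exit code: the tool call does not run
    ;;
esac

exit 0       # anything else passes through untouched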

The clearest summary: skills are suggestions Claude accepts when relevant; hooks are invariants the harness enforces regardless.

When to use a hook

If the sentence describing what you want starts with “every time” or “automatically when” — that’s a hook.

Hooks exist for the CI-style behaviors that need to be deterministic:

  • Lint on every file write. Run your linter after Claude writes a file. If it fails, the hook can block further tool use until the issue is fixed.
  • Audit logging. Record every tool call Claude makes. A PostToolUse hook writing to a log file runs on every tool use without Claude needing to remember to log.
  • Context injection. A SessionStart hook that reads a current sprint file and appends it to Claude’s context. Happens at the start of every session whether Claude thinks it’s relevant or not (a sketch follows this list).
  • Blocking dangerous commands. A PreToolUse hook that rejects rm -rf patterns before Claude can execute them. This has to be in the harness — asking the model to be careful is not the same as blocking the call.
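
The context-injection case might look like this in settings.json, assuming stdout from a SessionStart hook is appended to the session context as described above (the sprint file path is hypothetical):

{
  "hooks": {
    "SessionStart": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "cat .claude/current-sprint.md 2>/dev/null"
          }
        ]
      }
    ]
  }
}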

A concrete example. Linting on file write as a PostToolUse hook on Write tool events:

{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Write",
        "hooks": [
          {
            "type": "command",
            "command": "bun run lint:content ${tool_input.file_path} 2>&1"
          }
        ]
      }
    ]
  }
}

When Claude writes a file, the hook fires, lint runs, and the result is fed back into the loop. Claude sees the lint errors and can fix them. This works reliably because it bypasses the model’s judgment entirely. Claude writes the file; the hook runs; that’s the contract.

Compare this to putting “run the linter after writing content files” in a skill. The skill might work nine times out of ten. The hook works every time. For linting, a 10% miss rate means broken builds slip through regularly. The hook makes the miss rate zero.

The key property: hooks are appropriate for things where “Claude forgot” is not an acceptable failure mode.

When to use a skill

If the sentence describing what you want starts with “when Claude is doing X, it should approach it by Y” — that’s a skill.

Skills are appropriate for situational guidance:

  • Code review methodology. When reviewing a PR, Claude should check for specific security patterns, follow a particular comment format, and look at test coverage first. This guidance is useful for review tasks; it would be noise the rest of the time.
  • Debugging methodology. When debugging a failing test, Claude should reproduce the failure first, then read the stack trace, then look at recent commits. This structure is helpful for debugging; loading it for every session wastes context (see the sketch after this list).
  • Architecture patterns. When designing a new API endpoint, follow the project’s existing route structure, use the established error handling pattern, and check for similar endpoints before adding new ones.
  • Planning discipline. Before making changes to more than two files, produce a written plan and confirm scope. This kind of guardrail works well as a skill because Claude can recognize multi-file tasks.
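
The debugging item above could be a complete skill on its own. A hypothetical file that just restates that methodology as instructions:

---
name: debugging-methodology
description: Use when investigating a failing test, an unexpected error, or a regression.
---

1. Reproduce the failure before changing anything.
2. Read the full stack trace and identify the first frame in project code.
3. Check recent commits that touch the files involved.
4. Only then propose a fix.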

Skills work here because they’re loaded at the right granularity. The model invokes the review skill when reviewing code. It doesn’t invoke it when fixing a typo. The judgment call is exactly what you want — some guidance only matters in specific contexts.

The economics support this too. Skills are loaded into context, which has a cost. A 300-word debugging methodology loaded for every session costs context budget even when Claude is writing documentation. Loading it only when Claude is actually debugging means the context budget goes toward guidance that’s relevant to the task at hand. More skills can coexist cleanly when each one fires only in its correct context.

The “this should happen automatically” tell

There’s a pattern I see often: someone wants linting to happen on every save, so they write a skill that says “always run the linter after writing files.” Then they’re surprised when Claude sometimes does it and sometimes doesn’t.

The tell is the word “always.” If you’re writing “Claude should always do X,” that’s not a skill — that’s a hook waiting to be configured. Skills are invoked by Claude when relevant; “always” is not a relevance condition, it’s a constraint on the harness.

The same applies to security rules. “Never run commands that delete files without confirmation” phrased as a skill is a suggestion. Phrased as a PreToolUse hook that inspects the command, it’s a gate. The model can drift from suggestions; the hook cannot be bypassed.

The inverse mistake also happens. Someone puts verbose, context-sensitive guidance into a hook’s shell command, and the hook runs it on every single tool call. Hooks don’t know what task Claude is working on. A hook that injects a 500-word security checklist on every file read is going to flood context with irrelevant noise 90% of the time. That guidance belongs in a skill that fires when the task warrants it.

Both mistakes come from the same confusion: not knowing where the judgment boundary is. Hooks are judgment-free by design. Skills are judgment-driven by design. Putting judgment-dependent content into hooks wastes context; putting always-required constraints into skills makes them unreliable.

A real pair of both

Here’s a case where both extensions work together. The task: Claude is reviewing code.

The skill provides the methodology — check security patterns, verify test coverage, format comments as inline suggestions. The model invokes it when it recognizes a review task. The skill might be 300 words of guidance about what to look for and how to format the output; it’s only loaded when relevant, so those 300 words don’t occupy context during unrelated tasks.

A hook handles the invariant: when Claude finishes responding (Stop event), run a script that confirms no unreviewed files were left staged. The model doesn’t decide whether that check runs; the harness runs it regardless.

A second hook, PreToolUse on the Bash tool, blocks any git push that hasn’t passed the project’s test suite. The review skill doesn’t need to tell Claude to run tests before pushing; the hook enforces it. Claude can’t push without tests passing even if it somehow decides to try.
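
One way that push gate could be wired up: a PreToolUse entry with "matcher": "Bash" (the same settings shape as the lint example earlier) pointing at a script along these lines. The test command and the blocking exit code are illustrative assumptions:

#!/bin/sh
# Hypothetical gate: only let `git push` through when the test suite passes.
cmd=$(jq -r '.tool_input.command // empty')

case "$cmd" in
  *"git push"*)
    if ! bun test >/dev/null 2>&1; then
      echo "Blocked: the test suite must pass before pushing" >&2
      exit 2
    fi
    ;;
esac

exit 0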

The skill shapes the work; the hooks enforce the boundary conditions. Neither is doing the other’s job. Three extension points, each in its correct layer.

Practical setup

Skills live in .claude/skills/ as markdown files. The description field is what Claude reads to decide whether to invoke the skill — it should describe the context where the skill applies, not just what it does. A skill with a vague description gets invoked unpredictably; a skill with a sharp description gets invoked at the right times.

Hooks live in .claude/settings.json (project-level) or ~/.claude/settings.json (user-level). Project hooks check into the repo so the whole team gets them. User hooks are for personal workflow preferences you don’t want to impose on others.

Project settings also have a local variant, .claude/settings.local.json, which you’d typically gitignore. This matters for hooks that reference local paths or personal tooling: configure them in settings.local.json rather than the shared settings.json, so the shared configuration stays free of paths hard-coded to your machine.
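
A hypothetical example of that split: a personal notification hook that depends on a script installed only on your machine, kept out of the shared config by living in .claude/settings.local.json:

{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Write",
        "hooks": [
          {
            "type": "command",
            "command": "/Users/me/bin/notify-on-write.sh"
          }
        ]
      }
    ]
  }
}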

One more difference in practice: skills compound. A session can invoke multiple skills if several of their descriptions match the task. A debugging session might invoke both a debugging-methodology skill and a security-checklist skill. That’s intentional; skills compose. Hooks don’t compound in the same way: every hook whose event fires runs, regardless of what task Claude is working on.

The configuration boundary

The distinction matters more as your Claude Code setup matures. Early on, everything feels like a skill. You write skills for things that should be automatic, they sometimes fire and sometimes don’t, and you add more words to the skill trying to make it more reliable.

The fix isn’t better skill wording. It’s recognizing that the behavior you want belongs in the harness, not in the model’s judgment. Move it to a hook and it becomes reliable. Leave it in a skill and it remains probabilistic.

Skills improve what Claude does when it decides to act. Hooks change what the harness does regardless of what Claude decides. Knowing which bucket a behavior belongs in is the clearest test of whether your extension point is correctly placed.

A useful audit question: if a new developer joined and asked “does X always happen in this project?” — can you point to a hook that guarantees it, or just a skill that usually triggers it? If the answer is “usually,” and “always” is what you need, that’s a gap in the hook configuration.

New hook event types show up as Claude Code’s API evolves. The ones currently useful for daily workflow are the tool-lifecycle events (PreToolUse, PostToolUse) and the session-lifecycle events (SessionStart, Stop). As the surface grows, the same mental model applies: is this about constraining Claude’s judgment, or shaping it?

The mental model is stable even as the API changes. More event types means more places to put deterministic automation — more hooks that can fire without model involvement. More skill infrastructure means richer ways to load context-appropriate guidance. Both are worth building out as your Claude Code configuration matures. Start with one hook and one skill, understand what each one does, and the right placement for future extensions becomes clear.