Using the brainstorming skill to turn a half-baked idea into a written spec
Published 2026-05-11 by Owner
The most reliable failure mode in agentic coding is not the model writing bad code. It is the model writing the wrong thing with high confidence and finishing before you notice. You describe a feature, the agent picks an interpretation, runs for three minutes, and presents a working implementation of something you didn’t actually want. The code is fine. The spec was wrong.
This failure is invisible until it isn’t. The agent does not ask whether its interpretation is correct — it assumes the task is obvious and builds. By the time the implementation lands in your terminal, you are evaluating a completed thing rather than an open question. Correcting it means either accepting drift from what you wanted or starting over, which costs more than the original task would have.
The problem is not specific to bad models or bad prompts. It happens with good models and clear prompts because any prompt under-specifies the decision space. “Add a search box to the sidebar” is a clear instruction. It is also compatible with at least a dozen distinct implementations, and the agent will pick one without marking the choice as a choice.
The brainstorming skill exists to close that gap before implementation starts. It treats the first prompt as the beginning of a design process, not a trigger for execution.
One question at a time, and why that feels wrong
The first thing most people notice about the brainstorming skill is the pacing. It asks one question. Then it waits. Then another. This feels inefficient — most tools dump five questions at once so you can fill them all out in a single pass.
The single-question approach is intentional. When a tool asks five questions simultaneously, you tend to read all five before answering, mentally bundle the answers, and move on. This is efficient but it compresses your thinking. You answer what you thought the questions were asking, not what they were actually asking.
Answering one question forces a complete stop. You read it, consider it, and respond before the next constraint arrives. By question three, you are often working through something you thought was settled but turns out not to be. The slower rhythm is a feature, not a limitation.
The other reason: follow-up questions depend on what you said. If the first question is “what problem are you solving?” and you answer something unexpected, the second question needs to track that — it cannot be pre-written. A batch of five questions can’t do this. One question at a time can.
There is also a practical side effect: the back-and-forth feels more like a conversation than a form. Forms get skimmed. Conversations get thought about. Most of the value in the brainstorming session comes from thinking out loud rather than producing answers, and the single-question format keeps that space open.
The full pattern, step by step
The skill follows a fixed sequence. Each step is a checkpoint, not an optional stage.
1. Context exploration. Before asking anything, the skill reads recent commits, relevant files, and project structure. This gives it grounding so questions are specific to your codebase rather than generic.
2. Clarifying questions, one at a time. The number varies by task complexity, but a typical session is three to five exchanges. The questions surface scope, constraints, and tradeoffs you haven’t articulated yet.
3. Two or three approaches with tradeoffs. After the clarifying questions, the skill proposes approaches — not a single answer, not an exhaustive list. Usually two is enough; three if there is a genuinely distinct third option. Each approach states what it costs and what it gains. The framing forces the skill to articulate the tradeoff explicitly rather than picking a default.
4. Design presented in sections, approved per section. The full design is not dropped in one block. It is presented as sections (data model, API surface, failure modes, etc.) and you approve each before the next arrives. This is where hidden disagreements surface cheapest. Approving section by section also means you can redirect mid-design without discarding the whole thing — “the data model looks right, but let’s rethink the API surface before moving on.”
5. Spec committed to disk. When the design is approved, the skill writes it to docs/superpowers/specs/YYYY-MM-DD-<topic>-design.md. This is a hard artifact, not a chat message that scrolls away. Future sessions, including automated execution agents, can read the spec file directly rather than relying on conversation history. (A sketch of what such a file might contain follows this list.)
6. Self-review pass. The skill re-reads the spec it just wrote and looks for placeholders, contradictions, and unresolved decisions. These get flagged before handoff. This catches the common case where a section says “TBD” or silently assumes something that conflicts with another section.
7. Handoff to you. The spec is yours to review. Changes at this stage cost minutes, not hours.
8. Invoke writing-plans. Once the spec is approved, the skill hands off to the writing-plans skill, which produces an implementation plan. From here, execution can start.
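For concreteness, here is a hypothetical sketch of what a committed spec file might contain, using the sidebar-search example from earlier. The filename, headings, and decisions are illustrative, not the skill's fixed output format:

```markdown
<!-- docs/superpowers/specs/2026-05-11-sidebar-search-design.md (hypothetical) -->

# Sidebar search: design

## Problem
Users can't find items in the sidebar without scrolling.

## Decisions
- Search targets names, descriptions, and tags, not names only.
- Results render inline below the input, not in a separate panel.
- No matches: show an explicit "no results" message, keep the input filled.
- Empty input: show the unfiltered list, no placeholder results.

## Open questions
- None remaining. (The self-review pass flags any "TBD" left here.)
```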
The pre-implementation gate, steps 3 through 7, is non-negotiable. No code runs until a spec exists and is approved.
A side effect of this structure: the spec document becomes a durable record of the reasoning, not just the decision. When the implementation is complete and a reviewer asks “why did we load the search index lazily?”, the answer is in the spec, not in someone’s memory of a chat conversation from two weeks ago.
When it feels too slow
The skill applies the same process to simple tasks as to complex ones. If you ask it to “add a search box to the sidebar”, it still explores context, still asks clarifying questions, still writes a spec. This will feel excessive for a task that seems obviously scoped.
This is intentional, and the reasoning is worth understanding: simple tasks are where unexamined assumptions cause the most waste. Complex tasks prompt careful thinking almost automatically. Simple tasks don’t — the apparent simplicity discourages scrutiny. “Add a search box to the sidebar” contains at least four unspecified decisions (what data it searches, whether results appear inline or in a panel, how it handles no-results, and what the empty state of the input looks like). None of these feel important until one of them is wrong in production.
The brainstorming skill treats simplicity as a reason to be careful, not a reason to skip the design step. If you genuinely want to bypass the process for a task you’ve fully thought through, you can invoke implementation directly — but the skill won’t short-circuit on your behalf.
A session that surfaced a hidden requirement
To make this concrete: a project needed a search feature added to a content directory site. The initial prompt was “add search to the tools listing page.”
The brainstorming session went roughly like this:
Skill: What should the search target — the tool names only, or also the descriptions and tags?
User: Descriptions and tags too.
Skill: Should results update as the user types, or only after they submit?
User: As they type.
Skill: The tool list is currently rendered at build time as static HTML. Live filtering as-you-type requires either a client-side index (loaded at page load) or a server endpoint. The static index means a one-time JS payload; the server endpoint means adding a runtime. Which direction fits the project’s current architecture?
User: We’re fully static. Client-side index.
Skill: The index would be generated at build time from the same YAML corpus the pages use. What should happen when no results match — hide the list entirely, or show a “no results” state?
User: Show a “no results” state.
Skill: One more: the index payload scales with the number of tools. At 200 tools with descriptions and tags, it's roughly 40-60 KB uncompressed. Is that acceptable, or should search be progressively loaded only when the input is focused?
User: Progressive loading sounds right. Let’s do that.
That last exchange was the hidden requirement. If the skill had proceeded directly to implementation after the first prompt, it would have loaded the full index at page load. The decision to load it on focus only was the right call, but it required a different implementation path — lazy initialization of the search library, a loading indicator during index fetch, and handling for the race condition where the user types before the index is ready. None of that would have been in an implementation that started without the question.
The rework cost would have been real. The clarification cost was one extra exchange.
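To make the different implementation path concrete, here is a minimal sketch of the load-on-focus behavior, assuming a plain TypeScript front end. The element IDs, the index URL, and the helper names are hypothetical, not taken from the project:

```typescript
type Tool = { name: string; description: string; tags: string[] };

// Lazily fetch the build-generated index. The ??= guard means repeated
// focus events and early keystrokes all share one in-flight request.
let indexPromise: Promise<Tool[]> | null = null;
function loadIndex(): Promise<Tool[]> {
  indexPromise ??= fetch("/search-index.json").then((r) => r.json());
  return indexPromise;
}

const input = document.querySelector<HTMLInputElement>("#tool-search")!;

// Start fetching on first focus instead of at page load.
input.addEventListener("focus", () => void loadIndex());

input.addEventListener("input", async () => {
  const query = input.value.trim().toLowerCase();
  showLoading(true);
  // If the user types before the index is ready, this awaits the same
  // in-flight promise rather than dropping the keystroke.
  const tools = await loadIndex();
  showLoading(false);
  // A later keystroke may have superseded this one while we awaited.
  if (input.value.trim().toLowerCase() !== query) return;
  render(
    tools.filter(
      (t) =>
        t.name.toLowerCase().includes(query) ||
        t.description.toLowerCase().includes(query) ||
        t.tags.some((tag) => tag.toLowerCase().includes(query)),
    ),
  );
});

function showLoading(on: boolean): void {
  document.querySelector("#search-loading")?.classList.toggle("hidden", !on);
}

function render(results: Tool[]): void {
  const list = document.querySelector("#tool-results")!;
  list.innerHTML = results.length
    ? results.map((t) => `<li>${t.name}</li>`).join("")
    : `<li class="empty">No results</li>`; // the agreed "no results" state
}
```

The race handling is just the shared promise plus the staleness check; none of it would exist in an implementation that loaded the index at page load.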
This pattern repeats in almost every brainstorming session that involves a UI or data boundary: there is always one question that sounds like scope-setting but turns out to be architectural. The brainstorming skill’s sequential pacing is what finds it. A linear implementation path would have treated “add search” as a single well-specified task and moved on.
What the spec enables downstream
The committed spec document is not just a record of the decision. It feeds directly into the writing-plans skill, which reads it and produces a step-by-step implementation plan with file targets, test cases, and rollback notes.
This matters because the implementation plan is generated from the spec, not from a paraphrase of the conversation. The writing-plans skill can re-read the spec to answer questions during implementation. If the spec is precise about the lazy-loading behavior, the plan specifies where in the codebase that logic lives and what the test for it looks like.
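As an illustration, a plan-level test for that race condition might look like the following. This is a hypothetical sketch assuming vitest; createSearch is an inline stand-in for the real module so the test stays self-contained:

```typescript
import { expect, it, vi } from "vitest";

type Tool = { name: string };

// Inline stand-in for the lazy-loading search module the spec describes.
function createSearch(fetchIndex: () => Promise<Tool[]>) {
  let indexPromise: Promise<Tool[]> | null = null;
  return {
    warm() {
      indexPromise ??= fetchIndex();
    },
    async query(q: string): Promise<Tool[]> {
      indexPromise ??= fetchIndex(); // typing before focus still works
      const tools = await indexPromise;
      return tools.filter((t) => t.name.toLowerCase().includes(q));
    },
  };
}

it("answers a query issued before the index has loaded", async () => {
  let resolveIndex!: (tools: Tool[]) => void;
  const fetchIndex = vi.fn(
    () => new Promise<Tool[]>((resolve) => (resolveIndex = resolve)),
  );
  const search = createSearch(fetchIndex);

  const pending = search.query("lin"); // user types before the fetch settles
  resolveIndex([{ name: "linter" }, { name: "formatter" }]);

  expect(await pending).toEqual([{ name: "linter" }]);
  expect(fetchIndex).toHaveBeenCalledTimes(1); // no duplicate fetch
});
```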
Without the spec, you have a plan generated from what the agent thought was agreed. With the spec, you have a plan generated from what was actually written down. Those can be different things, and the difference shows up in the review step, not the planning step.
The brainstorming skill produces a spec. The writing-plans skill produces a plan. The execution skill follows the plan. Each step reads the artifact from the previous step, not the model’s inference about what happened earlier in chat.
That chain is the point. The brainstorming skill is the first link.
Limitations
A few things to know before relying on this heavily:
The quality of the spec depends on the quality of your answers. The skill can only surface what it can ask about. If you give short answers to clarifying questions, the spec will be underspecified. The process is as good as your participation in it. Answering “I’m not sure” is fine — it is better to capture the uncertainty explicitly than to let the spec gloss over it.
Context exploration is imperfect. The skill reads commits and files, but it doesn’t always find the right context for unfamiliar parts of a codebase. If the relevant constraint lives in a file that doesn’t surface naturally in recent changes, you may need to point to it explicitly at the start. A quick “the relevant logic is in src/lib/tools.ts” is more reliable than hoping the skill finds it on its own.
The pre-implementation gate works best when you hold it. The skill will not run implementation without an approved spec, but nothing stops you from taking the partially explored idea and running a different agent without the gate. The discipline is yours to maintain. The value of the gate comes from treating it as a real boundary, not an optional mode.
Not every task needs a full session. For work that is genuinely mechanical — renaming a variable, fixing a typo, bumping a version number — the brainstorming process is overhead. The skill is for tasks where what you want is underspecified, not tasks where the change is unambiguous. Learning to distinguish the two is the main practical skill the tool builds over time.
A useful test: if you can write the complete spec yourself in two sentences before invoking anything, skip brainstorming. If you can’t, it’s worth the session.
For the common case — a feature or refactor where you have a rough idea but not a precise spec — the brainstorming skill does what it says. It turns a rough idea into a written spec before code runs. That’s the failure mode it prevents.
As agentic coding tools become faster and more autonomous, the value of the forcing function at the front of the process increases rather than decreases. A faster agent that implements the wrong thing faster is not an improvement. The spec step doesn't slow the work down; it moves the thinking to where it costs least.
The brainstorming skill is not a safeguard against a bad agent. It is a safeguard against a good agent that doesn’t yet know what you actually want. That distinction is what makes the pre-implementation gate non-negotiable rather than configurable: the failure mode it prevents is proportional to how capable the agent is, not inversely proportional to it.