The supervision paradox

Lars Faye’s essay “Agentic Coding is a Trap” hit a nerve this month, and it is worth understanding why before reacting to it. The argument is not the usual “AI makes you lazy” lament. It is sharper than that, and it is structural, which is why it is harder to wave away.

The paradox, stated plainly

Faye’s core claim is what he calls the paradox of supervision. Agentic tools only produce good outcomes when a competent engineer supervises them — reads the diffs, catches the wrong turn, rejects the plausible-but-wrong plan. But the act of delegating to the agent is exactly what erodes the skills that supervision requires. The better the agent gets, the more you delegate; the more you delegate, the less you practice; the less you practice, the worse you supervise. The tool that needs your judgment is also the thing quietly degrading it.

You can dismiss that as armchair speculation, except that the strongest evidence for it comes from Anthropic. In a study Anthropic published about its own developers, debugging skill dropped by 47% among the heaviest AI users. Anthropic’s own framing of the trade-off is direct: developers “may lean on AI to deliver quick results at the expense of building critical skills — most notably, the ability to debug when things go wrong.” When the company shipping the agent reports a measurable skill drop in the people using it most, the loop has stopped being hypothetical. The Hacker News thread on Faye’s piece is worth reading in full for the range of practitioner reactions, agreement and pushback both: news.ycombinator.com/item?id=48002442.

What makes the number persuasive is which skill it names. Not “writing boilerplate,” which nobody mourns. Debugging: the skill you reach for precisely when the agent has produced something subtly wrong and is confident about it. The paradox is concrete. The capability that decays first is the one you need most at the exact moment agent output fails.

The honest counter-argument

The strongest rebuttal I have read does not deny the skill drop. It relocates the cause. The “Agentic Coding Isn’t the Trap — Supervising From Your Head Is” response argues that the failure is not delegation, it is unstructured delegation: engineers who supervise from memory and vibes rather than from tests, types, and an explicit spec. On that account the atrophy is real, but the cause is a missing harness, not the agent. Give the work a specification the agent and the human both check the result against, and supervision stops depending on whatever the engineer happens to still remember.

I find that partly convincing, and I want to be fair to it because it is the version of the optimistic case that is actually true. A sharp test suite genuinely does externalize judgment you would otherwise hold in your head; a typed interface genuinely does catch the class of agent error that human review is worst at. Structured supervision is real mitigation, not a slogan.
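To make “externalize judgment” concrete, here is a minimal sketch of a spec that both the agent and the reviewer can check a result against. Everything in it is hypothetical and chosen only for illustration: the LineItem type, the order_total function, and the specific pricing rules are assumptions, not anything taken from Faye’s essay or the response to it.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class LineItem:
    # Hypothetical domain type; the typed fields are part of the spec.
    unit_price: Decimal
    quantity: int


def order_total(items: list[LineItem], discount_pct: Decimal) -> Decimal:
    """Candidate implementation (imagine an agent wrote this)."""
    if not (Decimal(0) <= discount_pct <= Decimal(100)):
        raise ValueError("discount_pct must be between 0 and 100")
    subtotal = sum((i.unit_price * i.quantity for i in items), Decimal(0))
    return subtotal * (Decimal(100) - discount_pct) / Decimal(100)


def check_spec() -> None:
    """The spec as executable assertions, written before reading the diff."""
    items = [LineItem(Decimal("10.00"), 2), LineItem(Decimal("5.50"), 1)]
    assert order_total(items, Decimal(0)) == Decimal("25.50")
    assert order_total(items, Decimal(10)) == Decimal("22.95")
    assert order_total([], Decimal(50)) == Decimal(0)
    try:
        order_total(items, Decimal(150))
    except ValueError:
        pass
    else:
        raise AssertionError("out-of-range discount must be rejected")


if __name__ == "__main__":
    check_spec()
```

The type annotations and the assertions carry the checking, so the review no longer depends on what the reviewer happens to remember about discount arithmetic. What the sketch cannot do is say whether a percentage discount on the subtotal was the right thing to build in the first place.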

But it does not close the loop, and the gap is the interesting part. Writing a sharp spec for an unfamiliar subsystem is itself a skill — arguably the skill — and it decays the same way debugging does when an agent always does the first pass. The harness externalizes the checking. It does not externalize the judgment that tells you the spec is asking for the wrong thing. You cannot write the test that catches “this entire approach is wrong” if the approach was the agent’s and you only ever reviewed its plausibility.

What I actually do about it

The fix is boring, which is why nobody sells it. I do not have a framework. I have three habits, and I keep them because they are cheap and the failure mode is expensive.

I read every diff I apply, line by line, before it lands. Not as a rubber stamp, but as the place I am still actually practicing. Reviewing correct, idiomatic, working code written to solve a problem I already understand is close to an ideal training set, and it is free every time I would otherwise have skimmed and clicked through. Skipping that read is, I suspect, the mechanism behind the 47%.

I keep one category of work the agent does not touch. For me it is the first debugging pass on a genuinely confusing failure — the single skill the Anthropic number says goes first. The rule has to be categorical, not “when I have time,” because “when I have time” loses to every deadline. There is nothing to negotiate in the moment if the rule is absolute.

And I treat “the agent could do this” and “I should let the agent do this” as different questions. The first is about capability and is trending toward always yes. The second is about which muscle I am willing to stop using. The 2.3-tool stack makes the second question harder, not easier: every additional agent is one more place the answer quietly defaults to yes, and a fourth agent in the stack is a cost the crowded agent race systematically under-counts. The concrete, team-level version of these habits is in “Using AI coding tools without letting your skills atrophy.”

There is one more pressure worth naming. If Opus 4.7 really has moved the quality bar far enough that delegating-by-default gets more tempting — and the stack reshuffle argues it has — then the supervision tax goes up at exactly the moment the agent is good enough that you stop wanting to pay it. A better model does not relax the paradox. It tightens it.

Faye’s title says trap. I would say tax. A trap is something you avoid; a tax is something you budget for and pay on purpose. The engineers who keep their edge over the next two years will not be the ones who refused agents, and they will not be the ones who trusted them blindly. They will be the ones who priced the supervision tax honestly and paid it every month, even in the months it would have been faster not to.
