Tinker AI

Measuring your own AI-code share honestly

Published 2026-05-19 by Owner

“What percentage of our code does AI write” is the wrong question, and answering it precisely makes it more wrong, not less. This is the practical companion to “the 60% claim, deconstructed”: how to measure the thing that actually matters for your own team.

Why percent-of-lines is a vanity metric

A model-suggested line you accepted, a line you accepted and rewrote, and a line you accepted and later deleted usually all count the same in the pipeline that produces this number. The metric rewards volume and is trivially inflated by generated clients, migrations, and test fixtures — the code that was never the hard part. A number that goes up when you add a code generator is not measuring engineering.

Measure what survived review

Three numbers that resist gaming:

  • Share of merged-to-production PRs primarily authored by an agent
  • Revert and hotfix rate on those PRs versus human-authored ones, same window
  • Median review time per agent-authored PR, tracked as a trend

The first counts work that survived a reviewer, not characters that survived to a commit. The second tells you whether the speed is real or borrowed from future incident time. The third surfaces the supervision cost the headline number hides.

None of the three improves just because you write more AI code, which is the point. Accepting more completions does not raise the share of merged PRs unless the resulting work also passes review and ships, generating more does not lower a revert rate, and review time goes up, not down, when the work is sloppier. A vanity metric rewards volume; these three reward code that held up. The cost is that they need a few weeks of history before they say anything: a metric you can read on day one is usually measuring day-one effort, not durability.
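The second number is a ratio of two rates, which is easy to get backwards. A minimal sketch of the arithmetic, assuming you already have four counts for the same window (the function name and argument order are made up for illustration):

```shell
# Revert rate on agent PRs divided by revert rate on human PRs.
# Usage: relative_revert_rate AGENT_REVERTS AGENT_PRS HUMAN_REVERTS HUMAN_PRS
# (hypothetical helper; the four counts come from your own tooling)
relative_revert_rate() {
  awk -v ar="$1" -v ap="$2" -v hr="$3" -v hp="$4" 'BEGIN {
    # Without PRs on both sides and at least one human revert,
    # the ratio is undefined, so say so instead of dividing by zero.
    if (ap + 0 == 0 || hp + 0 == 0 || hr + 0 == 0) { print "n/a"; exit }
    printf "%.2fx\n", (ar / ap) / (hr / hp)
  }'
}
```

With 13 reverts on 100 agent PRs and 10 reverts on 100 human PRs, `relative_revert_rate 13 100 10 100` prints `1.30x`. A value above 1 means agent-authored PRs are reverted more often than human-authored ones.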

A lightweight method

Adopt a commit-trailer convention. Agent-primary commits carry a trailer:

git commit -m "feat: add rate limiter

Assisted-By: agent"

Pick a fixed window and count agent-primary versus total:

git log --since=2026-04-01 --until=2026-07-01 --grep="Assisted-By: agent" --oneline | wc -l
git log --since=2026-04-01 --until=2026-07-01 --oneline | wc -l
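Turning those two counts into a share is one division. A small sketch, where `agent_share` is a hypothetical helper name rather than anything git provides:

```shell
# Percentage of commits that are agent-primary, to one decimal place.
# Usage: agent_share AGENT_COUNT TOTAL_COUNT
agent_share() {
  awk -v a="$1" -v t="$2" 'BEGIN {
    if (t + 0 == 0) { print "n/a"; exit }   # empty window: nothing to report
    printf "%.1f\n", 100 * a / t
  }'
}

# Example wiring, using the two counts produced by the commands above:
#   agent_share "$agent_count" "$total_count"
```

For example, `agent_share 42 100` prints `42.0`. This is a share of commits, not of PRs, so it only matches the first of the three numbers if one commit lands per PR.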

Count all reverts in the window as a baseline:

git log --since=2026-04-01 --until=2026-07-01 --grep="^Revert" --oneline | wc -l

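That raw number is not the agent revert rate by itself: a revert is agent-primary only if the commit it reverts carries the trailer, so cross-reference each revert’s target SHA against the trailer list before reporting it as agent-attributable. Note also that a date-only --since or --until is not a clean day boundary: git fills in the missing time of day itself, so the edges of the window can shift depending on when you run the command. To pin the window, spell out the time and set the upper bound to the start of the day after your window’s last day, for example --until="2026-07-01 00:00".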
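The cross-reference described above can be sketched as follows. It assumes reverts were made with a plain `git revert`, which writes a "This reverts commit <sha>." line into the message body; hand-written revert messages, and targets whose SHA changed in a squash or rebase, will be missed. Both function names are hypothetical.

```shell
# Print the SHA named in each "This reverts commit <sha>" line on stdin.
# Relies on the default message that `git revert` generates.
revert_targets() {
  sed -n 's/^This reverts commit \([0-9a-f]\{7,40\}\)[.,].*$/\1/p'
}

# Count reverts in a window whose target commit carries the trailer.
# Usage: agent_reverts START END [REV]   (REV defaults to HEAD)
# The body runs in a subshell so its variables do not leak.
agent_reverts() (
  start=$1; end=$2; rev=${3:-HEAD}
  n=0
  for sha in $(git log "$rev" --since="$start" --until="$end" \
      --grep='^Revert' --format=%B | revert_targets); do
    # A revert is agent-attributable only if the reverted commit
    # itself has the trailer on a line of its own.
    if git log -1 --format=%B "$sha" 2>/dev/null |
        grep -q '^Assisted-By: agent$'; then
      n=$((n + 1))
    fi
  done
  echo "$n"
)
```

Run inside the repository as, for instance, `agent_reverts "2026-04-01 00:00" "2026-07-01 00:00" origin/main`. Dividing the result by the agent-primary count gives the agent revert rate; the remaining reverts over the remaining commits give the human baseline.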

Join PR review duration to the trailer in whatever your forge exposes; the trailer is the key. The method is deliberately crude — a convention plus a few counts — because a crude number people trust beats a precise number people game.
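How you export review durations depends entirely on your forge, so that step is left out here. Once you have them, the median is a few lines. This sketch assumes a hypothetical file, `agent_review_minutes.txt`, holding one review duration in minutes per line for agent-primary PRs:

```shell
# Median of the numbers on stdin, one per line (blank lines ignored).
median() {
  grep -v '^[[:space:]]*$' | sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR == 0) { print "n/a"; exit }
      if (NR % 2) print v[(NR + 1) / 2]                  # odd count: middle value
      else        print (v[NR / 2] + v[NR / 2 + 1]) / 2  # even count: mean of the middle two
    }'
}

# Example wiring (file name is an assumption, not a forge feature):
#   median < agent_review_minutes.txt
```

The median is used instead of the mean because one PR that sat in review over a holiday would otherwise dominate the quarter.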

One real gotcha: squash-merges and rebases rewrite history, so the trailer has to live on the commit that actually lands on the main branch, not only on a feature-branch commit that gets squashed away. If your team squashes, put the trailer in the message of the squash commit that lands on main and count there:

git log origin/main --since=2026-04-01 --until=2026-07-01 --grep="Assisted-By: agent" --oneline | wc -l

Measure the same branch the same way every quarter; a number you redefine each time is not a trend, it is an anecdote with a percent sign.

Interpret it without fooling your leadership

Report the three numbers together, never the first alone. “42% of production PRs were agent-authored, revert rate on them is 1.3x human, review time per PR is up 18%” is a true sentence a VP can act on. “60% of our code is AI” is a sentence that ends a conversation which should continue. If revert rate or review time climbs alongside the share, the velocity is partly a loan against future incidents — say exactly that, in those words, before someone repeats the flattering half on a call. The point of measuring is not to produce a number for a slide; it is to know whether the trade you are making is the one you think you are making.

For why the public versions of this number are built to flatter, see “the 60% claim, deconstructed”.