Triaging AI-flagged vulnerabilities

Published 2026-05-25 by Owner

A typical Glasswing-style scan against a moderately complex repository can return hundreds of high- or critical-severity findings in a single run. Anthropic’s reported 90.6% true-positive rate is high — but 9.4% false positives across a 10,000-finding batch is still 940 fires to put out. The job of triage is not to verify every finding; it is to route the queue so the work that gets done is the work that matters.

Sort by exploitability before severity

A CVSS-critical vulnerability in a code path that takes no network input is not the same kind of problem as a CVSS-medium one in your login handler. Before you spend a maintainer-week on a finding, route it by whether the affected code is reachable from an untrusted input. A first cut from a gh query — substitute your own label scheme; ai-finding is illustrative — sorts findings whose body references handler, route, or API paths to the top:

gh issue list --label "ai-finding" --json number,title,body \
  --jq '.[] | select(.body | test("src/(handlers|routes|api)/")) | {number, title}'

Findings deep in internal utilities sort below. This is a heuristic, not a proof — but it cuts the worked-on queue by enough to make the rest manageable.

Build a fast verification harness

The two-weeks-per-bug figure Anthropic cites is post-disclosure work — the time from “an open-source project receives a CVE report” to “a patch is released.” Before disclosure, a project still has to confirm or kill the finding, and that confirmation work compounds the maintainer load if every report requires a full reproducer. Build a thirty-minute reproducer scaffold for each finding category, so verification is template work rather than fresh thinking:

ISSUE=$1
mkdir -p triage/reproducers/$(date +%Y%m%d)-$ISSUE
cd triage/reproducers/$(date +%Y%m%d)-$ISSUE
git checkout -b verify-$ISSUE
cp ../../templates/repro-template.sh ./run.sh

A maintainer who can run ./run.sh against the affected commit and get a yes-or-no in half an hour is a maintainer who can clear ten findings a day instead of one.

Batch disclosure to maintainer capacity

The reason the Mozilla and Cloudflare numbers from the Glasswing report landed at all is that their internal teams paced disclosure to their own ability to absorb the queue. The same applies to your intake. If you operate a public bug bounty or coordinate disclosure with upstream projects, throttle the rate at which confirmed findings leave your queue. A rolling-window cap, enforced before disclosure, prevents an AI-driven inflow spike from translating into an unmanageable outflow:

THIS_WEEK=$(gh issue list \
  --label "ai-finding,disclosed" \
  --search "created:>=$(date -v-7d +%Y-%m-%d)" \
  --json number --jq 'length')
MAX_WEEKLY=20
[ "$THIS_WEEK" -lt "$MAX_WEEKLY" ] || { echo "disclosure cap reached"; exit 1; }

date -v-7d is macOS syntax; on Linux, use date -d '-7 days'. The cap is your project’s, not the finder’s. The reason Anthropic’s program has not collapsed under its own throughput is exactly that the partners control the disclosure rate.

Close the loop with regression tests

Every patched finding feeds a regression test in the same commit that fixes it. If you skip this step, the next AI scan re-finds the same class of bug, your queue refills, and the work you did the first time does not compound. The pattern is one regression test per finding:

ISSUE=$1
mkdir -p tests/regressions/security
cp triage/reproducers/$(date +%Y%m%d)-$ISSUE/run.sh \
   tests/regressions/security/test_$ISSUE.sh
git add tests/regressions/security/test_$ISSUE.sh
git commit --amend --no-edit

The amend keeps the patch and the regression test in a single atomic commit — easier to revert if the patch regresses, easier to read in git log six months later. The general security-mode framing for AI-assisted projects is in AI coding security modes; the pre-flight step of auditing your agent config for leaked secrets before any of this runs is in auditing MCP config for leaked secrets; the broader argument for why any of this matters is AI found the bugs faster than we can patch them; the release that triggered this guide is Anthropic’s Glasswing finds 10,000 critical bugs in a month.