The agent that runs while you sleep

The interactive coding agent has a property we rarely name because it is load-bearing: a human is present when it runs. You type, it works, you read the result. Every safety habit we have built — review the diff, check the test, reject the bad suggestion — assumes that presence. Antigravity 2.0, which Google shipped on May 19, schedules agents to run in the background and fans work out to parallel subagents. Anthropic’s Routines have done the same since April. The human is no longer necessarily there when the agent runs, and most of our safety habits quietly stop applying.

What “scheduled” actually changes

Anthropic’s Routines, launched April 14 as a research preview, give a coding agent three kinds of trigger: a schedule (nightly, hourly, or a one-off future time), an API call (an HTTP POST to a per-routine endpoint), and a GitHub event (a pull request or a release). The work runs on Anthropic’s cloud, so there is no local terminal and no session you are sitting in front of. Antigravity 2.0’s scheduled tasks are the same idea inside Google’s environment. The shift is small to describe and large in effect: the agent’s run is now decoupled from your attention. It happens on a clock or a webhook, not on a keystroke.

The API trigger is the one that should make you sit up. A routine you can fire with an HTTP POST is a routine that anything else can fire — a CI job, a webhook from a third-party service, another agent. The work no longer originates with a person deciding to start it; it originates with an event, and events arrive on their own schedule, in volume, without asking. The interactive agent was a tool you picked up. A routine on an API trigger is a process running in your name, and the difference between those two things is the whole subject of this post.

I argued in “Parallel agents and the per-developer meter” that running agents across repositories turns one developer into a small team whose output they are nominally responsible for reviewing. Scheduling is the next turn of that screw. A parallel agent at least runs while you are at the keyboard, even if you are not watching closely. A scheduled agent runs at 3 a.m. The output is waiting for you in the morning, and the only question that matters is whether you review it at 9 with the same rigor you would give a teammate’s pull request, or whether you skim it because it is already done and scrutinizing it feels like undoing work.

The supervision model assumed a watcher

This is where I keep landing. The supervision paradox is that the better the agent gets, the less carefully we supervise it — right up until the moment supervision was the only thing that would have caught the failure. Scheduling automates the inattention. You do not have to decide to stop watching; the schedule decides for you, because the entire point of a scheduled task is that you are not there. The interface is built to let you forget it is running, and forgetting is precisely the failure mode.

The honest version of the worry is specific. An agent that opens a PR overnight is only leverage if a human reviews that PR with full attention before it merges. The instant the review becomes a rubber stamp — because the work looks done, because there are six of them, because it ran while you slept and undoing it feels wasteful — you have not multiplied your throughput. You have automated the production of unreviewed code and called it productivity.

The steelman, which is real

And yet I run scheduled agents, and I would defend it. There is a whole category of work that is recurring, bounded, and genuinely suited to running unattended: the nightly dependency-bump PR, the doc-drift scan that flags when the README and the code disagree, the issue triage that labels and routes overnight so the morning starts sorted. Routines’ GitHub-event trigger fits this exactly: run on a release, regenerate the changelog draft, open a PR for a human to approve. I have a routine that does precisely that, and it saves me twenty minutes a release, because I know exactly what a changelog should look like before I open the PR and checking it takes thirty seconds. These are chores with a narrow blast radius and an obvious review surface, and handing them to a scheduled agent is the best use of the feature.

The condition attached to that praise is the entire argument: the task has to be one you can bound in advance and review in seconds, not one you are hoping the agent figures out while you are asleep. The changelog routine works because the review is faster than the work it replaces. The moment that ratio flips, and reviewing the output takes longer than doing the chore by hand would have, the automation has stopped paying for itself and started borrowing against my attention.

The line that matters

So the distinction I draw is between scheduled-for-chores and scheduled-for-features. Scheduled-for-chores is a bounded, repetitive task with a small diff and a fast review: bump, scan, triage, draft. Scheduled-for-features is “work on the backlog overnight,” an open-ended instruction whose output you cannot predict and therefore cannot review quickly, fired off because the scheduler makes it easy. The first is what the feature is for. The second is how you wake up to a merged PR nobody actually read. The tell is the same one that applied to parallel agents: can you describe, before the agent runs, the exact review you will do when it is done? If yes, schedule it. If the answer is “I’ll see what it did,” you have scheduled a problem rather than a chore. And the model running these unattended jobs is increasingly not even one you chose, which is the next thing worth worrying about. The footprint and the meter behind all this are covered in “The Flash that got expensive.”
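One way to make that tell concrete is to write the review down as data before the routine runs and then check the output against it afterward. The sketch below is a minimal illustration. It assumes the routine's output arrives as an ordinary git branch you can run `git diff --numstat` against. `ReviewPlan`, `parse_numstat`, `review_problems`, and the thresholds are hypothetical names chosen for illustration. They are not part of Routines, Antigravity, or any tool's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewPlan:
    """Hypothetical: the review you commit to before the routine is scheduled."""
    max_files: int
    max_changed_lines: int
    allowed_prefixes: tuple[str, ...]  # paths the planned review is prepared to cover


@dataclass
class DiffStats:
    """What the unattended run actually produced: path -> lines added + deleted."""
    changed: dict[str, int]


def parse_numstat(text: str) -> DiffStats:
    """Parse `git diff --numstat` output (added<TAB>deleted<TAB>path per line)."""
    changed: dict[str, int] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        added, deleted, path = line.split("\t", 2)
        # Binary files report "-" instead of counts; treat them as zero lines.
        lines = (0 if added == "-" else int(added)) + (0 if deleted == "-" else int(deleted))
        changed[path] = changed.get(path, 0) + lines
    return DiffStats(changed)


def review_problems(plan: ReviewPlan, diff: DiffStats) -> list[str]:
    """List every way the output exceeds the review that was planned before it ran."""
    problems: list[str] = []
    if len(diff.changed) > plan.max_files:
        problems.append(f"{len(diff.changed)} files changed; plan allows {plan.max_files}")
    total = sum(diff.changed.values())
    if total > plan.max_changed_lines:
        problems.append(f"{total} lines changed; plan allows {plan.max_changed_lines}")
    for path in sorted(diff.changed):
        # str.startswith accepts a tuple and matches if any prefix matches.
        if not path.startswith(plan.allowed_prefixes):
            problems.append(f"{path} is outside the planned review surface")
    return problems


if __name__ == "__main__":
    # A chore: a changelog routine whose review is decided in advance.
    changelog = ReviewPlan(max_files=1, max_changed_lines=80,
                           allowed_prefixes=("CHANGELOG.md",))
    chore_run = parse_numstat("42\t3\tCHANGELOG.md\n")
    # A "work on the backlog overnight" run that wandered into application code.
    feature_run = parse_numstat("15\t2\tCHANGELOG.md\n120\t40\tsrc/billing/invoice.py\n")
    print(review_problems(changelog, chore_run))
    print(review_problems(changelog, feature_run))
```

A non-empty list is not a verdict that the run was bad. It says the output no longer fits the review you planned, and that is exactly the case where it deserves a teammate-grade review instead of a morning skim. The file and line limits are crude proxies for "reviewable in seconds". What matters is that they were chosen before the agent ran, not after.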
