Tinker AI

I’ve watched Cline, Cursor’s agent, Windsurf Cascade, and Devin handle backend tasks reasonably well over the past year. Add an endpoint, write a service, refactor a controller — these go fine. Set the same agents loose on frontend tasks and the output is technically working but visually dead. The buttons are gray; the spacing is off; the typography is whatever the model defaulted to; the interactions feel like a Bootstrap demo from 2014.

The gap is consistent across tools. The reasons say something interesting about what frontend work is.

What “frontend work” actually involves

Frontend work, broken down honestly:

  1. Writing JSX/HTML/CSS that compiles
  2. Wiring up state and event handlers
  3. Connecting to data sources
  4. Making it look right
  5. Making it feel right (interactions, transitions, micro-animations)
  6. Making it work on different devices and browsers
  7. Making it accessible

Items 1-3 are code. Items 4-7 are design and UX work that happens to be expressed in code.

Agents handle items 1-3 well. The shape is similar to backend work — write the code, it compiles, it runs. Items 4-7 require a different kind of judgment.

Why visual judgment is hard for agents

The agent has to look at its output, decide if it looks good, and iterate. None of these steps work the way the agent loop assumes:

Looking at output. The agent’s “look” is rendering an HTML preview and reading the DOM. It can’t see the rendered page the way a user sees it. Tools like Playwright with screenshots help, but agents don’t natively reason about images. Vision models help; they don’t yet match human design judgment.

Deciding if it looks good. Even with a good visual model, the model has to compare the output against an aesthetic standard. The standard exists implicitly — it’s “what the team’s brand looks like” or “what good design looks like in 2026.” The model’s training has examples of design but no explicit standard.

Iterating. When the model decides “the spacing is off,” the iteration is “change padding from 16 to 24 and re-render.” This loop costs time and tokens. After three or four iterations, the result is usually “okay” but not “good.”

The compounding effect: each step is approximate, and the approximations compound across the loop. A backend agent’s 90% correct output, refined twice, gets to 99% correct. A frontend agent’s 75% acceptable output, refined twice, is still 75% acceptable, just with more tokens spent.

What “good frontend” looks like that agents miss

Some specific things I see agents miss reliably:

Visual hierarchy. Importance differentiation through size, weight, color, spacing. An agent will use the design system’s variables but won’t apply them with judgment. The most important element doesn’t visually dominate.

Spacing rhythm. Consistent vertical rhythm, balanced negative space, related elements grouped tightly while unrelated ones get breathing room. Agents tend toward uniform spacing — 16px between everything — which produces a “list of cards” feel even when it’s not a list.
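The grouping idea can be sketched with a spacing scale. The token names and values below are hypothetical, not from any particular design system; the point is that gaps encode relatedness rather than being uniform:

```typescript
// Hypothetical spacing tokens on a 4px base scale.
const space = { xs: 4, sm: 8, md: 16, lg: 32, xl: 64 } as const;

// Related elements get tight gaps; unrelated ones get breathing room.
// An agent's uniform default would be space.md between everything.
const gaps = {
  labelToInput: space.xs,     // a label belongs to its input
  fieldToField: space.md,     // fields in one group are related
  groupToGroup: space.lg,     // groups are distinct
  sectionToSection: space.xl, // sections are independent
};
```

The monotonically increasing gaps are what produce rhythm; a flat 16px everywhere is exactly the “list of cards” failure mode.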

Color usage. Brand colors used as accents on important elements, not splashed everywhere. Backgrounds and surfaces in restrained palettes. Agents tend to use the design system’s colors without restraint, producing pages where everything is loud.

Interaction polish. Hover states, focus rings, loading states, error states, empty states. Users spend much of their time in these secondary states, yet they get less attention than the happy path. Agents produce the happy path; the secondary states are perfunctory.

Typography. Font sizes that match a clear scale, line heights that breathe, weights that emphasize without screaming. Agents pick “reasonable” defaults that produce technically-fine but visually-flat pages.
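A “clear scale” usually means a modular one: each step is the base size multiplied by a fixed ratio. A minimal sketch, assuming a 16px base and a 1.25 ratio (both values are illustrative, not a recommendation):

```typescript
// A modular type scale: each step multiplies the base by a fixed ratio.
const BASE = 16;
const RATIO = 1.25;

function typeSize(step: number): number {
  // step 0 = body text, positive steps = headings, negative = captions
  return Math.round(BASE * Math.pow(RATIO, step));
}

// A page that sticks to these steps reads as deliberate;
// ad-hoc 15px/17px/19px values read as flat.
const scale = [-1, 0, 1, 2, 3].map(typeSize); // [13, 16, 20, 25, 31]
```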

Micro-animations. A button that transitions smoothly when hovered, a card that lifts on selection, a modal that fades in instead of flashing. These are 5-line CSS additions that take a page from “functional” to “polished.” Agents skip them.
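For concreteness, here is the kind of five-line addition the paragraph means, held in a TypeScript constant so it’s clear the `.btn` class name is illustrative:

```typescript
// A button that eases between states instead of snapping.
// Class name and timing values are illustrative.
const buttonPolish = `
.btn {
  transition: background-color 150ms ease, transform 150ms ease;
}
.btn:hover {
  transform: translateY(-1px);
}
.btn:active {
  transform: translateY(0);
}
`;
```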

Why it’s not a model capability problem

The hypothesis “the model just isn’t smart enough” is wrong. The flagship models are perfectly capable of generating good frontend code if asked specifically. Ask Claude or GPT-4o to “generate a button with a smooth hover transition, a focus ring, and loading state” and you get good code. The capability exists.

The problem is that agents have to know to ask for these things. Without prompting, they don’t. They produce a “button” with the basics and move on.

This is a planning problem, not a generation problem. The agent’s plan for “build this UI” doesn’t include “and then polish it.” A human designer’s plan does.

What’s been working

Three patterns I’ve seen produce good agent-driven frontend:

Tight design systems. When the design system has all the necessary tokens (spacing scale, color scale, type scale) plus components for common patterns, the agent’s defaults are good. The agent picks <Card variant="primary"> and the card is well-designed because the component is well-designed. The agent doesn’t need to make design decisions; it composes pre-made decisions.
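The “composes pre-made decisions” point can be sketched as a variant map: the design choices live in the map, and the caller (human or agent) only picks a name. Component, token, and variant names here are hypothetical:

```typescript
// The design decisions live in the variant map, made once by a designer.
type CardVariant = "primary" | "subtle";

const cardStyles: Record<CardVariant, { background: string; padding: number; shadow: string }> = {
  primary: { background: "var(--surface-raised)", padding: 24, shadow: "var(--shadow-md)" },
  subtle:  { background: "var(--surface-sunken)", padding: 16, shadow: "none" },
};

function cardStyle(variant: CardVariant) {
  // The agent never chooses a padding value or a shadow; it chooses a variant.
  return cardStyles[variant];
}
```

The type on `variant` is doing real work: an agent physically cannot pick a value outside the designed set.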

Reference designs as prompts. Showing the agent a screenshot or Figma of the target and asking it to match. Not “build a settings page” but “build a settings page that looks like this.” The visual reference grounds the model in a specific aesthetic.

Explicit polish steps. A second agent pass specifically for polish — review the output, identify three specific things that aren’t quite right, fix them. The split between “implement” and “polish” makes the polish step a first-class objective rather than implicit.
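As a mechanical stand-in for that second pass, here is a sketch that scans a component’s CSS for the states agents routinely skip. A real polish pass would use an LLM review rather than string matching, and the checklist is illustrative, but the shape of the split — implement first, then audit — is the same:

```typescript
// States and properties agents routinely skip on the first pass.
const POLISH_CHECKS = [":hover", ":focus-visible", ":disabled", "transition"];

function missingPolish(css: string): string[] {
  return POLISH_CHECKS.filter((marker) => !css.includes(marker));
}

// A happy-path-only button fails three of the four checks:
missingPolish(".btn { background: blue; transition: background 150ms; }");
// → [":hover", ":focus-visible", ":disabled"]
```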

The combination of all three produces frontend output that’s much better than any single one. None is enough on its own.

Where this leaves frontend developers

Frontend developers reading this should not relax. The category most exposed is “people who can implement working UI.” That’s where agents are already competitive.

The category that’s hardest for agents is “people who decide what good looks like.” Visual judgment, taste, and design decisions remain a human bottleneck. Frontend developers who’ve cultivated this side of the work are more valuable than ones who haven’t.

The category that’s getting harder for everyone: pure component implementation. If your value prop is “I can write a React component to spec,” agents can do that, and the spec writers (designers, PMs, you) are the ones in demand.

The shift is not “frontend is dead.” It’s “the implementation side of frontend is being commoditized while the judgment side becomes more valuable.” Frontend roles that emphasize judgment (design engineering, brand work, UI craft) are stable. Frontend roles that emphasize implementation are getting compressed.

What I’d build differently

If I were starting a frontend-heavy product today, knowing what I know about agent capabilities:

  • Invest heavily in the design system upfront. The agent’s defaults are only as good as the system’s defaults.
  • Standardize on a visual reference library (Storybook, Chromatic) so agent work has visual ground truth.
  • Treat polish as a separate phase, not an integrated one. Build the functional UI fast with agents; polish with a designer or a polish-specialized agent pass.
  • Use vision models in the loop. “Take a screenshot of this page; tell me three things that look off” is a useful agent prompt. Agents self-critique reasonably well when given the visual.
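The screenshot-critique prompt from the last bullet can be sketched as a small builder. The actual capture step (e.g. via Playwright) is elided, and the wording is just one illustrative phrasing:

```typescript
// Builds the critique prompt to send alongside a screenshot.
// The constraints (exactly three items, named categories) keep the
// model's answer actionable instead of vaguely approving.
function critiquePrompt(pageName: string): string {
  return [
    `Here is a screenshot of the "${pageName}" page.`,
    "List exactly three things that look visually off.",
    "For each, name the element, the problem category (hierarchy, spacing,",
    "color, typography, or interaction state), and a one-line CSS-level fix.",
  ].join("\n");
}
```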

Agents won’t replace good frontend taste any time soon. They will reshape the workflow around the parts they can handle. Get used to spending less time on implementation and more time on judgment.