
Outcome

Cursor: 90 minutes, 4 commits, clean diff. Copilot: 130 minutes, 6 commits, more cleanup work.


To get past my own confirmation bias on Cursor vs Copilot, I tried to build the same feature twice — once in each — and timed everything. The feature was a multi-step signup form with validation, on a fresh Next.js + TypeScript project. Same model under the hood for both (Claude 3.5 Sonnet via the picker for Copilot Business, Sonnet directly in Cursor). Different UX, different prompting patterns, different end results.

The premise of the experiment is admittedly artificial — building the same feature twice is unusual. But forcing both tools through identical work surfaces real differences that “I prefer X” anecdotes don’t.

The feature

A 3-step signup form:

  • Step 1: Email and password (with a password strength indicator)
  • Step 2: Profile info (name, role, company size)
  • Step 3: Review and submit

Requirements:

  • Form state persisted across steps (no losing data on Back navigation)
  • Per-step validation with inline errors
  • Final submit calls a /api/signup endpoint (mocked; a sketch of the mock follows this list)
  • Loading states and error handling
  • Tailwind for styling
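
The mocked endpoint itself is trivial; a minimal sketch, assuming the App Router’s route-handler convention, with an artificial delay purely to exercise the loading state:

```ts
// app/api/signup/route.ts (a mock, not a real backend)
import { NextResponse } from "next/server";

export async function POST(request: Request) {
  const payload = await request.json();
  // Simulated latency so the loading state is actually visible
  await new Promise((resolve) => setTimeout(resolve, 800));
  return NextResponse.json({ ok: true, received: payload });
}
```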

The setup

Same starting point both times: a fresh bun create next-app with TypeScript, Tailwind, App Router. Empty app/page.tsx. Same model (Claude 3.5 Sonnet). Same project structure conventions (a CONVENTIONS.md for both, copied across).

I rolled a die to pick the order; it came up 4, which meant Copilot went first and Cursor second.

Round 1: Copilot (130 minutes total)

Phase 1: Form scaffold (35 min)

I opened Copilot Chat and described the feature. The response was a wall of text — explanation of the structure, then a multi-component code block. I had to manually create the files (SignupForm.tsx, Step1.tsx, Step2.tsx, Step3.tsx, useSignupForm.ts) and paste the relevant code into each.

The code itself was reasonable. The friction was the manual file creation. Copilot Chat doesn’t write to your filesystem; it produces code blocks you copy.

Phase 2: Form state (40 min)

I needed the form state to persist across step navigation. I asked Copilot Chat: “Update the useSignupForm hook to persist state across steps.” Got a working hook with useState and a context provider.
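
The shape of that hook was roughly as follows; a sketch with illustrative names, not Copilot’s exact output:

```tsx
// useSignupForm.ts: draft state lives in a provider above the steps,
// so navigating between steps never unmounts the data.
import { createContext, useContext, useState, type ReactNode } from "react";

export type SignupDraft = {
  email?: string;
  password?: string;
  name?: string;
  role?: string;
  companySize?: string;
};

type SignupFormContext = {
  draft: SignupDraft;
  setField: <K extends keyof SignupDraft>(key: K, value: SignupDraft[K]) => void;
};

const Ctx = createContext<SignupFormContext | null>(null);

export function SignupFormProvider({ children }: { children: ReactNode }) {
  const [draft, setDraft] = useState<SignupDraft>({});
  const setField = <K extends keyof SignupDraft>(key: K, value: SignupDraft[K]) =>
    setDraft((prev) => ({ ...prev, [key]: value }));
  return <Ctx.Provider value={{ draft, setField }}>{children}</Ctx.Provider>;
}

export function useSignupForm() {
  const ctx = useContext(Ctx);
  if (!ctx) throw new Error("useSignupForm must be used inside SignupFormProvider");
  return ctx;
}
```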

Then a back-and-forth: the type safety wasn’t quite right (the form state had optional fields, and I wanted them required by the time we hit step 3). I had to explain this twice, manually pasting the relevant types each time. Three iterations to get the type narrowing working cleanly.
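
The narrowing it converged on can be expressed as a type guard that promotes the draft to a fully required payload before step 3 renders. A sketch reusing the SignupDraft type from the previous snippet; the guard is my reconstruction, not the generated code:

```ts
// The review step receives SignupPayload, never the partial draft.
type SignupPayload = Required<SignupDraft>;

function isComplete(draft: SignupDraft): draft is SignupPayload {
  return (
    draft.email !== undefined &&
    draft.password !== undefined &&
    draft.name !== undefined &&
    draft.role !== undefined &&
    draft.companySize !== undefined
  );
}
```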

Phase 3: Validation and errors (30 min)

Asked Copilot to add Zod validation per step. Got working schemas. Wiring the validation into the form was another back-and-forth: Copilot’s first attempt validated on submit only, I wanted it on blur, and the third try got it. Inline completions helped here for the repetitive parts (each input field’s onBlur handler).
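
The end state looked something like the following; the schema fields and the field-level helper are illustrative rather than Copilot’s verbatim output:

```ts
// Per-step Zod schemas, plus a helper for field-level blur validation.
import { z } from "zod";

export const step1Schema = z.object({
  email: z.string().email("Enter a valid email"),
  password: z.string().min(8, "Use at least 8 characters"),
});

// Validate a single field on blur instead of the whole step on submit.
export function validateField<T extends z.ZodRawShape>(
  schema: z.ZodObject<T>,
  field: keyof T & string,
  value: unknown
): string | null {
  const result = schema.shape[field].safeParse(value);
  if (result.success) return null;
  return result.error.issues[0]?.message ?? "Invalid value";
}
```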

Phase 4: Styling and polish (25 min)

Tailwind classes, password strength indicator, loading states. Mostly inline completions, fast.
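
The strength indicator is the only piece of this phase with real logic. Something in this spirit; the scoring rules are my reconstruction, not what the completions produced:

```tsx
// Coarse 0-3 score driving a three-segment Tailwind bar.
export function passwordStrength(password: string): number {
  let score = 0;
  if (password.length >= 8) score++;
  if (/[a-z]/.test(password) && /[A-Z]/.test(password)) score++;
  if (/\d/.test(password) || /[^A-Za-z0-9]/.test(password)) score++;
  return score;
}

export function StrengthBar({ password }: { password: string }) {
  const score = passwordStrength(password);
  const filled = ["bg-red-400", "bg-red-400", "bg-yellow-400", "bg-green-500"][score];
  return (
    <div className="flex gap-1">
      {[1, 2, 3].map((segment) => (
        <div
          key={segment}
          className={`h-1 flex-1 rounded ${score >= segment ? filled : "bg-gray-200"}`}
        />
      ))}
    </div>
  );
}
```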

Outcome

130 minutes total. 6 commits. The final code worked but had some artifacts of the back-and-forth: a few unused imports, some places where the variable naming was inconsistent because different chat responses had used different conventions. About 20 minutes of cleanup at the end to make the diff look cohesive.

Total tokens: ~280k input, ~25k output. (Estimated; Copilot Business doesn’t expose token counts directly.)

Round 2: Cursor (90 minutes total)

Phase 1: Form scaffold (25 min)

I opened Composer (Cmd+I) and wrote a structured brief:

What I want:
A 3-step signup form with persisted state, per-step validation, mocked /api/signup submit.

Files to create:
- src/components/signup/SignupForm.tsx (main container)
- src/components/signup/Step1Email.tsx, Step2Profile.tsx, Step3Review.tsx
- src/lib/signup/useSignupState.ts (state hook)
- src/lib/signup/schema.ts (Zod schemas)

Constraints:
- Tailwind for all styling
- Final form state must be fully typed (no optional fields after step 3)
- Validation runs on blur, not just on submit

Done means:
- All three steps render and navigate
- State persists across step navigation
- Submit calls /api/signup with the typed payload

Composer produced a 5-file diff. I reviewed and accepted with one small change (renamed a function for naming consistency).

The output was close to final as generated. No copy-paste, no manual file creation.

Phase 2: Form state refinement (15 min)

The state hook from Composer was 90% of what I needed. I used Cmd+K on the hook file to refine the type narrowing (I wanted the final-step type to be required-fields-only). Cmd+K handled this in one shot.

Phase 3: Validation polish (20 min)

The Zod schemas were correct, but the wiring needed a tweak: the generated code triggered validation only on submit, and I wanted it on blur. Cmd+K on each step component with “validate this field on blur using the schema.” Two minutes per file, four files, eight minutes total. The remaining 12 minutes went to reviewing the changes.
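
The edit was the same handler shape in every file. A sketch of one field, reusing the illustrative validateField and form-state helpers from the Copilot sketches above (import paths follow the Composer brief’s layout):

```tsx
// Step1Email.tsx: the blur wiring Cmd+K added, with illustrative names
import { useState } from "react";
import { step1Schema, validateField } from "@/lib/signup/schema";
import { useSignupForm } from "@/lib/signup/useSignupState";

export function EmailField() {
  const { draft, setField } = useSignupForm();
  const [error, setError] = useState<string | null>(null);

  return (
    <div>
      <input
        type="email"
        value={draft.email ?? ""}
        onChange={(e) => setField("email", e.target.value)}
        onBlur={(e) => setError(validateField(step1Schema, "email", e.target.value))}
        className="w-full rounded border px-3 py-2"
      />
      {error && <p className="mt-1 text-sm text-red-600">{error}</p>}
    </div>
  );
}
```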

Phase 4: Styling and polish (30 min)

This phase was actually slightly longer in Cursor than in Copilot, because I went deeper into polish: the initial output had a more cohesive base, so I had room to add details (animations, focus states, a nicer password strength indicator) that I didn’t bother with in the Copilot version.

Outcome

90 minutes total. 4 commits. The final diff was clean — all the related changes had landed in one Composer batch, then one Cmd+K-per-file pass for the validation tweak. No cleanup phase; the code was already cohesive.

Total tokens: ~210k input, ~18k output (visible in Cursor’s usage view).

What was actually different

The 40-minute gap between the runs broke down roughly as:

File creation friction (~15 min lost in Copilot). Manually creating 5 files and pasting code into them is not free. Cursor’s Composer writes the files directly.

Cohesion of the multi-file change (~15 min lost in Copilot). The Copilot output was generated piece by piece across multiple chat turns. Each piece was internally consistent but the across-piece consistency was looser. Cursor’s Composer plans the whole change at once, so naming conventions and type usage are uniform.

Iteration overhead (~10 min lost in Copilot). Each chat turn requires explaining context that Cursor’s Cmd+K already has. “What’s the type of formState here?” needs no answer in Cursor; Cmd+K reads the file. In Copilot Chat, I had to paste types repeatedly.

These buckets are rough estimates; whatever doesn’t fit them comes down to luck and small choices.

What was the same

Some things I expected to be different and weren’t:

Inline completion quality. Both tools use Claude Sonnet for the chat-driven tasks but use their own completion models for inline tab-completion. Both inline systems felt similar in quality for this work. No clear winner.

Quality of the final code. Once both versions were polished, the runtime behavior was equivalent. There’s no “Cursor produces better code” effect at the level of the final result. The difference is in the path to get there.

Model knowledge of Next.js patterns. Both produced idiomatic App Router code. The model is the model; both tools surfaced similar patterns.

What surprised me

Three things I didn’t expect:

Copilot Chat’s verbosity worked against it. The wall-of-text responses felt like more value, but they introduced friction: more to read, more to copy, more inconsistency across responses. Cursor’s Composer presents changes as a diff rather than an explanation, which was net faster to act on.

The polish phase was slightly longer in Cursor. Because the base was cleaner, I had more energy left for details. This is a marginal benefit but real — when the tool reduces friction in the early phases, you have budget for the later phases.

Token economics were similar. Cursor used about 70k fewer tokens across 40 fewer minutes, so per minute the two runs cost roughly the same. The gap that did exist came mostly from Copilot Chat re-sending context on each conversational iteration; Cursor’s Composer sent the codebase context once, and the targeted Cmd+K updates afterward stayed small.

What this experiment doesn’t say

A few things worth flagging:

This is one feature, in one project, by one person. Sample size of two runs of the same task. Don’t read the time difference as a general “Cursor is 30% faster than Copilot” — it might be 10% on a different task or 50% on yet another.

Both tools are good. The 40-minute gap doesn’t make Copilot bad. 130 minutes for this feature is faster than I’d be without either tool. The choice isn’t “AI vs no AI”; it’s “which AI workflow.”

Familiarity matters. I’d been using Cursor more recently than Copilot, so my Cursor prompts were more polished out of the gate. Some of the gap might close if I were equally fluent in both.

Different tasks favor different tools. This feature was multi-file and structurally well-defined — a Composer-friendly task. A single-function bug fix would probably show smaller gaps. A long iterative debugging session might favor either tool depending on the bug.

The honest takeaway

For multi-file feature work in a typical web stack, Cursor’s UX produces less friction than Copilot’s, mostly because it writes files directly and presents multi-file changes as a single diff. The model is the same; the tool around the model isn’t.

For inline completions on familiar code, the tools are roughly equivalent. If your work is mostly typing-with-tab-completion, you won’t notice much difference between them.

For chat-style discussions about code, Copilot Chat’s structured responses are sometimes more helpful for thinking through a problem; Cursor’s tighter feedback loop is better for executing on a decision.

If I had to pick one, today, for this kind of work: Cursor. The differences are small but consistent across the kinds of tasks I do most. If a team’s existing workflow is built around GitHub-integrated Copilot for valid reasons (PR review summaries, codespaces, enterprise governance), the gap probably isn’t worth switching for. Both tools work; both are real productivity gains over no tool. The differences are at the margins and the margins occasionally matter.