
Outcome

35% faster on greenfield features; no measurable improvement on legacy code with heavy coupling


I spent three months running a controlled-ish experiment on myself. The codebase: a Rails 7 monolith, roughly 180k lines, 6 years old, three previous engineering teams, the usual archaeology situation. The question: does Cursor actually make me faster, and on what kinds of work?

I tracked task start-to-done times for 63 tasks over 12 weeks — first 4 weeks without Cursor, then 8 weeks with it. I also noted subjective difficulty, task type, and how much of the codebase each task touched.

This is not a rigorous study. It’s one person, one codebase, no control group. Take it as a data point, not a verdict.

The setup

I used Cursor Pro ($20/month) with Claude 3.5 Sonnet as the primary model. I used the chat panel for anything requiring multi-file reasoning and Cmd+K for inline edits.

My baseline: I’ve been writing Ruby and Rails for 8 years. I type fast, I know the codebase reasonably well, and I’m not the target audience for “AI will replace developers” takes. I’m the person the tool has to beat.

Task types I tracked:

  • Greenfield feature: new controller, service object, or migration with minimal coupling to legacy code
  • Bug fix: something broken, usually with a test failing
  • Legacy refactor: changing something old and coupled to other things
  • Test coverage: adding tests to untested code
  • Spike: exploring an unfamiliar part of the codebase to understand it

The numbers

After normalizing for task complexity (I rated each task's difficulty from 1 to 5 before starting):

| Task type | Tasks tracked | Avg time before | Avg time with Cursor | Delta |
| --- | --- | --- | --- | --- |
| Greenfield feature | 18 | 2.4 hours | 1.55 hours | -35% |
| Bug fix | 14 | 1.1 hours | 0.95 hours | -14% |
| Legacy refactor | 12 | 3.8 hours | 3.9 hours | +3% (noise) |
| Test coverage | 11 | 1.6 hours | 0.85 hours | -47% |
| Spike | 8 | 1.9 hours | 1.4 hours | -26% |

The headline number that impressed me: test coverage. Writing tests for untested Rails code is a specific kind of tedious work — you know what each test should say, but typing out the 15 expect() calls that cover every edge case is slow. Cursor handles this extremely well because the pattern is repetitive and well-defined.
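
To make that concrete, here's an invented example of the shape of the work. DiscountCalculator, Order, and LineItem are hypothetical names, not from my codebase; the point is the repetitive edge-case enumeration that the tool scaffolds quickly once one example exists.

```ruby
# Hypothetical spec: the class and fields are made up for illustration.
require "rails_helper"

RSpec.describe DiscountCalculator do
  describe "#percent_off" do
    it "returns 0 for a nil order" do
      expect(described_class.new(nil).percent_off).to eq(0)
    end

    it "returns 0 for an order with no line items" do
      order = Order.new(line_items: [])
      expect(described_class.new(order).percent_off).to eq(0)
    end

    it "caps the discount at 50% for large orders" do
      order = Order.new(line_items: Array.new(20) { LineItem.new(quantity: 1) })
      expect(described_class.new(order).percent_off).to eq(50)
    end

    # ...a dozen more permutations in the same shape, which is the part
    # that is tedious by hand and fast with the tool.
  end
end
```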

Greenfield features saw real gains too. When I’m writing new code with clear requirements, Cursor’s ability to scaffold a service object or write a migration from a description saves 30–45 minutes of typing.
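
For a sense of scale, here is roughly what one of those scaffolds looks like. The names below (ArchiveStaleProjects, archived_at) are made up for illustration, not lifted from the app.

```ruby
# Hypothetical greenfield scaffold: a small service object plus its migration,
# the kind of thing Cursor produces from a one-paragraph description.
class ArchiveStaleProjects
  STALE_AFTER = 90.days

  def call(now: Time.current)
    Project.where("last_activity_at < ?", now - STALE_AFTER)
           .find_each { |project| project.update!(archived_at: now) }
  end
end

class AddArchivedAtToProjects < ActiveRecord::Migration[7.0]
  def change
    add_column :projects, :archived_at, :datetime
    add_index :projects, :archived_at
  end
end
```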

Where Cursor didn’t help

Legacy refactoring was essentially unchanged. Here’s why:

The model doesn’t know what I know. Our codebase has 6 years of tribal knowledge encoded in method names, comments, and implicit contracts between services. Cursor sees the text; it doesn’t see the history of why something is done the way it’s done.

Rails magic is context-dependent. The ORM callbacks, concerns, and class-level macros that make Rails powerful are the same things that confuse the model. When I asked Cursor to help refactor a model with 12 callbacks and 4 concerns, the suggestions were syntactically valid and semantically wrong.

Confident wrong answers are worse than slow right answers. The model is confident. It doesn’t say “I’m not sure if this callback fires before or after validation.” It picks one and writes the code. In legacy systems where subtle ordering matters, confident wrong answers create bugs that are hard to find.
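
Here's a toy version of the trap, with invented names. In Rails, before_validation runs before validations and before_save runs after them, so moving a normalization callback between the two silently changes what the validation sees.

```ruby
# Hypothetical model illustrating the ordering subtlety. Subscription and
# plan_code are invented; the callback semantics are standard Rails.
class Subscription < ApplicationRecord
  # What the legacy code relies on: normalize first, then validate.
  before_validation :normalize_plan_code
  validates :plan_code, format: { with: /\A[a-z0-9_]+\z/ }

  # A refactor a model will confidently suggest:
  #   before_save :normalize_plan_code
  # Syntactically fine, but validation now runs against the raw input,
  # so "Pro-Monthly" is rejected where "pro_monthly" used to be saved.

  private

  def normalize_plan_code
    self.plan_code = plan_code.to_s.strip.downcase.tr("-", "_")
  end
end
```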

The cognitive overhead cost

Something the productivity numbers don’t capture: reviewing AI-generated code is its own cognitive load.

I review every Cursor suggestion before accepting it. For greenfield code, this takes 30 seconds — it’s usually right or close to right. For legacy code, careful review takes 5–10 minutes per suggestion, because I need to verify correctness in context. When I add that time back, the apparent gains on legacy code evaporate.

The implication: if you’re not reviewing carefully, you’re not saving time — you’re accumulating technical debt and bugs at an accelerated rate.

What I changed in my workflow

After the first 4 weeks with Cursor, I made two adjustments that improved results:

I stopped using it for tasks I didn’t fully understand. If I can’t describe exactly what a function should do before writing it, Cursor’s suggestions lead me somewhere plausible-but-wrong. The discipline of specifying the task first — which Cursor requires to do good work — turns out to also clarify my own thinking. I now treat “I’m not sure what this should do” as a signal to think before prompting, not a reason to ask Cursor.

I use it aggressively for test-first workflows. Write the test by hand, let it fail, then ask Cursor to make it pass with the constraints I specify. This produces better code than asking Cursor to write both the test and the implementation, and it keeps me in control of what “correct” means.
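
A sketch of that loop, with hypothetical names: I write a spec like the one below by hand, watch it fail, then hand Cursor the failing test plus explicit constraints (for example, "make this pass without changing the invoices schema or adding gems").

```ruby
# Hand-written failing spec (invented example). Cursor only writes the
# implementation, under constraints stated in the prompt.
require "rails_helper"

RSpec.describe InvoiceNumber do
  it "prefixes the year and pads the sequence to six digits" do
    allow(Invoice).to receive(:maximum).with(:sequence).and_return(41)

    expect(InvoiceNumber.next(year: 2024)).to eq("2024-000042")
  end
end
```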

Should you try it?

On a greenfield project or for a developer new to a codebase: yes, Cursor is worth the $20/month. The productivity gains on fresh code are real.

On a legacy codebase that you know deeply: the gains are smaller than the marketing suggests. You’ll likely see real improvement on testing and spikes, near-zero improvement on the hard refactoring work. That might still be worth $20/month, but go in with accurate expectations.

The biggest risk is overconfidence in the output. Cursor writes code that looks right faster than you can write code that looks right. Looking right and being right are not the same thing, and the gap matters more in a system that’s been running in production for 6 years than in a fresh repo.