
Outcome

12 of 13 tickets shipped; Cascade handled most CRUD and serializer work cleanly, struggled with custom permission and signal logic


I used Windsurf as my primary editor for a 2-week sprint on an existing Django REST Framework service — an internal API for managing partner accounts, transactions, and reconciliation jobs at a small fintech client. About 14k lines of Python, 84% test coverage, ten months of active development.

This is a per-task breakdown of what Cascade — Windsurf’s agent — did well, where it fell over, and how the experience compared to my prior month of running the same project in Cursor.

The setup

  • Editor: Windsurf 1.4
  • Model: Cascade with Claude 3.5 Sonnet (Windsurf’s recommended default)
  • Codebase: Django 5.0, DRF 3.15, PostgreSQL, Celery, ~14k lines Python
  • My background: 4 years of Django, deeply familiar with this codebase
  • Sprint: 13 tickets, sized small to medium

The 13 tickets:

  1. Add /api/partners/{id}/notes endpoint (list, create)
  2. Add created_by and updated_by audit fields to Partner model
  3. Bulk import endpoint for partners (CSV upload)
  4. Filter transactions by date range with cursor pagination
  5. Custom permission: only partner admins can edit their org’s data
  6. Webhook signature verification middleware
  7. Reconciliation status report endpoint
  8. Refactor PartnerSerializer to handle nested addresses
  9. Add Sentry breadcrumbs to Celery tasks
  10. Email notification on partner deactivation (signal-based)
  11. Fix flaky test in test_reconciliation.py
  12. Performance: optimize the dashboard query (N+1 issue)
  13. Investigation: why are some webhooks delivered twice?

For each ticket I tracked elapsed time, number of Cascade runs, and approximate token spend. Same tracking approach as the sprint I ran with Cline last month, so the cost and time numbers are directly comparable.

What shipped

12 of 13. Ticket 13 (webhook investigation) didn’t ship, for reasons unrelated to Cascade.

The breakdown

Where Cascade was a clear win

Ticket 1: Notes endpoint. Standard DRF CRUD pattern. I gave Cascade a brief with FILES IN SCOPE listing the new files plus the existing Partner model and partners/serializers.py for reference. Output: a clean ViewSet, serializer, URL config, and migration in one diff. Reviewed and accepted with one small change (renamed a field to match our naming conventions).
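
For flavor, here's roughly the shape of what it produced. This is my reconstruction, not the actual diff; the PartnerNote model and the partner_pk URL kwarg are assumptions.

```python
# Hypothetical reconstruction; PartnerNote and the partner_pk kwarg are
# assumptions, not the code Cascade actually wrote.
from rest_framework import mixins, serializers, viewsets

from partners.models import PartnerNote  # assumed model


class PartnerNoteSerializer(serializers.ModelSerializer):
    class Meta:
        model = PartnerNote
        fields = ["id", "body", "created_by", "created_at"]
        read_only_fields = ["created_by", "created_at"]


class PartnerNoteViewSet(mixins.ListModelMixin,
                         mixins.CreateModelMixin,
                         viewsets.GenericViewSet):
    """List and create notes for the partner in the URL (no update/delete)."""

    serializer_class = PartnerNoteSerializer

    def get_queryset(self):
        return PartnerNote.objects.filter(partner_id=self.kwargs["partner_pk"])

    def perform_create(self, serializer):
        serializer.save(
            partner_id=self.kwargs["partner_pk"],
            created_by=self.request.user,
        )
```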

Time: 25 minutes. Tokens: ~140k input. Cost: ~$0.45.

Ticket 2: Audit fields on Partner. “Add created_by and updated_by foreign keys to User on the Partner model. Update the admin and serializer to include them. Generate the migration.”

Cascade got the model, admin, serializer, and migration in one diff. The migration even handled existing rows correctly: it added the fields with null=True first, then backfilled them in a separate data migration. Better than what I'd have written on a first pass.
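
The data-migration half of that, sketched from memory. The app label, migration name, and the choice to backfill against a "system" user are all my stand-ins, not Cascade's actual output:

```python
# Sketch of the backfill half; the schema migration adding the nullable
# FKs would be a separate, earlier file. App label, migration name, and
# the "system" user rule are my assumptions.
from django.db import migrations


def backfill_audit_fields(apps, schema_editor):
    Partner = apps.get_model("partners", "Partner")
    User = apps.get_model("auth", "User")  # assumes the default user model
    system_user = User.objects.filter(username="system").first()
    if system_user is not None:
        Partner.objects.filter(created_by__isnull=True).update(
            created_by=system_user, updated_by=system_user
        )


class Migration(migrations.Migration):
    dependencies = [("partners", "0042_partner_audit_fields")]  # hypothetical

    operations = [
        migrations.RunPython(backfill_audit_fields, migrations.RunPython.noop),
    ]
```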

Time: 18 minutes. Tokens: ~95k input. Cost: ~$0.30.

Ticket 3: Bulk import endpoint. I gave Cascade a CSV sample, the existing Partner model, and a description of what should happen on errors. It produced an endpoint that streamed the CSV, validated rows, returned per-row errors, and committed in a single transaction. Worked first try.
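
A compressed sketch of the pattern, assuming all-or-nothing semantics on errors (the actual error contract was spelled out in the ticket brief); the view name is illustrative:

```python
# Illustrative sketch; view name and error contract are assumptions.
import csv
import io

from django.db import transaction
from rest_framework import status
from rest_framework.parsers import MultiPartParser
from rest_framework.response import Response
from rest_framework.views import APIView

from partners.serializers import PartnerSerializer


class PartnerBulkImportView(APIView):
    parser_classes = [MultiPartParser]

    def post(self, request):
        # Wrap the upload in a text stream instead of reading it whole.
        stream = io.TextIOWrapper(request.FILES["file"].file, encoding="utf-8")
        reader = csv.DictReader(stream)

        errors, validated = [], []
        for line_no, row in enumerate(reader, start=2):  # header is line 1
            serializer = PartnerSerializer(data=row)
            if serializer.is_valid():
                validated.append(serializer)
            else:
                errors.append({"line": line_no, "errors": serializer.errors})

        if errors:
            # All-or-nothing: reject the file and report every bad row.
            return Response({"errors": errors}, status=status.HTTP_400_BAD_REQUEST)

        with transaction.atomic():  # one commit for the whole file
            for serializer in validated:
                serializer.save()

        return Response({"created": len(validated)}, status=status.HTTP_201_CREATED)
```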

Time: 50 minutes. Tokens: ~210k input. Cost: ~$0.70.

Ticket 7: Reconciliation report endpoint. Read-only endpoint aggregating data from three tables. Cascade produced a working endpoint with appropriate ORM optimizations (annotations, select_related). I tightened the cache invalidation and shipped.
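
The shape of the ORM work, roughly. I'm inventing the ReconciliationJob model and its field names to illustrate, so treat this as a sketch rather than the shipped query:

```python
# Invented model and field names; only the query shape is the point.
from django.db.models import Count, Q, Sum

from reconciliation.models import ReconciliationJob  # assumed model


def reconciliation_report_queryset():
    return (
        ReconciliationJob.objects
        .select_related("partner")  # one JOIN instead of a query per row
        .annotate(
            txn_count=Count("transactions"),
            matched=Count("transactions", filter=Q(transactions__status="matched")),
            total_amount=Sum("transactions__amount"),
        )
        .order_by("-created_at")
    )
```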

Time: 35 minutes. Tokens: ~155k input. Cost: ~$0.50.

Ticket 8: Nested address serializer. Refactor an existing serializer to handle a one-to-many addresses relationship. Cascade did this cleanly, including the validation logic for partial updates that I would have spent time figuring out.
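
The pattern it used, approximately. The Address model, its fields, and the addresses related_name are my placeholders:

```python
# Placeholder Address fields and related_name; the pattern is the point.
from rest_framework import serializers

from partners.models import Address, Partner  # Address is assumed


class AddressSerializer(serializers.ModelSerializer):
    # Expose id as writable so updates can reference existing rows.
    id = serializers.IntegerField(required=False)

    class Meta:
        model = Address
        fields = ["id", "line1", "city", "postal_code"]


class PartnerSerializer(serializers.ModelSerializer):
    addresses = AddressSerializer(many=True, required=False)

    class Meta:
        model = Partner
        fields = ["id", "name", "addresses"]

    def update(self, instance, validated_data):
        addresses = validated_data.pop("addresses", None)
        instance = super().update(instance, validated_data)
        if addresses is None:
            return instance  # PATCH without addresses leaves them untouched
        existing = {a.id: a for a in instance.addresses.all()}
        keep = set()
        for data in addresses:
            addr_id = data.pop("id", None)
            if addr_id in existing:
                addr = existing[addr_id]
                for field, value in data.items():
                    setattr(addr, field, value)
                addr.save()
                keep.add(addr_id)
            else:
                keep.add(instance.addresses.create(**data).id)
        # Addresses omitted from a full update are removed.
        instance.addresses.exclude(id__in=keep).delete()
        return instance
```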

Time: 40 minutes. Tokens: ~180k input. Cost: ~$0.55.

These 5 tickets — the CRUD-shaped, pattern-following work — represent the strongest case for Cascade. About 3 hours of work that would have been 5-6 hours by hand. Clean output, minor edits, shipped without drama.

Where Cascade was useful but needed correction

Ticket 4: Date range filter with cursor pagination. Cascade produced a working filter and a custom pagination class. The pagination class had a bug where the cursor encoded a non-deterministic field, causing duplicates across page boundaries. I caught this in testing.

I asked Cascade to fix it with a precise description of the bug. The fix was correct, but I had to find the bug myself. Tokens spent on the iteration: another ~80k input.
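
The gist of the fix, for anyone hitting the same thing: DRF's CursorPagination needs a deterministic, unique ordering, so the standard move is to append the primary key as a tiebreaker. Page size and field names below are illustrative:

```python
# The gist of the fix; page size and field names are illustrative.
from rest_framework.pagination import CursorPagination


class TransactionCursorPagination(CursorPagination):
    page_size = 50
    # Ordering on a timestamp alone duplicated rows with identical values
    # across page boundaries; adding the primary key makes the cursor
    # deterministic and unique.
    ordering = ("-created_at", "-id")
```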

Time: 75 minutes (45 + 30 for the fix). Tokens: ~250k input. Cost: ~$0.80.

Ticket 9: Sentry breadcrumbs for Celery. Cascade produced a wrapper that added breadcrumbs to every task execution. The wrapper was idiomatic but instrumented every task — including ones that ran 1000+ times an hour and were noisy in Sentry. I scoped the instrumentation to specific task types and had Cascade update the wrapper.
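
What we converged on, roughly: opt-in instrumentation via a decorator instead of a blanket wrapper. Task and function names here are made up for the example:

```python
# Illustrative decorator; the task is a made-up example.
import functools

import sentry_sdk
from celery import shared_task


def with_breadcrumbs(task_func):
    """Record a Sentry breadcrumb when the wrapped task starts."""
    @functools.wraps(task_func)
    def wrapper(*args, **kwargs):
        sentry_sdk.add_breadcrumb(
            category="celery",
            message=f"task {task_func.__name__} started",
            level="info",
        )
        return task_func(*args, **kwargs)
    return wrapper


@shared_task
@with_breadcrumbs
def run_reconciliation(job_id):
    ...  # low-volume task worth tracing; high-volume tasks stay undecorated
```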

Time: 50 minutes. Tokens: ~140k input. Cost: ~$0.45.

Ticket 11: Flaky test fix. This is the same kind of task that Cline struggled with on a different project last month, so I went in skeptical. Cascade’s first hypothesis was wrong (race condition), second was closer (test ordering issue from a fixture), third was right (a database fixture wasn’t being torn down between test classes).

The third try got there, but I spent 20 minutes evaluating each hypothesis. Faster than not having Cascade involved? Probably. By how much? Hard to say.

Time: 60 minutes. Tokens: ~190k input. Cost: ~$0.60.

Where Cascade fell over

Ticket 5: Custom permission for org-scoped editing. This was the worst Cascade result of the sprint. The requirement was: a user can edit their own organization’s data, but not other organizations’ data; superusers can edit anything; certain partner-admin roles can edit data within their assigned partner network.

Cascade’s first attempt was a basic IsOwner permission. I added context: “Partners belong to networks; some users are partner-admins of a network; that grants edit access to all partners in the network.”

Second attempt was closer but had a subtle bug — it allowed network admins to edit partners outside their network if they happened to also be staff users.

Third attempt fixed the bug but produced an 80-line permission class that was far harder to reason about than the problem warranted. I rewrote it from scratch in 25 lines, using Cascade's output as a sketch but not as the final code.
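
The spirit of the 25-line version, not the production code; is_partner_admin, admin_network_id, and the organization/network fields are placeholders for our actual schema:

```python
# The spirit of the rewrite; is_partner_admin, admin_network_id, and the
# organization/network fields are placeholders for our actual schema.
from rest_framework.permissions import SAFE_METHODS, BasePermission


class CanEditPartnerData(BasePermission):
    def has_object_permission(self, request, view, obj):
        if request.method in SAFE_METHODS:
            return True
        user = request.user
        if user.is_superuser:
            return True
        # Rule 1: users can edit their own organization's data.
        if obj.organization_id == user.organization_id:
            return True
        # Rule 2: partner-admins can edit any partner in their assigned
        # network. Note: no fall-through to is_staff, which was the bug
        # in the generated version.
        return (
            user.is_partner_admin
            and obj.network_id is not None
            and obj.network_id == user.admin_network_id
        )
```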

Time: ~2 hours including the rewrite. Tokens: ~320k input. Cost: ~$1.00.

This is exactly the failure mode I see when AI tools are given tasks that have multiple interacting rules. They produce code that handles each rule in isolation but composes them awkwardly. The fix is to do the design yourself and let AI fill in the implementation.

Ticket 6: Webhook signature verification middleware. Security-sensitive code where I want to read every line carefully. Cascade produced reasonable middleware, but with a timing-attack vulnerability: it compared signatures with plain string equality instead of hmac.compare_digest. I noticed and asked for the fix.
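
The core of the fix, which generalizes to any webhook verifier; key handling and header parsing omitted:

```python
# The core of the fix; key handling and header parsing omitted.
import hashlib
import hmac


def signature_is_valid(secret: bytes, body: bytes, received_sig: str) -> bool:
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # '==' short-circuits on the first differing character, leaking timing
    # information; compare_digest takes constant time for equal-length input.
    return hmac.compare_digest(expected, received_sig)
```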

The fix was correct, but the original output contained a security bug I had to catch. For security-critical code, AI tools shouldn't be the primary author, even when the output looks like a working implementation.

Time: 45 minutes including review. Tokens: ~120k input. Cost: ~$0.40.

Ticket 10: Email on partner deactivation via Django signals. Cascade’s first attempt used a post_save signal that ran on every save, then checked if the deactivated field had changed by comparing to the database — a query on every save. I asked for pre_save with update_fields filtering. Better, but still wrong because some deactivations happen via raw SQL update from a Celery task that doesn’t trigger signals.

We eventually arrived at a service-layer function that explicitly fired the email regardless of where the deactivation came from, but the round trips burned time.
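
The shape of what we landed on; field names, recipients, and the function itself are illustrative, not our actual service layer:

```python
# Illustrative service function; field and recipient names are invented.
from django.core.mail import send_mail
from django.utils import timezone

from partners.models import Partner


def deactivate_partner(partner: Partner, *, reason: str = "") -> None:
    """Single entry point for deactivation.

    The API view and the Celery task both call this instead of mutating
    the row directly, so no code path can skip the notification.
    """
    partner.deactivated = True
    partner.deactivated_at = timezone.now()
    partner.save(update_fields=["deactivated", "deactivated_at"])
    send_mail(
        subject=f"Partner {partner.name} deactivated",
        message=reason or "No reason given.",
        from_email=None,  # falls back to DEFAULT_FROM_EMAIL
        recipient_list=[partner.contact_email],
    )
```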

Time: 55 minutes. Tokens: ~210k input. Cost: ~$0.65.

Ticket 12: N+1 dashboard query. Cascade identified the N+1 (the easy part) and proposed a select_related and prefetch_related fix (the easy fix). The actual issue was deeper: the dashboard queryset was being filtered by a method that re-evaluated the relationship per row, and select_related didn’t help because the access pattern bypassed the cached relation.

I had to read the code and figure this out myself. Cascade’s initial diagnosis was confident and surface-level, which was worse than no diagnosis because it pointed me away from the real cause for the first 30 minutes.
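
For readers who haven't hit this failure mode: calling .filter() on a related manager builds a fresh queryset and ignores the prefetch cache, so a per-row method defeats the optimization. A sketch with invented models, not our dashboard code:

```python
# Invented models, not our dashboard code; the access pattern is the point.
from django.db import models


class Transaction(models.Model):
    status = models.CharField(max_length=20)
    partner = models.ForeignKey(
        "Partner", related_name="transactions", on_delete=models.CASCADE
    )


class Partner(models.Model):
    name = models.CharField(max_length=100)

    def pending_transactions(self):
        # .filter() builds a fresh queryset, so even after
        # prefetch_related("transactions") this is one query per partner.
        return self.transactions.filter(status="pending")

    def pending_transactions_from_cache(self):
        # .all() reuses the rows prefetch_related already fetched:
        # zero extra queries.
        return [t for t in self.transactions.all() if t.status == "pending"]
```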

Time: 90 minutes. Tokens: ~150k input. Cost: ~$0.50.

Ticket 13: Webhook duplication investigation. I didn't ask Cascade for this one. The investigation required reading load balancer logs, correlating them with database state, and forming a hypothesis about a system I was watching live. AI tools are not for this. I worked with the platform team and identified a misconfigured retry policy in our load balancer.

This took 4 hours over two days and didn’t produce shippable code (the fix was in infrastructure, not in our codebase). It’s the most senior-engineer-flavored task in the sprint and the one where AI tools were obviously not the right approach.

The totals

  • Tickets shipped: 12 of 13
  • Total Cascade runs: ~45
  • Total tokens: ~2.0M input, ~80k output
  • Total Cascade cost: ~$7.10
  • Estimated time saved vs. no AI: 10-12 hours over 2 weeks

The cost is striking compared to my previous sprint with Cline, which was $92 for similar volume. The reason: Windsurf at $15/month covers the API access. I wasn’t paying per token. The $7.10 figure is what I estimated by counting tokens against retail Anthropic pricing — it’s not what I actually paid (which was the flat subscription).

Comparing to Cursor on the same project

Before this sprint, I’d been using Cursor on the same codebase for about a month. The honest comparison:

Cursor’s edge: Better codebase understanding for “find the right file” type tasks. The indexing is genuinely better. Cmd+I is a faster surface than Windsurf’s equivalent.

Windsurf’s edge: Cascade’s structured planning before execution catches more bad ideas before they become bad code. The “preview the plan before execution” UX is a real productivity win for medium-complexity tasks.

Roughly equal: Inline completions, raw model quality (both use Claude), basic chat.

Subjective preference: Windsurf’s UI is calmer; Cursor’s is busier. I prefer Windsurf’s, but this is taste, not capability.

For this kind of work — Django CRUD with occasional harder pieces — both editors deliver similar productivity; the differences are within noise. The differentiation between them is smaller than the differentiation between either of them and "no AI tools at all."

What I’d do differently

Don’t use Cascade on multi-rule logic. Permissions, security, signal handlers — these need human design first, AI fill-in second. Cascade-first led to wasted iteration.

Use Cascade for the test scaffolding before the implementation. Write the test first myself, let Cascade implement against it. I did this once and the result was much cleaner than the other direction.

Set a per-task time cap. I burned 2 hours on the permission ticket because I kept thinking “one more round will get it.” A 45-minute cap would have triggered the rewrite earlier.

Read security code as if Cascade is junior staff. No matter how clean it looks, security code needs human-eye-on-every-line review. The timing-attack issue in ticket 6 would have been a real vulnerability in production.

Shipping 12 of 13 in 2 weeks is a good result. The split between the tickets where Cascade shone and the ones where it struggled matches the broader pattern across AI coding tools: pattern-following work goes well, multi-constraint reasoning struggles, and security-critical code needs human ownership. Knowing this in advance shapes which tickets to send through Cascade and which to do head-down.