
Windsurf Cascade for multi-step tasks: keeping the agent on track over 30 minutes

Published 2026-04-08 by Owner

Windsurf’s Cascade can autonomously execute tasks that take 20-40 minutes of agent time. The promise is real. The reliability depends almost entirely on how you structure the initial prompt. Cascade with a clear prompt finishes the task; Cascade with a vague prompt produces a wandering session that ends nowhere useful.

Here’s the structure I’ve found works for long Cascade tasks.

The shape of a good prompt

A long-task prompt has four parts:

  1. Goal — what success looks like
  2. Constraints — what shouldn’t happen
  3. Acceptance criteria — how you’ll know it’s done
  4. Resources — files, commands, docs the model should use

Skip any of these and Cascade improvises in the gap. The improvisation usually goes wrong in non-obvious ways.
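In skeleton form (the same shape as the full example later in this post):

Goal: one or two sentences describing the finished state

Constraints:
- things Cascade must not do, and existing choices it must follow

Acceptance:
- checks Cascade can run on its own to confirm it's done

Resources:
- files to read, commands to run, docs to follow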

Goal

The goal is one or two sentences describing what the finished state looks like. Examples:

Bad goal:

Add user profiles to the app

Better goal:

Add user profile pages at /users/[username] that show display name, bio, avatar, post count, and a paginated list of the user’s posts (10 per page). Pages should be statically generated for the top 1000 users at build time and incrementally generated for others.

The bad version leaves Cascade to invent specifications. The better version commits to enough detail that Cascade has something concrete to aim at.
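If the project is Next.js with the App Router (which the file paths later in this post suggest), the static-plus-incremental requirement in the better goal maps onto generateStaticParams with dynamicParams left on. A minimal sketch, not code from the post; getTopUsers and getUser are hypothetical helpers standing in for the data layer:

// app/users/[username]/page.tsx — illustrative sketch only.
import { notFound } from "next/navigation";
import { getTopUsers, getUser } from "@/lib/users"; // hypothetical data helpers

export const dynamicParams = true; // users outside the top 1000 render on first request

export async function generateStaticParams() {
  // Pre-render the top 1000 profiles at build time.
  const top = await getTopUsers(1000);
  return top.map((user) => ({ username: user.username }));
}

export default async function ProfilePage({
  params,
}: {
  params: { username: string };
}) {
  const user = await getUser(params.username);
  if (!user) notFound();
  return <h1>{user.displayName}</h1>; // bio, avatar, post list omitted for brevity
}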

Constraints

Constraints prevent specific kinds of drift. Examples:

  • Do not modify the existing posts API; only consume it
  • Do not introduce a new state management library; use the existing Zustand setup
  • Tests should use Playwright at the page level, not React Testing Library at the component level
  • Use Tailwind utility classes; do not write CSS modules

These look like negative space. Their job is keeping Cascade from making default choices that don’t match your project. Without constraints, Cascade picks a sensible-looking option from its training data. With constraints, it picks the option you’ve already decided on.

The constraints I’ve learned to add by category:

  • State management: which library
  • Testing approach: which framework, what level
  • Styling: which approach
  • Error handling: throw vs return, custom error types
  • Logging: which library, what severity convention
  • Date handling: date-fns vs dayjs vs Temporal
  • Form handling: which library, controlled vs uncontrolled
  • API client: fetch vs Axios vs ofetch

This is a lot to specify. The payoff is that Cascade produces code that fits the project on first attempt instead of producing code you have to reshape.

Acceptance criteria

Acceptance criteria let Cascade know when it’s done. Without them, the agent often stops at “looks plausible” or keeps going past “actually done.”

Example for the user profile task:

Acceptance:

  • Build passes (npm run build)
  • All existing tests pass
  • New tests cover the happy path of profile loading and pagination
  • Lighthouse mobile score on /users/john-doe is at least 85
  • Type-check passes (tsc --noEmit)

Cascade can run these checks autonomously. When all pass, it stops. When some fail, it iterates on the failures.
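The test criteria are concrete enough to sketch. A Playwright page-level test for the pagination criterion might look like the following; the /users/john-doe route comes from the criteria above, but the data-testid and the "Next" link are assumptions about the markup, not from the post:

// tests/profile.spec.ts — hypothetical test for the pagination criterion.
import { test, expect } from "@playwright/test";

test("profile shows 10 posts per page", async ({ page }) => {
  // Assumes baseURL is set in playwright.config.ts.
  await page.goto("/users/john-doe");
  // Assumes each post renders with data-testid="post-item".
  await expect(page.getByTestId("post-item")).toHaveCount(10);
  await page.getByRole("link", { name: "Next" }).click();
  await expect(page).toHaveURL(/page=2/);
});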

Resources

Resources tell Cascade where to look for context. Three categories:

Files to read:

Read these files before starting:

  • app/users/[id]/page.tsx (current implementation, will be replaced)
  • app/posts/[id]/page.tsx (similar pattern, follow this structure)
  • lib/auth/getCurrentUser.ts (used for the active user banner)

Commands to run:

Use these commands during work:

  • npm run dev (port 3000) for verification
  • npm run test for the test suite
  • npm run lint for linting

Documentation:

Reference docs/architecture/data-fetching.md for the data fetching pattern. Don’t deviate from it.

A real prompt I used

Here’s a prompt I gave Cascade for an actual task — adding rate limiting to an existing API:

Goal: Add rate limiting to all routes under /api/v2/. Limit is 60 requests
per minute per authenticated user, 10 per minute per unauthenticated IP.
Limits should apply globally across server instances using Redis as the
backing store.

Constraints:
- Do not modify the existing middleware ordering for /api/v1/
- Use the existing Redis client from lib/redis.ts; do not create a new one
- Rate limit tracking should not block requests if Redis is unavailable;
  log the failure and let the request through
- Errors should return 429 with a Retry-After header
- Do not add new dependencies if a Redis-based limiter can be written in
  ~50 lines

Acceptance:
- npm run build succeeds
- npm run test passes including new tests
- New tests verify: limit triggers at 61st request, header is set on 429,
  Redis-down case is graceful
- Manual test (you should do this): hit any /api/v2/ endpoint 61 times in
  60 seconds; the 61st returns 429

Resources:
- Read lib/middleware/auth.ts for the auth check pattern
- Read lib/redis.ts for the Redis client config
- Read tests/api/v2/posts.test.ts for the API test pattern
- Reference Redis-based sliding window algorithm; do not use fixed window

Notes:
- This is for production; correctness > cleverness
- Document the limit values in a constant; we'll tune later

Cascade ran this for 22 minutes. Output was correct on first attempt. I reviewed and merged.
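For a concrete sense of what those constraints point at, here is a minimal sketch of a Redis sliding-window check in that spirit. To be clear, this is my illustration, not the code Cascade produced: it assumes lib/redis.ts exports an ioredis-compatible client, and the file path and checkRateLimit name are invented.

// lib/middleware/rateLimit.ts — illustrative sketch, not Cascade's output.
// Assumes lib/redis.ts exports an ioredis-compatible client named `redis`.
import { redis } from "../redis";

const WINDOW_MS = 60_000; // one-minute sliding window
export const LIMITS = { user: 60, ip: 10 }; // documented constants, tuned later

export async function checkRateLimit(
  kind: "user" | "ip",
  id: string
): Promise<{ allowed: boolean; retryAfterSec: number }> {
  const key = `ratelimit:${kind}:${id}`;
  const now = Date.now();
  try {
    // Sliding window on a sorted set: prune entries older than the window,
    // record this request, then count what remains.
    const results = await redis
      .multi()
      .zremrangebyscore(key, 0, now - WINDOW_MS)
      .zadd(key, now, `${now}:${Math.random()}`)
      .zcard(key)
      .pexpire(key, WINDOW_MS)
      .exec();
    const count = Number(results?.[2]?.[1] ?? 0);
    return {
      allowed: count <= LIMITS[kind], // the 61st request in the window fails
      retryAfterSec: Math.ceil(WINDOW_MS / 1000),
    };
  } catch (err) {
    // Constraint from the prompt: fail open when Redis is unavailable.
    console.error("rate limit check failed; allowing request", err);
    return { allowed: true, retryAfterSec: 0 };
  }
}

The caller maps allowed: false to a 429 response with the Retry-After header, per the constraints.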

What goes wrong without this structure

I tested the same task with a vaguer prompt: “add rate limiting to /api/v2/.”

Cascade produced something. The something:

  • Used a different Redis client than my project’s
  • Picked fixed-window limits instead of sliding window
  • Returned 503 instead of 429 (wrong status code)
  • Added a new package (express-rate-limit) instead of writing it inline
  • Didn’t handle the Redis-down case at all

Each of these is a reasonable choice in isolation. None of them matched my project. Reviewing and reshaping took more time than just writing it would have.

What good prompts buy you

The structured prompt above is ~300 words. Writing it took 5 minutes. Cascade ran for 22 minutes producing correct work. I reviewed for 10 minutes.

Total: 37 minutes for ~400 lines of working code.

The vague prompt was 10 words. Cascade ran for 18 minutes producing wrong-shaped code. Reshaping took 50 minutes (rewriting the Redis client integration, switching to sliding window, fixing the test patterns), on top of the same 10 minutes of review.

Total: 78 minutes for the same feature, with worse quality and more frustration.

The 5 minutes of upfront prompt structure saved about 41 minutes, a payback of roughly 8x.

When to skip the structure

For short tasks (under 5 minutes), this structure is overhead. Cascade with a one-line prompt does fine on short tasks because the surface area for drift is small. The structure matters when the task duration crosses a threshold where Cascade has time to wander.

My personal threshold: if the task seems like it'll take more than 10 minutes, I write the structured prompt. If it seems like 1-3 minutes, I just say what I want.

The judgment is rarely wrong. When I get it wrong (writing structure for a short task), the cost is 5 wasted minutes. When I get it wrong the other way (vague prompt on a long task), the cost is 30+ wasted minutes plus a frustrating clean-up. Asymmetric risk; bias toward more structure.