I built a small Flutter app for a client over four weeks using Cursor. The result was a working app shipped on time. The experience working with Cursor on Dart was uneven, and the unevenness reveals something about how AI tools compare across languages.
The project
A field service tracking app:
- iOS and Android via Flutter 3.22
- Riverpod for state management
- Supabase for backend (auth + database + realtime)
- Local persistence via Drift
- About 8000 lines of Dart
Standard greenfield mobile app. No legacy code, modern stack, well-defined feature set.
Where Cursor fell short
The Dart-specific weaknesses I noticed:
Generated code awareness. Riverpod uses code generation extensively. Files like provider.g.dart are generated. Cursor sometimes tried to edit them directly, not understanding they’d be regenerated. Adding **/*.g.dart to .cursorignore helped.
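The ignore rule that fixed it is a one-liner; the freezed and drift patterns below are additions I'd make on a project using those generators, not something I needed here:

```
# .cursorignore — keep Cursor out of generated files
**/*.g.dart
**/*.freezed.dart
**/*.drift.dart
```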
Null safety patterns. Dart 3 enforces sound null safety, but Cursor's defaults sometimes produced pre-null-safety-style code (missing null checks, casual `!` assertions) that either failed to compile under strict analysis or threw at runtime. Adding rules for null safety to .cursorrules helped on subsequent files.
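A minimal illustration of the kind of fix involved (names are mine, not from the project): the nullability lives in the type, and the compiler forces you to handle it before touching the value.

```dart
// Dart 3 sound null safety: the type system forces the null check.
String describe(String? nickname) {
  // nickname.length alone would be a compile error: 'length' isn't
  // defined for 'String?'. Promote the variable or supply a default.
  if (nickname == null) return 'anonymous';
  return 'known as $nickname (${nickname.length} chars)';
}

void main() {
  print(describe(null)); // anonymous
  print(describe('Ada')); // known as Ada (3 chars)
}
```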
Async patterns. Dart’s Future and Stream patterns differ from JavaScript’s Promise and async iteration. Cursor occasionally suggested patterns that were correct for JS but didn’t apply cleanly to Dart.
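For example (illustrative, not project code): where a JS-trained suggestion reaches for Promise-style `.then` chaining, idiomatic Dart uses `async`/`await` on Future and `await for` (or `listen`) on Stream.

```dart
import 'dart:async';

// Streams are first-class in Dart: produce with async* / yield,
// consume with `await for` rather than .then chains ported from JS.
Stream<int> countDown(int from) async* {
  for (var i = from; i > 0; i--) {
    yield i;
  }
}

Future<int> sumOf(Stream<int> values) async {
  var total = 0;
  await for (final v in values) {
    total += v;
  }
  return total;
}

void main() async {
  final total = await sumOf(countDown(3)); // 3 + 2 + 1
  print(total); // 6
}
```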
Widget construction. Flutter widgets are deeply nested. Cursor’s suggestions for widget trees were verbose and sometimes had incorrect parent-child relationships. The visual nature of UI work doesn’t transfer to text suggestions well.
Riverpod-specific patterns. Modern Riverpod (with code generation) has specific patterns. Cursor’s training data leans toward older Riverpod patterns. About 30% of Riverpod-related suggestions were structurally outdated.
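Concretely, the mismatch looks like this (a sketch using riverpod_annotation, not a drop-in file):

```dart
// Older style Cursor tended to suggest: manual provider objects.
// final counterProvider =
//     StateNotifierProvider<Counter, int>((ref) => Counter());

// Modern codegen style the project actually used:
import 'package:riverpod_annotation/riverpod_annotation.dart';

part 'counter.g.dart'; // generated — exactly the file Cursor must not edit

@riverpod
class Counter extends _$Counter {
  @override
  int build() => 0;

  void increment() => state = state + 1;
}
```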
Where Cursor still helped
Despite the weaknesses, Cursor was useful for:
Boilerplate JSON serialization. Dart’s json_serializable package generates fromJson/toJson, but you still write the model classes. Cursor handled this well.
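The hand-written half looks like this (a standalone sketch: with json_serializable you'd annotate the class and let codegen emit `fromJson`/`toJson` into a `.g.dart` part, but the field-by-field shape is the same; the `Job` model is illustrative):

```dart
import 'dart:convert';

// With json_serializable this class would carry @JsonSerializable()
// and the two methods below would live in generated code.
class Job {
  final String id;
  final String site;
  final DateTime scheduledAt;

  const Job({required this.id, required this.site, required this.scheduledAt});

  factory Job.fromJson(Map<String, dynamic> json) => Job(
        id: json['id'] as String,
        site: json['site'] as String,
        scheduledAt: DateTime.parse(json['scheduled_at'] as String),
      );

  Map<String, dynamic> toJson() => {
        'id': id,
        'site': site,
        'scheduled_at': scheduledAt.toIso8601String(),
      };
}

void main() {
  const raw =
      '{"id":"j1","site":"Depot 4","scheduled_at":"2024-05-01T09:00:00.000"}';
  final job = Job.fromJson(jsonDecode(raw) as Map<String, dynamic>);
  print(job.site); // Depot 4
  print(jsonEncode(job.toJson()));
}
```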
Repository layer code. Database queries via Drift were fairly standard. Cursor produced reasonable repository implementations.
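For reference, the shape of repository code it handled well (a Drift sketch; `AppDb`, the `jobs` table, and the `Job` row class are illustrative assumptions, not project code):

```dart
import 'package:drift/drift.dart';

// Illustrative: assumes a Drift database class `AppDb` with a
// generated `jobs` table and `Job` row class.
class JobRepository {
  JobRepository(this.db);
  final AppDb db;

  Future<List<Job>> jobsForDay(DateTime day) {
    final start = DateTime(day.year, day.month, day.day);
    final end = start.add(const Duration(days: 1));
    return (db.select(db.jobs)
          ..where((j) => j.scheduledAt.isBetweenValues(start, end)))
        .get();
  }

  Stream<List<Job>> watchOpenJobs() =>
      (db.select(db.jobs)..where((j) => j.done.equals(false))).watch();
}
```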
Testing. Dart’s testing conventions are well represented in training data. Generated tests were workable.
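A representative example of the tests it generated with little editing needed (package:test; the helper under test is illustrative):

```dart
import 'package:test/test.dart';

// Illustrative helper of the kind the app formatted durations with.
String formatMinutes(int minutes) {
  final h = minutes ~/ 60;
  final m = minutes % 60;
  return h == 0 ? '${m}m' : '${h}h ${m}m';
}

void main() {
  test('formats sub-hour durations', () {
    expect(formatMinutes(45), '45m');
  });

  test('formats multi-hour durations', () {
    expect(formatMinutes(135), '2h 15m');
  });
}
```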
Documentation. dartdoc comments came out reasonably.
The pattern: tasks where Dart’s specifics are minimal (data structures, repository code, tests) worked well. Tasks where Dart-specific idioms dominate (widgets, state management, async) needed more human input.
Productivity numbers
I estimated this project would take 6 weeks before starting. It took 4 weeks. So Cursor saved roughly 2 weeks.
By comparison, equivalent TypeScript work usually saves me 50-60% of estimated time when using Cursor. On Dart, it was closer to 33%. The gap is real.
The cost: Cursor Pro subscription ($20 × 1 month) plus some overage on usage. About $35 total. Still trivially worth it.
What I’d do differently
A few things I’d change next time:
Stronger .cursorrules from day one. I added rules incrementally as I noticed issues. A comprehensive rules file at the start (covering null safety, Riverpod patterns, widget conventions) would have improved suggestions.
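The kind of rules file I'd now start with (my actual rules accumulated piecemeal; this is a condensed reconstruction):

```
# .cursorrules
- Flutter 3.22 / Dart 3 project with sound null safety. Never emit
  pre-null-safety patterns; handle nullable types (String?, etc.) explicitly.
- State management is Riverpod with code generation (@riverpod annotations
  via riverpod_annotation). Do not suggest StateNotifierProvider.
- Never edit *.g.dart or other generated files; change the source file and
  assume build_runner regenerates them.
- Prefer const constructors and small, composable widgets.
```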
More aggressive use of pinning. Pinning my reference Riverpod provider files at the start of each chat session helped when I remembered to do it. Making this routine would have helped more.
Different model selection for harder tasks. For tasks involving complex async or state management, switching to Claude 3.5 Sonnet (the strongest model available at the time) made a difference vs. Cursor’s default. Worth defaulting to Sonnet on Flutter work.
What this taught me about language coverage
The pattern I’d extend: AI tooling quality varies by language, and the variance is bigger than I’d expected.
My informal ranking based on personal use:
- Best: TypeScript, JavaScript, Python, Go
- Good: Java, C#, Ruby, PHP
- Decent: Rust, Swift, Kotlin
- Rough: Dart, Elixir, Clojure
- Bad: Niche languages without strong open-source presence
The “best” tier has years of training data, popular open-source examples, and strong community resources. The “rough” tier has training data too, but it’s thinner, less polished, and less consistent.
For projects in the rough tier, I’d recommend:
- More upfront configuration
- More careful review of suggestions
- More patience with the AI being wrong about idiomatic patterns
- Less expectation that it’ll feel like TypeScript work
For projects in the best tier, the AI tooling experience is generally close to the marketing.
The Flutter-specific learnings
For Flutter teams considering AI tooling adoption:
The autocomplete experience is the strongest part. Tab completion in Flutter is workable. The chat panel and Composer are weaker.
Hot reload + manual coding may be faster than AI for some work. Flutter’s hot reload makes “tweak the widget, see the result” fast. AI suggestions for visual tuning often go through more iterations than just tweaking.
Generate the model layer; write the UI by hand. Models, repositories, services — let AI generate. UI widgets and animations — write by hand. The split fits the strengths.
Use AI for tests aggressively. Test code is well-structured and Dart’s testing libraries are reasonably trained. Tests are a sweet spot.
Would I use Cursor on the next Flutter project?
Yes, but with calibrated expectations. The 33% productivity gain is real and meaningful. It’s just less than the 60% I’d get on a TypeScript project of the same size.
For teams choosing between Flutter (with AI tooling that’s decent) and React Native (with AI tooling that’s strong), this is a real consideration. Framework choice affects development velocity through AI tooling fit, not just through the framework’s own characteristics.
Ten years from now, this gap should close. The training data for less-popular languages will catch up. For now, the gap is part of the cost of working in a less-popular ecosystem.
A note for the AI tool vendors
Improving the experience on languages outside the top tier would be a real differentiator. Cursor, Cline, and others all have similar quality across the top tier — the differentiation is mostly UX. On the rough tier, there’s real room for tooling that handles language-specific patterns better.
I’d particularly value:
- Better Dart null safety handling
- Better Riverpod (and similar codegen-heavy frameworks) understanding
- Better recognition of which generated files not to edit
These are achievable improvements. Whoever ships them gets a meaningful edge on Flutter, Hasura, GraphQL Codegen, and similar codegen-heavy stacks.