Outcome

Service shipped on time, but Cline's contribution was smaller than expected; Rust's borrow checker made many agent loops thrash

I built a Rust web service over the past six weeks using Cline as my main AI assistant. The service: a billing usage aggregator that ingests events, runs aggregations, and exposes a query API. About 8000 lines of Rust, mostly Axum + sqlx + tokio.

I expected Cline to do ~50% of the typing. The actual contribution was closer to 20%. The combination of Rust and agent loops has a friction I hadn't fully appreciated until I worked through it.

Why Rust is harder for agents

Rust’s borrow checker is the obvious challenge. The model produces code that’s syntactically valid but doesn’t satisfy the borrow checker. The agent’s loop sees the borrow check failure, tries to fix it, sometimes makes it worse, sometimes “fixes” it by adding clones that don’t make semantic sense.
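
A minimal sketch of the shape that triggers this, using a find-or-insert helper like the ones in my aggregation code (names are illustrative):

```rust
struct Bucket { hour: i64, total: u64 }
struct Aggregator { buckets: Vec<Bucket> }

impl Aggregator {
    // The version agents reach for first gets rejected (E0499): the
    // mutable borrow from `iter_mut` is still live in the `None` arm,
    // so the `push` conflicts. The tempting "fix" is to clone a
    // bucket, which silently mutates a copy and loses the update.
    //
    //   match self.buckets.iter_mut().find(|b| b.hour == hour) {
    //       Some(b) => b,
    //       None => { self.buckets.push(..); self.buckets.last_mut().unwrap() }
    //   }

    // The fix that keeps the semantics: look up an index first, so no
    // borrow is held across the branch.
    fn bucket_for(&mut self, hour: i64) -> &mut Bucket {
        if let Some(i) = self.buckets.iter().position(|b| b.hour == hour) {
            return &mut self.buckets[i];
        }
        self.buckets.push(Bucket { hour, total: 0 });
        self.buckets.last_mut().unwrap()
    }
}
```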

Less obvious challenges:

Trait bounds. The model writes generic functions with trait bounds that almost work. Some specific case fails because a bound isn't satisfied (e.g., Send + Sync issues in async code). The agent's attempts to fix it pile on more bounds and more constraints until the function signature is unrecognizable.
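
The classic instance is spawning. A minimal sketch, assuming tokio and tracing (the helper name is made up):

```rust
use std::future::Future;

// tokio::spawn demands `Send + 'static` on the future and `Send` on
// its output. Omit either bound below and this fails to compile with
// "`F` cannot be sent between threads safely" -- the error that sends
// agent loops off stacking extra constraints everywhere.
fn spawn_logged<F>(fut: F) -> tokio::task::JoinHandle<F::Output>
where
    F: Future + Send + 'static,
    F::Output: Send + 'static,
{
    tokio::spawn(async move {
        let out = fut.await;
        tracing::debug!("background task finished");
        out
    })
}
```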

Error handling. Rust's Result chains are explicit. The model produces code that uses the ? operator everywhere; some of those uses don't compile because the error types don't convert. Adding From impls or reaching for anyhow are the right answers; the agent often picks the wrong one.
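
What the From route looks like, sketched with thiserror (which generates the From impls) and made-up types from my domain:

```rust
use thiserror::Error;

// `#[from]` generates the `From` impls that let `?` convert both
// error types into IngestError; without them, each `?` below is a
// type mismatch. anyhow::Result is the blunter alternative.
#[derive(Debug, Error)]
enum IngestError {
    #[error("database error: {0}")]
    Db(#[from] sqlx::Error),
    #[error("malformed event: {0}")]
    Parse(#[from] serde_json::Error),
}

async fn ingest(pool: &sqlx::PgPool, raw: &str) -> Result<(), IngestError> {
    // serde_json::Error -> IngestError::Parse via `?`
    let event: serde_json::Value = serde_json::from_str(raw)?;
    // sqlx::Error -> IngestError::Db via `?`
    sqlx::query("INSERT INTO events (payload) VALUES ($1::jsonb)")
        .bind(event.to_string())
        .execute(pool)
        .await?;
    Ok(())
}
```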

Async lifetimes. Async traits, Pin<Box<dyn Future>>, async functions in traits — the model knows the patterns but applies them inconsistently. About 40% of the async-heavy code I had Cline produce had lifetime issues that took several iterations to resolve.
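
Where it kept going wrong, as a sketch (the trait is illustrative, not from the project):

```rust
use std::future::Future;
use std::pin::Pin;

// The boxed future borrows `self`, so the box needs the anonymous
// `'_` lifetime tied to the receiver. Writing `+ 'static` here
// instead is exactly the mistake that costs a few iterations.
trait EventSource {
    fn next_batch(&mut self) -> Pin<Box<dyn Future<Output = Vec<u8>> + Send + '_>>;
}

struct Replay { frames: Vec<Vec<u8>> }

impl EventSource for Replay {
    fn next_batch(&mut self) -> Pin<Box<dyn Future<Output = Vec<u8>> + Send + '_>> {
        Box::pin(async move { self.frames.pop().unwrap_or_default() })
    }
}
```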

Cargo features. Conditional compilation under feature flags. The model often writes code that only compiles under certain feature combinations and fails under others. CI catches it; agent loops don’t.
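
The pattern that bites, sketched with a hypothetical metrics feature (metrics crate assumed):

```rust
// Only the `metrics` build sees the real body; without the stub
// below, any caller breaks under `--no-default-features`.
#[cfg(feature = "metrics")]
pub fn record_ingest(count: u64) {
    metrics::counter!("events_ingested").increment(count);
}

#[cfg(not(feature = "metrics"))]
pub fn record_ingest(_count: u64) {}
```

Running cargo check --no-default-features next to the default build is the cheapest version of that CI net.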

The first week: figuring out what Cline could and couldn’t do

Week 1 was a lot of “Cline produces code, doesn’t compile, Cline tries to fix, doesn’t compile, Cline gets worse.” After several frustrating sessions, I built a mental model.

Things Cline did well:

  • Scaffolding new files (handler files, repository files, config structs)
  • Writing serde structs from JSON examples (sketch after this list)
  • Writing SQL queries (sqlx queries with the query! macro)
  • Writing tests for already-working code
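
The serde case is almost mechanical: paste a sample payload, get a struct back. A sketch with illustrative field names (chrono’s serde feature assumed):

```rust
use serde::Deserialize;

// The kind of struct Cline derived reliably from a pasted JSON
// example like:
//   {"event_id":"evt_123","account_id":"acc_9",
//    "amount_cents":1250,"occurred_at":"2024-05-01T12:00:00Z"}
#[derive(Debug, Deserialize)]
struct UsageEvent {
    event_id: String,
    account_id: String,
    amount_cents: i64,
    occurred_at: chrono::DateTime<chrono::Utc>,
}
```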

Things Cline did poorly:

  • Generating new logic from scratch when the logic involved generics or async
  • Refactoring existing code with non-trivial lifetimes
  • Anything involving custom traits with associated types
  • Macros (declarative or procedural)

After week 1, I shifted to using Cline only for the things it did well. Its remit dropped from “everything” to the specific tasks I’d identified as a good fit.

Weeks 2-3: scaffolding and SQL

These two weeks were the most productive with Cline. The pattern:

  1. Design a feature on paper
  2. Define the structs (request, response, internal types)
  3. Ask Cline to scaffold the handler, repository, and tests using the structs
  4. Fill in the actual logic myself

Cline produced ~70% of the file structure but only ~30% of the actual code; the remaining 70% of the code (the business logic) was mine.

For SQL queries with sqlx’s compile-time checking, Cline was particularly useful. I’d describe the query in plain English, Cline would write the query! macro invocation, and sqlx would verify it against the database schema at compile time. When sqlx complained, the error was specific enough that Cline could fix it on the next iteration.

A specific example: a query that aggregates events by hour, with timezone-aware bucketing. I described the query; Cline wrote it; sqlx complained about a type mismatch in the timestamp column; Cline fixed it; the query compiled. Total time: ~5 minutes vs probably 15 minutes manually.
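
A hedged reconstruction of that query (table and column names are illustrative, with tz, account_id, and pool assumed in scope):

```rust
// sqlx verifies this against the live schema at compile time; the
// `"bucket!"` / `"total!"` overrides tell it the computed columns are
// non-null. Hypothetical schema: events(occurred_at timestamptz,
// account_id text, amount_cents bigint).
let rows = sqlx::query!(
    r#"
    SELECT date_trunc('hour', occurred_at AT TIME ZONE $1::text) AS "bucket!",
           coalesce(sum(amount_cents), 0)::bigint AS "total!"
    FROM events
    WHERE account_id = $2
    GROUP BY 1
    ORDER BY 1
    "#,
    tz,          // e.g. "America/New_York"
    account_id,
)
.fetch_all(&pool)
.await?;
```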

Week 4: the lifetime week

Week 4 was when I tried to use Cline for a refactor that touched lifetimes. Specifically: extracting a request handler’s logic into a trait so I could test it against a mock.

The trait had a method that returned a future. The future borrowed self. The handler held a database pool that needed to outlive the future. Several lifetime parameters were involved.

Cline produced an implementation. It didn’t compile. Cline iterated. After 8 iterations and ~$3 in API costs, the code still didn’t compile. The lifetimes were tangled in ways the model couldn’t reason about.

I gave up on Cline and did the refactor myself. Took about 90 minutes. The end result was simpler than what Cline had been producing — I’d realized halfway through that I didn’t actually need the trait abstraction. The mock could use a different concrete type with the same shape, no lifetime gymnastics.
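
Sketched from memory, with illustrative names rather than the real code: the test swaps in a different concrete type that exposes the same inherent methods, selected at compile time.

```rust
struct PgRepo { pool: sqlx::PgPool }

impl PgRepo {
    async fn usage_for(&self, account_id: &str) -> anyhow::Result<i64> {
        let total = sqlx::query_scalar::<_, i64>(
            "SELECT coalesce(sum(amount_cents), 0)::bigint \
             FROM events WHERE account_id = $1",
        )
        .bind(account_id)
        .fetch_one(&self.pool)
        .await?;
        Ok(total)
    }
}

// Same method names and signatures, different concrete type; no
// trait, no trait objects, no extra lifetimes.
#[cfg(test)]
struct FakeRepo { amounts: Vec<i64> }

#[cfg(test)]
impl FakeRepo {
    async fn usage_for(&self, _account_id: &str) -> anyhow::Result<i64> {
        Ok(self.amounts.iter().sum())
    }
}

// Handler code is written against this alias; tests flip it to the fake.
#[cfg(not(test))]
type Repo = PgRepo;
#[cfg(test)]
type Repo = FakeRepo;
```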

The lesson: when Cline can’t make progress on a Rust task in 2-3 iterations, it’s usually not going to. The right move is to stop the agent and reconsider the design.

Week 5: the testing week

Week 5 was largely testing. Cline was strong here. Patterns:

Property-based tests with proptest. I’d describe the invariants; Cline wrote the proptest setup. Strategies were sometimes off but the overall structure was correct.
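
For flavor, the shape of one such test (the invariant and function are illustrative):

```rust
use proptest::prelude::*;
use std::collections::HashMap;

// Hypothetical invariant: bucketing events by hour never changes the
// grand total. `hourly_buckets` stands in for the real aggregation.
fn hourly_buckets(events: &[(i64, u64)]) -> HashMap<i64, u64> {
    let mut m = HashMap::new();
    for &(hour, amount) in events {
        *m.entry(hour).or_insert(0) += amount;
    }
    m
}

proptest! {
    #[test]
    fn bucketing_preserves_totals(
        events in prop::collection::vec((0i64..48, 0u64..1_000_000), 0..200)
    ) {
        let direct: u64 = events.iter().map(|&(_, a)| a).sum();
        let bucketed: u64 = hourly_buckets(&events).values().sum();
        prop_assert_eq!(direct, bucketed);
    }
}
```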

Integration tests with testcontainers. Cline knew the testcontainers pattern. Setting up Postgres for tests, running migrations, cleaning up after — Cline handled this well.
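
A sketch of the setup, assuming the testcontainers-modules crate (the exact API moves around between versions):

```rust
use testcontainers_modules::{postgres::Postgres, testcontainers::runners::AsyncRunner};

#[tokio::test]
async fn repo_against_real_postgres() -> anyhow::Result<()> {
    // Throwaway Postgres container; removed when `node` drops.
    let node = Postgres::default().start().await?;
    let url = format!(
        "postgres://postgres:postgres@127.0.0.1:{}/postgres",
        node.get_host_port_ipv4(5432).await?
    );
    let pool = sqlx::PgPool::connect(&url).await?;
    sqlx::migrate!("./migrations").run(&pool).await?;

    // ...exercise the repository against the real schema here...
    Ok(())
}
```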

Mocking external services. I’d describe the mock behavior; Cline wrote a wiremock setup. Worked first time most of the time.
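
A typical one (the endpoint and behavior are illustrative):

```rust
use wiremock::matchers::{method, path};
use wiremock::{Mock, MockServer, ResponseTemplate};

#[tokio::test]
async fn survives_upstream_rate_limit() {
    let server = MockServer::start().await;

    // Hypothetical upstream: the billing exporter we call out to
    // always answers 429.
    Mock::given(method("POST"))
        .and(path("/v1/export"))
        .respond_with(ResponseTemplate::new(429))
        .mount(&server)
        .await;

    // Point the client under test at `server.uri()` and assert the
    // retry/backoff behavior here.
}
```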

About 60% of the testing code was Cline’s. This was the highest ratio across the whole project.

Week 6: the final mile

The last week was wiring everything up — Docker, CI, deployment config, observability. Cline’s contribution was uneven:

  • Docker: medium. Cline produced a working Dockerfile but missed the cargo-chef caching pattern that I wanted; I added it manually.
  • CI: medium. The GitHub Actions YAML was correct but verbose; I trimmed it.
  • Deployment: low. The fly.toml deployment file required project-specific config that Cline didn’t know.
  • Observability: low. The tracing configuration depended on internal team conventions Cline couldn’t infer.

The pattern: Cline was good at “common Rust patterns” but weaker at “this specific project’s patterns.” The latter required either documentation Cline could read or in-the-loop guidance from me.

What I’d configure differently

In retrospect, three things I’d change:

Better .clinerules from the start. I added rules incrementally. A more comprehensive .clinerules from day one — including project-specific patterns for error handling, async, and trait design — would have improved Cline’s defaults.

A stronger model for Rust. I used Claude 3.5 Sonnet throughout. For the lifetime-heavy work, GPT-4o or o1 might have produced better results. I didn’t test rigorously; this is a hypothesis.

Less time letting Cline iterate on hard problems. I let Cline burn 2-3 cycles on lifetime issues before giving up. I should have given up after one cycle and spent the time thinking the problem through myself.

What I’d recommend to others

For new Rust projects, my recommendation based on this experience:

Use Cline for scaffolding, SQL, and tests. These are the areas where Cline added clear value. About 30-40% of a typical Rust project is in these categories; getting AI help on them is worth it.

Don’t expect Cline to do net-new logic with generics or async. The borrow checker will fight back. Manual implementation is faster than letting the agent thrash.

Always have CI check feature-flag combinations. Cline produces code that compiles under your active feature set but might not under others. CI is the safety net.

Write the structs before asking for code. Rust’s type system rewards careful upfront type design. When you give Cline well-defined types to work against, it produces better code than when it’s inventing types as it goes.

The honest cost analysis

Total Cline API costs across six weeks: ~$95.

Cline’s contribution to lines of code: roughly 20% (1600 lines out of 8000).

Without Cline, my estimate of project duration: 8 weeks instead of 6.

So Cline saved me about 2 weeks. At my freelance rate, that’s worth about $5000. The $95 API cost is trivial compared to that.

But the productivity story is uneven. Weeks 2-3 (scaffolding + SQL) saw 50%+ Cline contribution; week 4 (lifetimes) saw a negative contribution. The project-wide average hides the variance.

Would I use Cline on Rust again?

Yes, with adjusted expectations. Rust + Cline isn’t the productivity miracle the marketing suggests. It’s a useful tool for specific tasks within a Rust project. Knowing which tasks fit and which don’t is the difference between a productive collaboration and a frustrating one.

For comparison: I’ve used Cline on TypeScript projects of similar scope. The Cline contribution there is closer to 50%. Rust’s harder-to-reason-about type system genuinely changes what an agent can do well.

Other Rust developers I’ve talked to report similar experiences. The Rust + AI story is real but bounded. Don’t expect 50% productivity gains. Expect 20-30% on the parts that fit, 0% on the parts that don’t.