AI coding and strong type systems: where TypeScript, Rust, and Haskell help
Published 2026-05-11 by Owner
There’s a thing that happens when you use an AI coding assistant in a well-typed codebase that doesn’t happen in JavaScript or Python: the model writes something plausible-looking, the type checker immediately tells you it’s wrong, you paste the error back, and the model fixes it. The feedback loop closes in seconds. In JavaScript or Python, the plausible-looking code makes it to production.
This isn’t a minor ergonomic difference. It changes how reliable AI-assisted coding is on a given project.
The mechanism is simple: language models generate statistically likely code. Statistically likely is correlated with correct but not identical to it. The type system is a deterministic filter that accepts only a subset of statistically likely code—the subset that is also type-correct. Every constraint the type system adds shrinks the space of AI-generated code that can silently fail. More constraints, fewer silent failures.
This is a claim about the structure of the problem, not about any specific model or tool. It applies equally to Copilot, Cursor, Claude, and any future model. Better models hallucinate less, but they still hallucinate. The type system’s value isn’t that it compensates for model weakness; it’s that it provides a check that’s orthogonal to the model’s capabilities—deterministic where the model is probabilistic, exact where the model approximates.
Why the type checker beats a unit test for AI feedback
A unit test is a batch process. You write the test, you run the suite, you wait. For a project of any size that’s 10-120 seconds per cycle. The AI writes code; you kick off the tests; you come back; something in the middle failed; you debug which assumption was wrong. Two or three cycles to converge on working code.
A type checker is a continuous process. It fires on every save—or in the case of something like tsc --watch, faster than you can read the output. The AI writes code; the type error appears inline before your hand leaves the keyboard; you paste it back; the model self-corrects immediately.
More importantly, type errors catch a category of bugs that tests usually don’t. A unit test exercises one code path with one set of inputs. A type error is a proof about all code paths, all callers, all inputs. When an AI passes the wrong shape of object to a function, a test might only catch it if you wrote a test for exactly that call. A type system catches it unconditionally.
That’s the core reason strongly-typed codebases produce better AI outcomes: the feedback is faster, broader, and available on every change without any additional test infrastructure.
There’s also a compounding effect. Each AI-generated function that type-checks correctly becomes a typed API surface that the next AI-generated function can call and be checked against. The type system grows as the codebase grows, and the feedback quality improves with it. In a dynamically-typed codebase, each AI-generated piece is as unchecked as the first.
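A minimal sketch of that compounding, with hypothetical parseUser and formatUser helpers: the first function's return type becomes the contract the second is checked against.

```typescript
// Hypothetical example: each typed function becomes a checked surface
// for the next piece of AI-generated code.
type User = { displayName: string; handle: string | null };

// First AI-generated function: parses raw JSON into a typed User.
function parseUser(raw: string): User {
  const data = JSON.parse(raw) as { displayName?: unknown; handle?: unknown };
  return {
    displayName: typeof data.displayName === "string" ? data.displayName : "unknown",
    handle: typeof data.handle === "string" ? data.handle : null,
  };
}

// Second AI-generated function is checked against User: referencing
// a nonexistent field like user.nickname would fail to compile here.
function formatUser(user: User): string {
  return user.handle ? `${user.displayName} (@${user.handle})` : user.displayName;
}

console.log(formatUser(parseUser('{"displayName":"Ada","handle":"ada"}'))); // "Ada (@ada)"
```

Nothing about the second function needed a test: the moment it compiles against User, an entire class of shape mismatches is already ruled out.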
“AI lies but the compiler doesn’t”
Language models generate plausible-looking code. Plausible-looking is not the same as correct. The model has been trained on vast amounts of code and has a strong intuition for what code should look like, which means it will confidently produce code that fits the syntactic and stylistic patterns of its training data while still being semantically wrong.
Consider a concrete example. You ask an AI to write a function that processes a user record:
```typescript
// What the AI returned
function getDisplayName(user: { name: string; nickname?: string }): string {
  return user.nickname ?? user.name;
}
```
```typescript
// What you actually pass elsewhere in the codebase
type User = {
  displayName: string;
  handle: string | null;
};
```
Without TypeScript, this looks reasonable and runs without throwing. But neither nickname nor name exists on your actual User type, so getDisplayName quietly returns undefined for every caller. The model invented plausible field names.
With TypeScript strict, getDisplayName(currentUser) fails immediately: Argument of type 'User' is not assignable to parameter of type '{ name: string; nickname?: string }'. The lie is exposed before the function is ever called.
This is the structural advantage. The model can’t know your codebase’s exact type signatures from the prompt alone; it approximates them. The type checker knows them exactly. Every place where the model’s approximation diverges from reality, the type checker catches it. In JavaScript, that divergence ships.
The pattern repeats across every kind of AI mistake: wrong generic parameter, incorrect union case, missing required property, field name that almost-matches. The model’s statistical approximation of your codebase is good but not perfect. The type system is an exact representation. The gap between them is where bugs live—and in a strict TypeScript or Rust codebase, that gap is surfaced immediately as a compile error rather than a support ticket at midnight.
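A sketch of the union-case variant of this, with illustrative names: a discriminated union turns an almost-matching case label into a compile error instead of a silent fallthrough.

```typescript
// Illustrative union: an AI's "almost right" case name is rejected
// at compile time rather than silently never matching at runtime.
type UiEvent =
  | { kind: "click"; x: number; y: number }
  | { kind: "keypress"; key: string };

function describeEvent(e: UiEvent): string {
  switch (e.kind) {
    case "click":
      return `click at (${e.x}, ${e.y})`;
    case "keypress":
      return `key ${e.key}`;
  }
  // A case like "key_press" (wrong name) would not compile:
  // TypeScript rejects case values that aren't in the union.
}

console.log(describeEvent({ kind: "click", x: 3, y: 4 })); // "click at (3, 4)"
```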
Strict mode as a force multiplier
TypeScript’s strict mode is not a pedantry setting. It’s a set of checks that the language designers considered safe to turn off by default but which meaningfully expand what the type system can catch. With strict: false (or the individual flags that compose it), the model can produce code with implicit any everywhere and TypeScript will accept it. With strict: true, the model has to produce code that actually type-checks.
The individual flags that matter most for AI output:
- strictNullChecks: catches the AI’s single most common mistake—treating a potentially-null value as if it’s always present. The model will write user.profile.bio.substring(0, 100) and not consider that profile or bio might be null. Strict null checks make every such case a compile error.
- noImplicitAny: prevents the model from silently widening types to any when it doesn’t know the right type. The model has to reason about the type or explicitly ask you.
- strictFunctionTypes: catches subtle contravariance bugs in callback signatures that the model gets wrong regularly.
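A sketch of the strictNullChecks case, using a hypothetical profile/bio shape: the unchecked chain is a compile error under strict mode, and optional chaining with a fallback is the fix.

```typescript
// Illustrative shapes; both levels are optional, as APIs often are.
type Profile = { bio?: string };
type Account = { profile?: Profile };

function bioPreview(user: Account): string {
  // user.profile.bio.substring(0, 100) would not compile under strict mode:
  // "'user.profile' is possibly 'undefined'".
  return user.profile?.bio?.substring(0, 100) ?? "";
}

console.log(bioPreview({ profile: { bio: "Hello" } })); // "Hello"
console.log(bioPreview({})); // ""
```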
Rust applies similar pressure, more aggressively. Result and Option are #[must_use], so ignoring one triggers a compiler warning; you have to handle both cases or explicitly acknowledge you’re not, and with -D warnings that acknowledgment is mandatory. “No unwrap in production” is a discipline enforced by code review in many shops; in Rust, it’s enforced mechanically the moment a reviewer runs cargo clippy with the unwrap_used lint. When an AI generates a Rust function that returns Result<T, E> and the caller silently drops the error, the warning surfaces it at build time.
Here’s what that looks like in practice. An AI writes a file-reading helper:
```rust
fn read_config(path: &str) -> Config {
    let contents = std::fs::read_to_string(path).unwrap();
    serde_json::from_str(&contents).unwrap()
}
```
This compiles, but clippy with #[deny(clippy::unwrap_used)] rejects it. The model’s shortcut is visible immediately. The correct version forces the caller to handle failure:
```rust
fn read_config(path: &str) -> Result<Config, Box<dyn std::error::Error>> {
    let contents = std::fs::read_to_string(path)?;
    let config = serde_json::from_str(&contents)?;
    Ok(config)
}
```
Now the AI’s output is only acceptable if it correctly propagates errors. The type signature makes the contract explicit and the compiler enforces it.
Haskell goes furthest. Effects are tracked in the types, so AI-generated code can’t quietly ignore a monadic effect, and under the common -Wall with -Werror setup, a pattern match that doesn’t handle every branch fails to compile. Partial functions like head still exist, but idiomatic codebases replace or lint them away. The model’s output is either well-typed or rejected.
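TypeScript can approximate this kind of totality with an exhaustiveness guard. A sketch with an illustrative Shape union: the never assignment stops compiling the moment the union gains a variant the switch doesn’t handle.

```typescript
// Illustrative union; add a third variant and the `never` guard below
// turns the missing case into a compile error.
type Shape =
  | { tag: "circle"; radius: number }
  | { tag: "square"; side: number };

function area(s: Shape): number {
  switch (s.tag) {
    case "circle":
      return Math.PI * s.radius ** 2;
    case "square":
      return s.side ** 2;
    default: {
      // Reached only if a case is missing; this assignment fails to
      // compile as soon as Shape has an unhandled variant.
      const unreachable: never = s;
      throw new Error(`unhandled shape: ${JSON.stringify(unreachable)}`);
    }
  }
}

console.log(area({ tag: "square", side: 3 })); // 9
```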
The gradient is real: the stronger the type system, the smaller the surface area where AI mistakes can hide. TypeScript strict mode captures maybe 60% of AI type mistakes; Rust captures more; Haskell captures nearly all of them. The ceiling isn’t perfection, but the improvement from baseline JavaScript is substantial at every step.
The any trap and where it costs you
TypeScript’s any type is the formal “turn off type checking here” escape hatch. It exists for good reasons: FFI boundaries, third-party libraries without type definitions, migration from JavaScript. But any is also the place where AI mistakes slip through.
A model that doesn’t know the right type will reach for any:
```typescript
// The model produces this when it doesn't know what data looks like
function processResponse(data: any) {
  return data.results.map((r: any) => r.name);
}
```
This compiles. It types. The type checker gives it a clean bill of health. But if data doesn’t have a results property—if the API contract changed, if you’re in a test environment, if you called the wrong endpoint—it fails at runtime with a property access on undefined.
The same function with an explicit type catches the mismatch before it runs:
```typescript
type ApiResponse = {
  results: { id: string; name: string }[];
  total: number;
};

function processResponse(data: ApiResponse) {
  return data.results.map((r) => r.name);
}
```
Now if the model generates a call site that passes the wrong shape, or if the API response schema changes and the type definition is updated, the error appears at the call site immediately.
The practical lesson here: the easiest fix for AI-generated code full of any is to add the correct type before prompting again. Give the model the type; the model will use it. any expands where the model doesn’t know the shape of the data; give it the shape and it fills in correctly.
A corollary: if a model reaches for any in a strict codebase, that’s a useful signal. It means the model doesn’t have enough context about the data shape. The right response is to add context—paste the relevant type definition, the API response structure, the database schema—not to accept the any and move on. Accepting it trades a compile-time error for a runtime surprise.
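One concrete way to add that context at a boundary is a runtime type guard: validate the shape once where the data enters, and everything downstream is statically typed. A minimal hand-rolled sketch reusing the ApiResponse shape from above (validation libraries like zod do this more thoroughly):

```typescript
// The shape we expect from the API, as before.
type ApiResponse = {
  results: { id: string; name: string }[];
  total: number;
};

// A type guard: checks the shape at runtime and narrows the static type.
function isApiResponse(data: unknown): data is ApiResponse {
  if (typeof data !== "object" || data === null) return false;
  const d = data as Record<string, unknown>;
  return (
    Array.isArray(d.results) &&
    d.results.every(
      (r) =>
        typeof r === "object" && r !== null &&
        typeof (r as Record<string, unknown>).id === "string" &&
        typeof (r as Record<string, unknown>).name === "string"
    ) &&
    typeof d.total === "number"
  );
}

function processResponse(data: unknown): string[] {
  if (!isApiResponse(data)) throw new Error("unexpected response shape");
  return data.results.map((r) => r.name); // fully typed from here on
}

console.log(processResponse({ results: [{ id: "1", name: "a" }], total: 1 })); // ["a"]
```

The guard replaces a silent runtime crash with a loud, immediate error at the boundary, and removes the need for any entirely.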
Where AI feels weakest
The flip side of everything above is the languages where the model produces the most bugs that reach production: JavaScript, Bash, and Python codebases without type annotations.
In JavaScript, there’s no type feedback at all. The model writes code that looks correct; the code runs; some edge case produces a runtime error. The edit-feedback loop becomes: write code, manually test it, discover a bug, return to the model. Each cycle is slower, and the bugs that slip through are exactly the bugs the model’s training distribution makes it most likely to produce—wrong property names, incorrect assumptions about optional fields, missing null checks.
Python without annotations is in the same position. The language supports type hints and the mypy or pyright checker provides real static analysis, but an unannotated Python codebase has no more type feedback than JavaScript. The model can return the wrong type from a function, pass the wrong argument order, or call a method that doesn’t exist on the actual runtime type—and none of it is caught before the code runs.
Bash has no type system at all, and the failure modes are uniquely sharp: word splitting, globbing, unset variable references, incorrect quoting. The model’s Bash is often almost-correct in a way that’s very hard to spot by reading. It runs fine on the happy path and fails in ways that are hard to trace when the input contains spaces, empty values, or special characters. I treat AI-generated Bash as requiring manual review every time, regardless of how confident the output looks.
SQL is another case worth noting. Most SQL codebases have no static type analysis sitting between the model and the database. The model can generate a query that joins on the wrong column, references a column that doesn’t exist in the target table, or uses a function that doesn’t exist in the specific SQL dialect. None of this surfaces until runtime. ORMs with strong TypeScript integration—Prisma, Drizzle—partially close this gap: they generate types from the schema and the TypeScript compiler catches mismatches at query-build time. The closer the gap between your runtime data representation and your static types, the more useful the type system becomes as an AI feedback mechanism.
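A simplified illustration of what closing that schema-to-types gap buys (this is not the actual Prisma or Drizzle API): once a row type exists, even a tiny query helper can reject a misspelled column name at compile time.

```typescript
// Hypothetical row type, as a schema-generating tool would emit.
type UserRow = { id: number; email: string; created_at: string };

// A minimal projection helper: column names are constrained to keys of T.
function select<T, K extends keyof T>(rows: T[], ...cols: K[]): Pick<T, K>[] {
  return rows.map((row) => {
    const out = {} as Pick<T, K>;
    for (const c of cols) out[c] = row[c];
    return out;
  });
}

const rows: UserRow[] = [{ id: 1, email: "a@example.com", created_at: "2026-01-01" }];
const picked = select(rows, "id", "email");
// select(rows, "emial") would fail to compile: not a key of UserRow.
console.log(picked); // [{ id: 1, email: "a@example.com" }]
```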
There’s a specific failure pattern in older Python codebases: the model generates a function that calls .get() on a dictionary and assumes the result is always present, or it chains attribute accesses on an object without checking whether any intermediate attribute is None. These bugs are invisible until the exact input that triggers the missing-key case arrives. With mypy and full annotations, many of them are caught statically. Without annotations, they wait for production.
The irony is that JavaScript and Python feel more AI-friendly because they have less friction—the model produces code that runs immediately without a build step. But “runs immediately” is not the same as “runs correctly.” The friction of a type check is load-bearing; it’s doing work that otherwise falls on you.
For teams migrating a JavaScript project to TypeScript, or adding mypy to a Python codebase, the practical argument is: the type system pays back faster when you’re using AI assistance. The number of AI-generated bugs that type checking would catch is directly proportional to how many AI-assisted changes you’re making. More AI use makes stricter typing more valuable, not less.
The counterargument—that setting up a type system slows down early development—misses the shape of the current moment. Most mature codebases already have a typing path available: TypeScript for JavaScript projects, annotations checked by mypy or pyright for Python. The question isn’t whether to adopt a type system from scratch; it’s whether to enable strict mode, add annotations to existing functions, or configure a stricter linting profile. That work is incremental, and the payback on AI correctness starts with the first typed function.
The strongest AI feedback loop looks like this: write a type signature, pass it to the model, accept code that type-checks, reject code that doesn’t. The type system and the model become a pair: the model generates candidates; the type system filters them. In a dynamically-typed language, that filter doesn’t exist, and you have to do the filtering yourself.
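In miniature, that loop looks like this: the signature is the contract you write up front, and a model-generated body is accepted only if it satisfies it. groupBy here is a hypothetical example, not from any particular codebase.

```typescript
// You write this signature and hand it to the model; any body that
// returns the wrong shape, or ignores the key function, won't compile.
function groupBy<T, K extends string>(
  items: T[],
  key: (item: T) => K
): Record<K, T[]> {
  const out = {} as Record<K, T[]>;
  for (const item of items) {
    const k = key(item);
    if (!out[k]) out[k] = [];
    out[k].push(item);
  }
  return out;
}

const byParity = groupBy([1, 2, 3, 4], (n) => (n % 2 === 0 ? "even" : "odd"));
console.log(byParity); // { odd: [1, 3], even: [2, 4] }
```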
That manual filtering is the hidden cost of AI-assisted coding without types. It shows up as the “review the AI output carefully” step that everyone knows is necessary but few measure. In a typed codebase, part of that review is automated—and the automated part is faster and more reliable than human scrutiny on a long PR.