AI coding hallucinations: the four shapes they take and how to spot them
Published 2026-05-11 by Owner
The word “hallucination” is overloaded. It gets applied to everything from a wrong variable name to an invented library, and treating them as one thing produces bad instincts for catching them. They’re not one thing. There are four distinct shapes, each with a different detection profile and a different fix.
The reason this matters practically: if every hallucination were caught by the type checker, you’d need no extra process. If none were, you’d need to verify everything manually. The reality is somewhere in the middle, and where the line falls is specific to each type. Understanding the map changes how much time to spend on each category of review.
Shape 1: The invented API
The most common and the most embarrassing. The model generates a call to a function, method, flag, or config key that does not exist. The code looks plausible; the names follow the library’s conventions; the docs quote is often convincing. It doesn’t matter. The thing isn’t there.
A few examples of what this looks like in practice:
```javascript
// Invented: Node's fs module has no .readFileAsync()
const content = await fs.readFileAsync('config.json', 'utf-8');

// Invented: React Query's useQuery does not take a `suspenseMode` option
const { data } = useQuery({ queryKey: ['user'], queryFn: fetchUser, suspenseMode: true });

// Invented: Zod has no .coerce.url() method
const schema = z.coerce.url();
```
The TypeScript compiler catches most of these when strict mode is on and good type definitions are installed. fs.readFileAsync fails at compile time if @types/node is installed; suspenseMode fails if @tanstack/react-query’s types are current. The bundler catches the rest at import time — if you import a named export that doesn’t exist from a package, both Rollup and Webpack error.
What slips through: JavaScript projects without TypeScript, libraries with loose types (any-heavy SDKs), and runtime-resolved config keys that nothing validates at build time. For those, the only defense is running the code and reading the error.
One pattern that catches more invented APIs than TypeScript alone: keeping type definitions up to date. A model trained on React Query v4 might invent options that existed only as open feature requests at the time — and if your installed @tanstack/react-query types are pinned to an old version, the invented option won’t trigger a type error, because the old types have gaps. Run bun update on your devDependencies and this class of slip-through shrinks.
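A minimal tsconfig sketch of that posture (the option names are standard TypeScript compiler options; the exact values are a suggestion, not the only reasonable setup):

```jsonc
{
  "compilerOptions": {
    "strict": true,        // turns on noImplicitAny, strictNullChecks, and friends
    "skipLibCheck": false, // actually type-check the .d.ts files you depend on
    "types": ["node"]      // load @types/node so invented fs methods fail to compile
  }
}
```

skipLibCheck: false is the judgment call here: it slows the build, but skipping library checks is exactly how gaps in stale type definitions go unnoticed.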
Shape 2: The plausible-wrong implementation
This one is harder. The code compiles and the tests pass — if you wrote shallow tests that only check the happy path. The logic is wrong in a way that only surfaces at the boundary: off-by-one, inverted condition, wrong operator precedence, misunderstood semantics.
```javascript
// Looks like correct debounce. The `leading` option is inverted.
// Should fire immediately on the first call, then suppress.
// This version suppresses the first call and fires on trailing.
const save = debounce(handleSave, 300, { leading: false, trailing: true });

// Looks like safe HTML escaping. escapeHtml() only escapes &, <, >, ", '.
// Doesn't handle backtick injection in template literals, SVG onload, etc.
const safe = `<div title="${escapeHtml(userInput)}">...</div>`;

// Sorts numbers lexicographically, not numerically.
// [10, 9, 100].sort() → [10, 100, 9]
const sorted = items.sort();
```
The type checker does not help here. These are all type-correct. Tests help, but only if the test covers the broken case. The plausible-wrong pattern is why test coverage percentages are misleading — 100% coverage of the happy path misses all of these.
The signal to look for: when the model implements something that “should be obvious” without a comment explaining why it made a non-default choice. The leading: false example above is the kind of thing a model inserts without annotating why. A human making a considered choice would comment it. Silence is suspicious.
Another marker: the model frequently implements the most common case without checking whether the common case applies to your data. Array .sort() without a comparator is correct for arrays of strings. It’s wrong for numbers. If your variable name is ambiguous — items, results, values — the model may assume string context without asking. Non-default behavior is worth pausing on; default behavior applied to the wrong type is where this pattern usually hides.
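A sketch of the comparator fix for the sort case, with the boundary input that exposes the bug (plain Array.prototype.sort, no library assumptions):

```javascript
const items = [10, 9, 100];

// Default sort coerces elements to strings and compares UTF-16 code units
const lexicographic = [...items].sort();          // [10, 100, 9]

// Numbers need an explicit numeric comparator
const numeric = [...items].sort((a, b) => a - b); // [9, 10, 100]

console.log(lexicographic, numeric);
```

Note the spread copies: .sort() mutates in place, which is a second common surprise in generated code.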
Shape 3: The fabricated citation
The model produces a reference to something that doesn’t exist: a GitHub issue number, a PR that was supposedly merged, an RFC, a docs page, a changelog entry. The specificity makes it convincing — it says “see issue #4821” or “this was fixed in v3.2.1 per the migration guide” and neither the issue nor the migration guide exists.
This one the toolchain can’t catch at all. It’s not in the code; it’s in the comment, the commit message, or the chat explanation.
```javascript
// Workaround for React DOM bug #22416 (fixed in React 19)
// See: https://github.com/facebook/react/issues/22416
```
If that issue doesn’t exist, the comment is misinformation that will mislead whoever reads the code next. And whoever reads it next is probably you, six months later.
The tell: when the model produces a URL or issue number, open it immediately. Do this before accepting the change. If the URL 404s or the linked issue describes a different problem, the citation is fabricated. Remove it rather than leaving a landmine.
A related variant: the model cites an API behavior that changed in a specific version (“as of v2.4, strict defaults to true”) without that being true. Check the actual changelog before accepting that claim.
Shape 4: The outdated practice
The model’s training data has a cutoff and a distribution skew. Code patterns from 2021-2023 are heavily represented. Patterns that were deprecated, superseded, or flagged as problematic since then appear with equal confidence as current best practices.
```javascript
// React 16 lifecycle — fine then, avoid now
componentWillMount() {
  this.fetchData();
}

// Webpack 4 config syntax. Webpack 5 changed several defaults.
module.exports = {
  mode: 'development',
  optimization: { splitChunks: { chunks: 'async' } },
};

// bcrypt with rounds=10 was standard in 2019.
// 12 is the current floor for new production systems.
const hash = await bcrypt.hash(password, 10);
```
None of these are wrong in the sense of crashing. They’re wrong in the sense of being behind the current state of practice, sometimes with meaningful security or performance consequences.
The toolchain helps here when libraries ship deprecation warnings or ESLint rules. React’s componentWillMount triggers a deprecation warning in React 16.3+, and the unprefixed name was removed in React 17 — the build will warn or fail. But bcrypt rounds won’t warn. Webpack config silently accepts outdated keys in some versions.
The human check: when the model produces a pattern that looks like it’s from a tutorial, search for the current documentation for that API. “How to do X” results from 2020 may be the model’s source. Compare the current docs to what was generated.
A useful trigger: any time the model writes something involving security configuration — hashing rounds, TLS settings, CORS policy, rate limits — treat it as potentially outdated by default. Security recommendations shift faster than the average training distribution. The model’s answer may have been correct when written; it may not be correct now. Don’t skip verification on security-sensitive defaults.
The linter and type checker as hallucination detectors
Combining what the toolchain does and doesn’t catch:
| Hallucination type | TypeScript catches | ESLint catches | Bundler catches | Tests catch | Human required |
|---|---|---|---|---|---|
| Invented API | Yes (with types) | Partial | Yes (named imports) | Sometimes | If no types |
| Plausible-wrong | No | Partial | No | If test covers it | Often |
| Fabricated citation | No | No | No | No | Always |
| Outdated practice | Partial (deprecations) | Yes (with plugins) | No | No | Often |
The practical consequence: TypeScript strict mode is the single highest-leverage automated defense. Enable noImplicitAny (or the umbrella strict flag), add typescript-eslint’s strict config, and install library-specific ESLint plugins (e.g., eslint-plugin-react-hooks for hooks rules). These collectively catch the most common invented-API and outdated-practice cases.
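A sketch of that setup in legacy .eslintrc form; the config names assume recent @typescript-eslint and eslint-plugin-react-hooks releases, so check them against your installed versions:

```json
{
  "parser": "@typescript-eslint/parser",
  "plugins": ["@typescript-eslint", "react-hooks"],
  "extends": [
    "plugin:@typescript-eslint/strict",
    "plugin:react-hooks/recommended"
  ]
}
```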
For plausible-wrong, the threshold test is: write at least one test that exercises the boundary case of any logic the model generated. For sort order, test with numbers that sort differently lexicographically versus numerically. For debounce, test the leading-edge behavior explicitly. If the model generated it, assume the boundary case is exactly where the mistake lives.
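As a sketch of that kind of boundary test for the debounce case, here is a minimal hand-rolled debounce with an explicit leading-edge assertion. The implementation is illustrative only; lodash’s real debounce handles more cases (maxWait, cancel, flush):

```javascript
// Minimal debounce supporting a lodash-style `leading` option (illustrative only)
function debounce(fn, wait, { leading = false, trailing = true } = {}) {
  let timer = null;
  return (...args) => {
    const isLeadingCall = leading && timer === null;
    if (timer !== null) clearTimeout(timer);
    timer = setTimeout(() => {
      timer = null;
      if (trailing && !isLeadingCall) fn(...args);
    }, wait);
    if (isLeadingCall) fn(...args);
  };
}

// Boundary test: leading-edge behavior must fire on the FIRST call, synchronously
const calls = [];
const save = debounce(x => calls.push(x), 50, { leading: true, trailing: false });
save('a');
save('b');
save('c');
console.log(calls); // ['a'] (only the first call fired immediately)
```

A version with the options inverted, as in the example earlier, fails this assertion immediately: calls is empty until the trailing timer fires.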
Reading the bluff
Models have tells when they’re producing something they’re less certain about.
Confidence without specifics. “This is the standard approach for…” or “The recommended pattern is…” without a doc link or version citation is the model signaling that it’s reconstructing from pattern rather than from ground truth. Genuine recommendations from documentation come with a source.
Plausible-but-vague justifications. “This is more performant” or “This avoids a known issue” without saying what issue or how much performance improvement is hand-waving. Ask: which issue? What is the actual number? If the answer produces a fabricated issue number, that’s the tell.
Fabricated specifics that still don’t pin down. The model invents specificity to appear authoritative — “fixed in v3.2.1” sounds precise, but if the changelog for v3.2.1 doesn’t mention the thing it claims was fixed, the precision was cosmetic. Specifics are only a signal of reliability if they check out.
Unusual confidence on a question you’d expect uncertainty on. If you ask about a niche library with sparse training data and the model answers fluently without hedging, that’s backwards. Fluency in a domain the model can’t know well is a signal to verify, not a signal to trust.
When the hallucination is caught
The right response is not to note the hallucination and move on; it’s to ground the model in the correct state before continuing.
For an invented API, paste the relevant section of the actual docs or the type signature from the installed package into the conversation:
```
The actual signature from @types/node is:

fs.readFile(path, options, callback)
fs.promises.readFile(path, options): Promise<Buffer | string>

There is no readFileAsync. Rewrite using fs.promises.readFile.
```
For outdated practice, paste the current docs:
```
The bcrypt docs now recommend a minimum of 12 rounds for new systems.
Rewrite with saltOrRounds: 12.
```
For fabricated citations, delete the comment entirely rather than replacing it with a corrected reference. If the original rationale was fabricated, there may not be a real rationale to replace it with — in which case “no comment” is more honest than a wrong one.
The grounding step matters. A model that produced a hallucinated answer and received only an error message will sometimes regenerate a different hallucination rather than correct its approach. A model that receives the actual, correct reference material converges faster and produces fewer follow-up errors. The error alone leaves the model’s internal state unchanged; the correct material changes what it can generate.