Cursor's codebase indexing on a 1M-line repo: tuning .cursorignore and what to expect

Published 2026-04-18 by Owner

I’ve been working in a 1.1M-line monorepo for the past six months. Cursor is the only AI tool I’ve found that handles it without falling apart, but its default behavior is rough. With three things tuned, it gets genuinely good. Here’s what I changed.

Default behavior on a large repo

Open Cursor in a fresh clone of a 1M-line repo and several things happen:

Indexing starts immediately and takes 8-15 minutes
CPU usage stays high for the whole indexing run
The chat panel’s “codebase” results return mediocre matches early on
Memory usage climbs to 4-6GB
After indexing completes, search is fast, but suggestions still struggle with very old or rarely touched code

The third point is the biggest practical issue. When I ask “where is the user authentication logic,” Cursor’s codebase search returns 8 results, and the relevant one is in position 5 or 6. The model leans on the top-ranked results, so it grabs the wrong context.

The .cursorignore file

Cursor reads .cursorignore like git reads .gitignore. Files matching the patterns are excluded from indexing.

For a 1M-line repo, this is the file that matters most. Here’s what mine looks like:

# Generated code
**/*.generated.ts
**/*.generated.go
**/generated/

# Vendored code
vendor/
third_party/
node_modules/

# Build artifacts (already in .gitignore but worth being explicit)
**/dist/
**/build/
**/.next/
**/.turbo/
**/target/
**/*.min.js
**/*.bundle.js

# Test artifacts we'd never ask about
**/__snapshots__/
**/coverage/
**/cypress/videos/
**/cypress/screenshots/

# Migrations history (we don't ask Cursor to write migrations)
db/migrations/
db/seeds/

# Documentation we're not editing
docs/api-reference/
CHANGELOG.md

# Large data files
data/
fixtures/large/

After applying this, my indexing time dropped from 12 minutes to 4 minutes, and memory dropped from 5.5GB to 2.8GB. Search results got noticeably better because hand-written code was no longer competing with snapshot tests and generated code for the top ranks.

The pattern: exclude things you never want the model to use as a reference. Generated code is the worst offender — the model picks up generated patterns and applies them to hand-written code.

What NOT to .cursorignore

Several exclusions I tried and then reverted:

Test files. I tried excluding **/*.test.ts and the suggestions immediately got worse. Tests encode usage patterns. The model uses them to understand how functions are called. Without tests in the index, the model imagines usage patterns instead of seeing them.

Old code. I tried excluding everything in legacy/ (a directory we don’t actively edit). The model started suggesting modern patterns for changes that needed to integrate with legacy code. Now legacy/ stays in the index but isn’t among the files I prioritize (more on that below).

Type definition files. Excluding **/*.d.ts removed crucial type information. The model started writing code with wrong types because it couldn’t see the declarations.

The principle: exclude noise (generated, vendored, snapshots), not signal (tests, types, old-but-still-relevant code).

Prioritizing files

Cursor lets you prioritize files via the chat panel — when you @-mention a file, it gets weighted higher in the next response. For repeated work in a specific area, you can pin files to the chat session.

For a 1M-line repo, pinning is essential. Without pins, every chat message starts with the model searching the index for relevant files, and the search is approximate. With pins, the model has the right files in context immediately.

I pin:

The 3-5 files I’m actively editing
The type definitions for the domain I’m working in
One representative test file (so the model knows the test patterns)
Any architecture docs relevant to the area (e.g., docs/architecture/billing-flows.md)

That adds up to 15-25k tokens of pinned content, about 7.5-12.5% of Claude 3.5 Sonnet’s 200k window, which leaves plenty of room.
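To sanity-check a pin set before a session, a rough estimate is enough. The sketch below assumes about 4 characters per token, a common rule of thumb rather than Claude’s actual tokenizer (dense code can tokenize less efficiently), and hard-codes the 200k window mentioned above. `estimate_tokens`, `share_of_window` and `pinned_budget` are illustrative names, not anything Cursor provides.

```python
from pathlib import Path
from typing import Iterable, Tuple

# Assumption: ~4 characters per token. A rule of thumb, not a real tokenizer;
# code-heavy files can come out noticeably higher.
CHARS_PER_TOKEN = 4
CONTEXT_WINDOW = 200_000  # Claude 3.5 Sonnet's context window


def estimate_tokens(text: str) -> int:
    """Rough token estimate for a piece of text (ceiling division)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def share_of_window(tokens: int, window: int = CONTEXT_WINDOW) -> float:
    """Fraction of the context window a given token count would occupy."""
    return tokens / window


def pinned_budget(paths: Iterable[Path]) -> Tuple[int, float]:
    """Estimate total tokens, and share of the window, for files you plan to pin."""
    total = sum(
        estimate_tokens(p.read_text(encoding="utf-8", errors="replace"))
        for p in paths
    )
    return total, share_of_window(total)
```

Calling `pinned_budget([Path("docs/architecture/billing-flows.md"), ...])` with your actual pin list gives a ballpark; if it lands well above the 15-25k range, that is usually a sign one of the pinned files is doing little work.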

When indexing breaks

A few failure modes I’ve hit:

The index gets out of sync after a large rebase. Solution: Cursor → Settings → Codebase → Re-index. It takes a while, but it clears out the stale entries.

Indexing fails silently on files with mixed encodings. The repo I work in had a few files with Windows-1252 encoding mixed in. Cursor would skip them without warning, leaving holes in the index. Solution: convert to UTF-8 with iconv and re-index.
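The per-file `iconv -f WINDOWS-1252 -t UTF-8` conversion can also be scripted across a whole repo. This is a minimal Python sketch, assuming every non-UTF-8 text file is Windows-1252 (the case described above); the suffix allowlist is a hypothetical example to adjust, and `fix_repo` defaults to a dry run so you can review the list before anything is rewritten.

```python
from pathlib import Path
from typing import List, Optional

# Hypothetical allowlist: only touch files known to be text, never binaries.
TEXT_SUFFIXES = {".ts", ".tsx", ".js", ".go", ".py", ".md", ".sql", ".json"}


def to_utf8(raw: bytes) -> Optional[bytes]:
    """Return re-encoded bytes, or None if `raw` is already valid UTF-8.

    Assumes anything that is not UTF-8 is Windows-1252 (cp1252). Raises
    UnicodeDecodeError if the bytes are not valid cp1252 either.
    """
    try:
        raw.decode("utf-8")
        return None  # already UTF-8; leave the file alone
    except UnicodeDecodeError:
        return raw.decode("cp1252").encode("utf-8")


def fix_repo(root: Path, dry_run: bool = True) -> List[Path]:
    """List (and, if dry_run is False, rewrite) non-UTF-8 text files under root."""
    flagged = []
    for path in sorted(root.rglob("*")):
        if ".git" in path.parts or path.suffix not in TEXT_SUFFIXES:
            continue
        if not path.is_file():
            continue
        try:
            converted = to_utf8(path.read_bytes())
        except UnicodeDecodeError:
            print(f"neither UTF-8 nor cp1252, check by hand: {path}")
            continue
        if converted is not None:
            flagged.append(path)
            if not dry_run:
                # Write bytes, not text, so line endings are left untouched.
                path.write_bytes(converted)
    return flagged
```

Run `fix_repo(Path("."))` first to see what would change, then `fix_repo(Path("."), dry_run=False)` and re-index.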

Search returns no results for known files. Usually means a .cursorignore rule is overly broad. I once accidentally excluded all of src/ because a glob was wrong. Solution: check the ignore patterns and re-index.
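One way to catch an overly broad rule before re-indexing is to run the patterns against paths you know should (or shouldn’t) be indexed. The sketch below implements a simplified subset of gitignore matching, on the assumption that Cursor follows gitignore semantics as described earlier; it does not handle negation (`!`), character classes, or escapes, and `load_rules`/`is_ignored` are hypothetical helpers, not a Cursor API. One gitignore rule worth knowing: a pattern whose only slash is a trailing one, such as `data/` or `vendor/`, matches a directory of that name at any depth, not just at the repo root.

```python
import re
from typing import List, Optional, Tuple

Rule = Tuple[re.Pattern, bool]  # (compiled regex, matches directories only)


def _translate(glob: str) -> str:
    """Turn a gitignore-style glob into a regex (subset: **, *, ?)."""
    i, out = 0, []
    while i < len(glob):
        if glob.startswith("**/", i):
            out.append("(?:.*/)?")  # zero or more leading directories
            i += 3
        elif glob.startswith("/**", i) and i + 3 == len(glob):
            out.append("/.*")  # trailing /** matches everything inside
            i += 3
        elif glob[i] == "*":
            out.append("[^/]*")  # a single * never crosses a directory boundary
            i += 1
        elif glob[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(glob[i]))
            i += 1
    return "".join(out)


def _compile(line: str) -> Optional[Rule]:
    line = line.strip()
    if not line or line.startswith("#"):
        return None  # blank line or comment
    if line.startswith("!"):
        raise ValueError(f"negation not supported in this sketch: {line}")
    dir_only = line.endswith("/")
    body = line.rstrip("/")
    anchored = "/" in body  # a slash at the start or middle anchors to the root
    regex = _translate(body.lstrip("/"))
    if not anchored:
        regex = "(?:.*/)?" + regex  # unanchored patterns match at any depth
    return re.compile(regex), dir_only


def load_rules(text: str) -> List[Rule]:
    return [r for r in (_compile(line) for line in text.splitlines()) if r is not None]


def is_ignored(path: str, rules: List[Rule]) -> bool:
    """True if repo-relative file `path`, or any parent directory, matches a rule."""
    parts = path.split("/")
    parents = ["/".join(parts[:i]) for i in range(1, len(parts))]
    for regex, dir_only in rules:
        candidates = parents if dir_only else parents + [path]
        if any(regex.fullmatch(c) for c in candidates):
            return True
    return False
```

Feeding every path from `git ls-files` through `is_ignored` and skimming what gets dropped would surface a mistake like the src/ one above, and also shows that `db/migrations/` (anchored, because of the middle slash) only matches at the root.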

Multi-root workspaces

Cursor supports VS Code’s multi-root workspace concept, but codebase indexing treats each root as a separate project. For a “monorepo opened with multiple roots” setup, you get N parallel indexes that don’t share context.

This is sometimes what you want (frontend and backend, no cross-talk). For a single conceptual codebase, it’s wrong — the model can’t see relationships across roots.

Solution: open the monorepo from the root, not via multi-root. Use the per-directory rules I described in another guide for stack-specific behavior.

The benchmark I ran

To see if the tuning actually helped, I ran the same task on the same repo under three configurations.

Task (identical in all three runs): “find the function that handles webhook signature verification and refactor it to support multiple signing keys”

Default Cursor (no .cursorignore, no pinning):

Time to find the right function: 90 seconds (the model searched, returned the wrong file, I corrected, it searched again)
First-attempt correctness: ~50% (it referenced an old import path)
Total tokens: 120k

Tuned .cursorignore, no pinning:

Time to find the right function: 60 seconds (better search results)
First-attempt correctness: ~70%
Total tokens: 95k

Tuned .cursorignore + pinned the relevant files:

Time to find the right function: instant (pinned)
First-attempt correctness: ~95%
Total tokens: 70k

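Pinning accounts for most of the gain in time-to-find (60 of the 90 seconds saved) and in correctness (25 of the 45 points), while the token savings split evenly at 25k per step. .cursorignore tuning helps too, but pinning is the bigger lever for large repos.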
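To make the split between the two steps explicit, the snippet below takes the three runs above (recording “instant” as 0 seconds, which is an assumption) and computes what share of the total default-to-tuned improvement each step contributed, per metric. The dictionary keys and `attribution` are illustrative names.

```python
# Numbers from the three runs above; "instant" is recorded as 0 seconds.
RUNS = {
    "default": {"find_s": 90, "correct_pct": 50, "tokens_k": 120},
    "cursorignore": {"find_s": 60, "correct_pct": 70, "tokens_k": 95},
    "cursorignore+pins": {"find_s": 0, "correct_pct": 95, "tokens_k": 70},
}


def attribution(metric: str) -> tuple:
    """Share of the total improvement from (.cursorignore step, pinning step)."""
    base = RUNS["default"][metric]
    mid = RUNS["cursorignore"][metric]
    best = RUNS["cursorignore+pins"][metric]
    total = best - base  # negative for time and tokens, positive for correctness
    return (mid - base) / total, (best - mid) / total


for metric in ("find_s", "correct_pct", "tokens_k"):
    ignore_share, pin_share = attribution(metric)
    print(f"{metric:12} .cursorignore {ignore_share:.0%}   pinning {pin_share:.0%}")
```

Pinning comes out at 67% of the time-to-find gain and 56% of the correctness gain, with tokens at exactly 50/50. With a single task per configuration, these are illustrations of one run, not stable estimates.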

What I’d ask Cursor to add

Auto-pinning based on recently edited files would close most of the gap. The user shouldn’t have to remember to pin every relevant file at the start of a chat session. Cursor knows which files I’ve been editing; it could suggest pins or auto-pin them.

Until that ships, the workflow is: open Cursor, pin the files you’ll work on, then start chatting. The 30 seconds of pinning at the start saves multiple minutes of wrong-file searching during the session.
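As a stopgap, a candidate pin list can be derived from git history. This is a hypothetical helper, not a Cursor feature: it ranks files by how many commits in a recent window touched them, using `git log --name-only --pretty=format:`, and drops paths that no longer exist. It only sees committed work (uncommitted edits would need `git diff --name-only` as well), and you still @-mention or pin the results yourself.

```python
import subprocess
from collections import Counter
from pathlib import Path
from typing import List


def rank_recent_files(git_log_output: str) -> List[str]:
    """Order paths by how many commits touched them, most first.

    Ties keep the order in which git listed them (newest commits first).
    """
    counts = Counter(
        line.strip() for line in git_log_output.splitlines() if line.strip()
    )
    return [path for path, _ in counts.most_common()]


def suggest_pins(repo: str = ".", since: str = "3 days ago", top_n: int = 5) -> List[str]:
    """Candidate files to pin, based on recent commits in `repo`."""
    out = subprocess.run(
        ["git", "-C", repo, "log", f"--since={since}", "--name-only", "--pretty=format:"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    existing = [p for p in rank_recent_files(out) if (Path(repo) / p).exists()]
    return existing[:top_n]
```

The commit-count ranking is a deliberately simple choice; weighting by recency or filtering to your own commits with `--author` are obvious variations.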