Cursor for vector database work: pgvector, Qdrant, and the patterns that work
Published 2025-12-11 by Owner
Vector databases are increasingly part of mainstream applications. Cursor handles the typical SDK code well; the embeddings strategy and quality tuning are still human work.
The tools
The vector databases I’ve used recently with Cursor:
- pgvector: Postgres extension. Easy to integrate; vectors live in ordinary SQL tables with distance operators.
- Qdrant: Standalone vector DB. Strong API; better at scale than pgvector.
- Pinecone: Managed service. Easy to start; harder to control.
- Weaviate: Multi-modal capabilities. More complex setup.
For most projects, pgvector is the right starting point. Cursor handles its SQL well.
What Cursor handles well
Standard SDK calls. qdrant_client.upsert(points=[...]) and similar patterns. The common SDK methods are well represented in training data, so completions are reliable.
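A typical Qdrant create-and-upsert looks like this; the collection name and payload fields here are made up for illustration:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

client = QdrantClient(url="http://localhost:6333")

# Collection sized for 1536-dim embeddings, cosine distance.
client.create_collection(
    collection_name="chunks",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)

client.upsert(
    collection_name="chunks",
    points=[
        PointStruct(id=1, vector=[0.02] * 1536, payload={"doc": "intro.md"}),
    ],
)
```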
Embedding generation. Calling OpenAI’s embeddings API or similar. It’s a standard pattern, and Cursor handles it cleanly.
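The embedding call itself is a few lines with the official openai client:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.embeddings.create(
    model="text-embedding-3-small",
    input=["first chunk of text", "second chunk of text"],
)
vectors = [item.embedding for item in resp.data]  # one 1536-dim list per input
```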
Search queries. Basic similarity search, hybrid search. Cursor produces working code.
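A basic pgvector similarity search, as a sketch; the table and column names (chunks, content, embedding) are my assumptions, not from any particular project:

```python
import numpy as np
import psycopg
from pgvector.psycopg import register_vector

with psycopg.connect("dbname=docs") as conn:
    register_vector(conn)  # lets psycopg pass numpy arrays as vectors
    query_vec = np.random.rand(1536).astype(np.float32)  # stand-in for a real query embedding
    rows = conn.execute(
        """
        SELECT id, content, embedding <=> %s AS cosine_distance
        FROM chunks
        ORDER BY embedding <=> %s  -- <=> is pgvector's cosine-distance operator
        LIMIT 10
        """,
        (query_vec, query_vec),
    ).fetchall()
```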
Migrations. Adding vector columns to Postgres, creating Qdrant collections. Standard work.
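On the pgvector side, the migration amounts to a couple of statements; vector(1536) assumes text-embedding-3-small:

```python
import psycopg

with psycopg.connect("dbname=docs") as conn:
    conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
    # 1536 matches text-embedding-3-small; change it if you change models.
    conn.execute("ALTER TABLE chunks ADD COLUMN IF NOT EXISTS embedding vector(1536)")
```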
Chunking strategies. Splitting documents into chunks for embedding. Cursor knows the common patterns (fixed-size, semantic, recursive).
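A minimal fixed-size chunker with overlap, roughly the shape Cursor tends to produce; word-based here for brevity, though counting real tokens with tiktoken is more accurate:

```python
def chunk_fixed(text: str, size: int = 512, overlap: int = 64) -> list[str]:
    """Greedy fixed-size chunks with overlap, counted in words, not tokens."""
    words = text.split()
    step = size - overlap
    return [
        " ".join(words[start:start + size])
        for start in range(0, len(words), step)
    ]
```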
What needs human input
Embedding model choice. Which embedding model? OpenAI’s text-embedding-3, Cohere, sentence-transformers, etc. The right choice depends on your data and use case.
Chunk size tuning. The chunk size Cursor reaches for by default (often 512 tokens) may not fit your data. Tune based on actual retrieval results.
Distance metric. Cosine, dot product, Euclidean. For most cases, cosine; and since OpenAI’s embeddings are unit-normalized, cosine and dot product rank identically there. But “most cases” isn’t all cases.
Quality evaluation. Is the search returning useful results? Cursor can’t tell; you have to evaluate manually.
Index parameters. HNSW vs. IVFFlat; m/ef_construction for HNSW, lists/probes for IVFFlat. These are tuning decisions.
For these, human judgment dominates. Cursor implements what you decide.
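To make the index knobs concrete, here is what the pgvector options look like; the values shown are the documented defaults, not recommendations:

```python
import psycopg

with psycopg.connect("dbname=docs") as conn:
    # HNSW: better recall/latency at query time; slower, memory-heavy build.
    conn.execute(
        "CREATE INDEX ON chunks USING hnsw (embedding vector_cosine_ops) "
        "WITH (m = 16, ef_construction = 64)"
    )
    # Query-time recall knob for HNSW.
    conn.execute("SET hnsw.ef_search = 40")
    # IVFFlat alternative: cheaper to build; recall depends on lists/probes.
    # conn.execute(
    #     "CREATE INDEX ON chunks USING ivfflat (embedding vector_cosine_ops) "
    #     "WITH (lists = 100)"
    # )
```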
A specific pattern
For a documentation search project, the workflow:
- I decided to use pgvector + OpenAI text-embedding-3-small
- I designed the schema (documents table, chunks table, vectors)
- I asked Cursor to scaffold the embedding pipeline:
> Generate a script that reads markdown files from docs/, chunks them
> by section, generates OpenAI embeddings, and inserts to the chunks
> table with the vector column. Use the existing OpenAI client setup.
Cursor produced a working script. I tested with a small subset; refined the chunking; ran on the full corpus.
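For flavor, the script was roughly this shape. This is my reconstruction, not Cursor’s verbatim output; the schema (documents, chunks, document_id) follows the design above, and the naive heading-based splitter stands in for the refined chunking:

```python
from pathlib import Path

import numpy as np
import psycopg
from openai import OpenAI
from pgvector.psycopg import register_vector

ai = OpenAI()

def section_chunks(md: str) -> list[str]:
    """Naive chunk-by-section: start a new chunk at each markdown heading."""
    chunks: list[list[str]] = []
    for line in md.splitlines():
        if line.startswith("#") or not chunks:
            chunks.append([])
        chunks[-1].append(line)
    return ["\n".join(c) for c in chunks if any(s.strip() for s in c)]

with psycopg.connect("dbname=docs") as conn:
    register_vector(conn)
    for path in Path("docs").glob("**/*.md"):
        chunks = section_chunks(path.read_text())
        if not chunks:
            continue
        doc_id = conn.execute(
            "INSERT INTO documents (path) VALUES (%s) RETURNING id", (str(path),)
        ).fetchone()[0]
        resp = ai.embeddings.create(model="text-embedding-3-small", input=chunks)
        for chunk, item in zip(chunks, resp.data):
            conn.execute(
                "INSERT INTO chunks (document_id, content, embedding) VALUES (%s, %s, %s)",
                (doc_id, chunk, np.array(item.embedding)),
            )
```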
- I asked Cursor to scaffold the search endpoint:
> Generate a search endpoint that takes a query string, embeds it via
> OpenAI, finds the top 10 nearest chunks via pgvector, and returns
> them with their parent document references.
Cursor produced this too. The query was straightforward.
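Roughly the shape of that endpoint; FastAPI is my assumption (the framework doesn’t matter here), and the column names follow the earlier sketches:

```python
import numpy as np
import psycopg
from fastapi import FastAPI
from openai import OpenAI
from pgvector.psycopg import register_vector

app = FastAPI()
ai = OpenAI()

@app.get("/search")
def search(q: str, limit: int = 10):
    # Embed the query with the same model used for the corpus.
    emb = ai.embeddings.create(model="text-embedding-3-small", input=[q])
    vec = np.array(emb.data[0].embedding)
    with psycopg.connect("dbname=docs") as conn:
        register_vector(conn)
        rows = conn.execute(
            """
            SELECT c.id, c.content, d.path, c.embedding <=> %s AS distance
            FROM chunks c
            JOIN documents d ON d.id = c.document_id
            ORDER BY c.embedding <=> %s
            LIMIT %s
            """,
            (vec, vec, limit),
        ).fetchall()
    return [
        {"chunk_id": r[0], "content": r[1], "document": r[2], "distance": float(r[3])}
        for r in rows
    ]
```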
- I evaluated the search quality manually. Made adjustments to chunking and prompt construction.
The pattern: AI handles the implementation; human evaluates the quality.
Common failure modes
A few specific issues I’ve seen Cursor stumble on:
Wrong distance metric. Cursor sometimes defaults to L2 distance when cosine is the better fit for sentence embeddings. Catch it in review.
Wrong vector dimension. When switching embedding models, the dimension changes (text-embedding-3-small produces 1536-dimensional vectors; text-embedding-3-large produces 3072). Cursor’s first attempt at the schema sometimes uses the old dimension.
N+1 queries on metadata. When fetching vectors plus their metadata, Cursor’s first attempts sometimes do a query per result instead of joining.
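The fix is a single joined query, something like this sketch (schema names follow the earlier examples):

```python
import psycopg

def top_chunks_with_docs(conn: psycopg.Connection, query_vec, limit: int = 10):
    # One round trip: join the chunk hits to their parent documents,
    # instead of issuing a metadata SELECT per result (the N+1 shape).
    return conn.execute(
        """
        SELECT c.content, d.path, c.embedding <=> %s AS distance
        FROM chunks c
        JOIN documents d ON d.id = c.document_id
        ORDER BY c.embedding <=> %s
        LIMIT %s
        """,
        (query_vec, query_vec, limit),
    ).fetchall()
```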
Index creation timing. Creating an HNSW index on a table with millions of rows takes hours. Cursor’s setup scripts sometimes create the index before the bulk insert; that’s backwards.
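The right order, sketched (load_fn stands for whatever bulk loader you use, such as the pipeline above):

```python
import psycopg

def load_then_index(conn: psycopg.Connection, load_fn) -> None:
    # Bulk load first, build the index after: maintaining HNSW during
    # millions of inserts is far slower than one post-load build.
    load_fn(conn)
    conn.execute("CREATE INDEX ON chunks USING hnsw (embedding vector_cosine_ops)")
```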
For each, review catches the issue. The fix is usually simple.
Worth the AI investment?
For vector database work, AI tools are genuinely productive. The SDK code is mechanical, and AI handles it efficiently.
The harder parts (embedding strategy, quality evaluation) are still human. AI doesn’t replace this; it accelerates the implementation around it.
For projects with vector search, expect a 30-50% productivity gain on the implementation portion. Don’t expect AI to figure out your embeddings strategy.
What I’d recommend
For projects starting with vector search:
Pick pgvector first. Easier to start. Familiar tooling. Switch to specialized DBs later if needed.
Use a strong embedding model. OpenAI text-embedding-3 or Cohere is fine. Don’t overthink this initially.
Build a small eval harness. A few queries with expected results. Run after every change. AI tools help build the harness.
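A harness can be very small. This hypothetical one measures recall against hand-picked expected chunks (the queries and ids are made up; the result shape matches the endpoint sketch above):

```python
# Queries mapped to the chunk ids that should appear in the top k.
EXPECTED = {
    "how do I configure auth?": {101, 102},
    "what are the rate limits?": {240},
}

def recall_at_k(search_fn, k: int = 10) -> float:
    """search_fn(query, k) returns result dicts with a 'chunk_id' key."""
    hits = 0
    for query, expected in EXPECTED.items():
        returned = {r["chunk_id"] for r in search_fn(query, k)}
        hits += bool(expected & returned)
    return hits / len(EXPECTED)
```

Run it after every chunking or model change; a dropping score flags regressions before users notice.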
Evaluate quality before optimizing. A working basic implementation is better than a clever broken one.
Tune incrementally. Start with defaults; tune based on observed quality.
The combination of “good defaults + AI-assisted implementation + human evaluation” works well. The category is mainstream enough that AI tools have solid coverage.
Closing
Vector database work heading into 2026 is more mainstream than it was even a year ago. AI tools handle the implementation efficiently; the strategy work is still human.
For projects considering vector search, the friction is lower than it was. AI tools make the implementation cheap; the cost of trying it is bounded. Worth attempting earlier rather than later.