Choosing a Vector Database in 2024: A Practical Guide
Pinecone, Qdrant, Weaviate, Milvus, pgvector, and the newer entrants — a working engineer's comparison across latency, recall, cost, and operational complexity. How to pick without regret.
When teams ask me which vector database they should use for a new RAG project, my first question is: “Do you already run Postgres?” If the answer is yes and your corpus is under 50M vectors, we’re probably talking about pgvector.
This surprises people. “Postgres for vectors? Doesn’t that need a specialized database?” Once upon a time, yes. In 2025, pgvector is good enough to handle the workloads that 80% of production RAG systems actually have — and it’s enormously cheaper to operate than a second dedicated database.
This article covers what pgvector can and can’t do, how to tune it, and where the ceilings genuinely are.
pgvector is a Postgres extension that adds a vector column type and two index types: IVFFlat and HNSW. You install it with `CREATE EXTENSION vector;` and you have vectors.
```sql
CREATE TABLE documents (
    id BIGSERIAL PRIMARY KEY,
    content TEXT,
    embedding VECTOR(1536),
    metadata JSONB,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
```
That’s it. You can now do:
```sql
SELECT id, content, embedding <=> '[...]' AS distance
FROM documents
WHERE metadata->>'tenant_id' = '42'
ORDER BY embedding <=> '[...]'
LIMIT 10;
```
`<=>` is cosine distance, `<->` is L2, `<#>` is negative inner product. HNSW handles the ANN search; Postgres handles the metadata filter.
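For intuition, the three operators compute the following (a minimal Python sketch of the math, not how Postgres implements it):

```python
import math

def cosine_distance(a, b):
    # pgvector's <=> : 1 minus cosine similarity
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def l2_distance(a, b):
    # <-> : Euclidean (L2) distance
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def neg_inner_product(a, b):
    # <#> : negative inner product, so ORDER BY ascending puts
    # the most similar vectors first
    return -sum(x * y for x, y in zip(a, b))
```

All three sort ascending, which is why every example in this article uses `ORDER BY ... LIMIT k` with no direction keyword.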
Dedicated vector DBs have real advantages at scale — purpose-built indexes, horizontal sharding, quantization out of the box — but they come with real costs: a second system to deploy, monitor, back up, secure, and keep consistent with your primary store.

Postgres gives you what you already rely on: ACID transactions, joins against your relational data, mature backup and replication tooling, and an operations story your team already knows.

For workloads under ~50M vectors, pgvector's performance is enough and the operational simplicity is a gift. Graduate when performance becomes your bottleneck, not when you hit an arbitrary scale.
A rough benchmark (our internal harness, 1536-dim vectors, HNSW index, cosine distance, 16-core c7i.8xlarge, NVMe):
| Corpus size | P50 latency | P95 latency | QPS (single conn) | RAM |
|---|---|---|---|---|
| 1M vectors | 3ms | 8ms | ~800 | ~6 GB |
| 10M vectors | 5ms | 18ms | ~500 | ~55 GB |
| 50M vectors | 12ms | 40ms | ~280 | ~250 GB |
| 100M vectors | 28ms | 95ms | ~150 | ~500 GB |
At 1M vectors it’s blisteringly fast. At 10M it’s solid. At 50M it still works, but RAM requirements and single-query latency start to bite. At 100M, you’re feeling it.
Compared to dedicated options (Qdrant, Milvus) at the same scale, pgvector is 2–3x slower on pure ANN throughput at the top end. At 1–10M vectors, the gap narrows to 20–40% and is often dominated by network latency anyway.
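A back-of-envelope check on the RAM column: raw float32 storage alone explains most of it. A rough sketch, assuming pgvector's on-disk layout of 4 bytes per dimension plus a small per-value header:

```python
def raw_vector_bytes(n_vectors: int, dims: int = 1536) -> int:
    # float32 storage: 4 bytes per dimension, plus ~8 bytes of per-value
    # overhead (varlena header + dimension count) -- an assumption based
    # on pgvector's storage format, before any HNSW graph overhead
    return n_vectors * (dims * 4 + 8)

# 10M 1536-dim vectors -> ~61.5 GB of raw vector data, the same order
# as the ~55 GB working set measured in the table above
print(raw_vector_bytes(10_000_000) / 1e9)
```

The lesson: at 1536 dimensions, RAM scales at roughly 6 GB per million vectors before you even count the index, which is why the 100M row in the table needs a very large box.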
## The pgvectorscale Upgrade

In 2024, Timescale open-sourced pgvectorscale, a companion extension that adds two things: a DiskANN-inspired index type (StreamingDiskANN) and statistical binary quantization (SBQ) for compressed vector storage.
With pgvectorscale, the 100M-vector benchmark above drops to:
| Corpus size | P50 latency | P95 latency | QPS | RAM |
|---|---|---|---|---|
| 100M vectors (pgvectorscale SBQ) | 8ms | 22ms | ~450 | ~60 GB |
That's a 3–4x improvement in latency and throughput, and roughly an 8x reduction in RAM. For teams serious about pgvector at scale, pgvectorscale is essentially required — it pushes the practical ceiling from ~50M to ~500M vectors.
```sql
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
```
- `m`: number of connections per graph node. Higher = better recall, more memory. Default 16 is a sensible start; 24–32 for high-recall needs; 8 for memory-constrained.
- `ef_construction`: search width during index build. Higher = slower build, better recall. Default 64; bump to 200 for best quality.

At query time:
```sql
SET hnsw.ef_search = 40; -- default 40; higher = better recall, slower
```
Typical production: m=16, ef_construction=200, ef_search=40–100. Tune ef_search per-query based on how much recall the application needs.
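One way to vary it per query is a transaction-scoped setting (a sketch; `SET LOCAL` confines the change to the enclosing transaction, which also plays nicely with transaction pooling):

```sql
BEGIN;
SET LOCAL hnsw.ef_search = 100;  -- higher recall for this query only
SELECT id, content
FROM documents
ORDER BY embedding <=> '[...]'
LIMIT 10;
COMMIT;
```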
IVFFlat is older, simpler, faster to build, uses less memory. HNSW gives better recall at query time but takes much longer to build. For almost every production case, use HNSW. IVFFlat is a reasonable choice for very write-heavy workloads where index build time dominates.
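If you do reach for IVFFlat, the shape is similar (a sketch; the `lists` and `probes` values are illustrative starting points — `lists` is often set near the square root of the row count):

```sql
CREATE INDEX ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 1000);

SET ivfflat.probes = 10;  -- more probes = better recall, slower queries
```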
### `maintenance_work_mem`

HNSW index builds are memory-bound, and the Postgres default of 64MB is laughable for this. Bump it to 8–16 GB for the index build session:
```sql
SET maintenance_work_mem = '16GB';
CREATE INDEX ...
```
From Postgres 16 + pgvector 0.6, HNSW builds are parallelizable:
```sql
SET max_parallel_maintenance_workers = 8;
```
On a 16-core machine, a 10M-vector HNSW build drops from ~2 hours to ~20 minutes.
### `work_mem`

Vector queries can sort large intermediate result sets. Bump work_mem to 256 MB–1 GB for queries that are slow.
A vector query with a metadata filter can’t use the HNSW index for the filter — it filters after retrieving candidates. If your filter is highly selective (only 1% of vectors match), this can decimate recall.
Fix 1: Over-retrieve. Query for top-200 candidates, then filter down to your top-10. Works for moderate selectivity.
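The over-retrieval logic is trivial, but the arithmetic is worth internalizing (a Python sketch; `is_match` stands in for your metadata filter):

```python
def over_retrieve(candidates, is_match, k=10, overfetch=200):
    # candidates: ids ordered by vector distance, nearest first.
    # Fetch the `overfetch` nearest, keep the first k that pass the filter.
    kept = [c for c in candidates[:overfetch] if is_match(c)]
    return kept[:k]

# With a 10%-selective filter, 200 candidates yield ~20 matches on average,
# comfortably covering a top-10 request. At 1% selectivity (~2 matches) it
# falls short, which is when Fix 2 or Fix 3 becomes necessary.
```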
Fix 2: Partitioned indexes. If you filter frequently on a tenant_id or category, partition the table by that field. Each partition has its own HNSW index; queries only hit the relevant partition.
Fix 3: Partial indexes. For a specific hot filter, create a partial index:
```sql
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WHERE status = 'published';
```
Queries with WHERE status = 'published' will use this index exclusively, ignoring draft rows entirely.
The most common production setup for B2B SaaS: many tenants, each with their own corpus, queries must be tenant-isolated.
Pattern A: Single table, tenant_id column. Simplest. Good up to ~5M vectors per tenant and modest tenant count. Over-retrieve and post-filter.
Pattern B: Partition by tenant. Postgres declarative partitioning, one partition per tenant or per tenant group. Best for 10s–100s of tenants with large per-tenant corpora.
Pattern C: Schema per tenant. Postgres schemas as tenants. Good isolation; ugly migrations. Use only if compliance requires hard isolation.
Pattern D: Database per large tenant. For your top 10–50 accounts on enterprise contracts. Lets you scale vertically and bill them separately.
Most teams end up with Pattern A for small tenants and Pattern D for whales.
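Pattern B can be sketched with declarative partitioning (hash partitioning shown here for illustration; list partitioning with one partition per tenant also works):

```sql
CREATE TABLE documents (
    id BIGSERIAL,
    tenant_id BIGINT NOT NULL,
    content TEXT,
    embedding VECTOR(1536),
    PRIMARY KEY (tenant_id, id)
) PARTITION BY HASH (tenant_id);

CREATE TABLE documents_p0 PARTITION OF documents
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);
-- ...repeat for remainders 1-3, then build an HNSW index per partition:
CREATE INDEX ON documents_p0 USING hnsw (embedding vector_cosine_ops);
```

Queries that include `WHERE tenant_id = ...` are pruned to a single partition, so each HNSW search runs over one tenant group's vectors rather than the whole table.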
Postgres logical or physical replication gets you read-scaling. HNSW indexes replicate fine. Point your read-heavy RAG queries at replicas and let the primary handle writes.
Partition by a dimension that correlates with query filters (tenant, date range, category). Each partition has its own HNSW index. Queries that filter on the partition key only scan relevant partitions.
Citus (now Microsoft-owned but still open source) shards Postgres across nodes. Vector queries with a partition key work well. Cross-shard vector queries are possible but require coordination.
pgBouncer in transaction-pooling mode. Long-running vector queries don’t play nicely with session-pooling. Budget ~5–10 connections per inference worker.
Concrete signals that pgvector is no longer enough:

- The corpus is headed well past ~100M vectors (or ~500M with pgvectorscale) with no plateau in sight.
- P95 latency stays over budget even after tuning ef_search, bumping work_mem, and adding read replicas.
- The RAM needed to keep indexes hot costs more than a managed dedicated service would.
If you cross those thresholds, migrate. Qdrant or Pinecone are the usual next stops. Expect 2–4 weeks of migration work: dual-write, backfill, cutover.
1. Don't delete and re-insert; update. An UPDATE to a row with a vector column rewrites the vector. A DELETE + INSERT does the same work while also fragmenting the index.
2. VACUUM matters more than usual. HNSW indexes accumulate cruft on updates. VACUUM ANALYZE weekly; consider REINDEX monthly on write-heavy tables.
3. Watch for bloat. pg_stat_user_indexes shows you if your index is getting huge. If the ratio of index size to table size is climbing, it’s time to REINDEX.
4. Materialize your embeddings async. Don’t block user-facing writes on embedding generation. Queue the embedding job; UPDATE the row when it completes.
5. Keep pgvector current. pgvector had major performance wins in 0.5.0, 0.6.0, and 0.7.0. Every upgrade is worth it.
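Tip 4 can be sketched as a minimal in-process queue and worker (illustrative only — `embed` is a placeholder for your real embedding call, `update_row` stands in for the SQL UPDATE, and production systems usually use a durable queue instead):

```python
import queue
import threading

jobs = queue.Queue()

def embed(text):
    # Placeholder for a real embedding model call
    return [0.0] * 1536

def worker(update_row):
    # Drain jobs until a (None, None) sentinel arrives; each job updates
    # the row's embedding column once the vector is computed, so the
    # user-facing write never waits on embedding generation.
    while True:
        doc_id, text = jobs.get()
        if doc_id is None:
            break
        update_row(doc_id, embed(text))
        jobs.task_done()
```

The user-facing request only does `jobs.put((doc_id, text))` and returns; the worker catches up asynchronously.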
For a 20M-vector corpus with a 1,000 QPS production load, pgvector isn't always the cheapest option, but it's competitive — and you get a full Postgres for the same spend. For most teams, that changes the calculus.
Tuning a pgvector deployment or deciding whether to migrate? We can help — we’ve shipped pgvector workloads at every size.