
Choosing a Vector Database in 2024: A Practical Guide

Balys Kriksciunas · 7 min read

#ai #infrastructure #vector-database #pinecone #qdrant #weaviate #milvus #pgvector #rag


A vector database is the second-most-important piece of a RAG stack after the LLM itself. Get it right and retrieval is a boring, reliable part of your system. Get it wrong and you spend months debugging mysterious recall issues, runaway costs, or index rebuilds that take the system down at 3 AM.

This guide is a practical comparison of the vector database options teams are actually deploying in 2024. I’ll tell you what each is good at, what it’s bad at, and which ones I pick for which workloads.


What a Vector Database Actually Does

Strip away the marketing and a vector DB is two things:

  1. An approximate nearest neighbor (ANN) index — usually HNSW or IVF variants — that finds the top-K most similar vectors to a query in sub-linear time.
  2. Metadata storage with filtering — you store an ID, a vector, and a JSON blob of metadata; you can filter queries by metadata fields.

Everything else (hybrid search, reranking, multi-tenancy, replication) is a convenience that you could in principle build yourself. The question is which product does the boring parts well enough that you don’t have to.
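Those two responsibilities fit in a few lines of NumPy. Here's a toy sketch of what any vector DB does under the hood — exact cosine search with a metadata pre-filter; real engines replace the brute-force scan with an ANN index like HNSW, and the names here are illustrative:

```python
import numpy as np

# Toy "vector DB": parallel arrays of ids, vectors, and metadata dicts.
ids = ["a", "b", "c", "d"]
vecs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.5, 0.5]])
meta = [{"lang": "en"}, {"lang": "de"}, {"lang": "en"}, {"lang": "en"}]

def query(q, top_k=2, filt=None):
    # 1. Metadata filtering: narrow the candidate set first (pre-filter).
    cand = [i for i, m in enumerate(meta) if filt is None or filt(m)]
    # 2. Similarity search: cosine similarity over the remaining candidates.
    v = vecs[cand]
    sims = v @ q / (np.linalg.norm(v, axis=1) * np.linalg.norm(q))
    order = np.argsort(-sims)[:top_k]
    return [ids[cand[i]] for i in order]

print(query(np.array([1.0, 0.0]), filt=lambda m: m["lang"] == "en"))  # ['a', 'd']
```

Note the order of operations: filtering before the similarity scan is what the "fast filtered queries" discussion later is about — post-filtering after top-K retrieval can silently drop results.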


The Short List

Five options cover 95% of production deployments:

  1. Pinecone — fully managed, zero-ops
  2. Qdrant — open-source, fast, self-host-friendly
  3. Weaviate — hybrid search with lots of knobs
  4. Milvus / Zilliz — built for 100M+ scale
  5. pgvector — the Postgres extension

A few others worth knowing: Vespa (vectors plus a metadata graph), Turbopuffer (serverless, cheap reads), and Elasticsearch's built-in vector support (if you already run Elastic).

I'll focus on the short list.


Pinecone

When it wins: You want a managed service, you don’t want to run Kubernetes, and you’re willing to pay for uptime. Pinecone’s serverless tier (launched 2024) is genuinely good — pay-per-request-ish pricing, no capacity planning.

What’s good: Operationally painless. Auto-scaling, durable, low-latency. Hybrid search (sparse + dense) is built in. SDKs across major languages. Namespace isolation for multi-tenant apps is solid.

What’s not: Expensive at scale. Serverless pricing gets murky above ~50M vectors. You don’t own your data — migration is a real project. The managed-only posture means no on-prem story for regulated industries.

Rough cost: Serverless is $0.33/M WUs and $8.25/M RUs (write/read units, which roughly correspond to embedding bytes moved). A 10M-vector index with 1K QPS of reads is ~$500–$1,500/month. Pod-based pricing is more predictable for large steady-state workloads.

Use it when: You want to ship fast, you have budget, and you don’t want a database problem.


Qdrant

When it wins: You want fast, open-source, and self-host-friendly. Qdrant’s Rust implementation is genuinely fast — our benchmarks consistently show it matching or beating other open-source options at similar recall.

What’s good: Single binary to self-host. Docker-friendly. Rich filtering (complex boolean expressions with indexes). Good Python and JavaScript clients. Scalar quantization and binary quantization reduce memory significantly. Managed Qdrant Cloud if you don’t want to self-host.
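To see why quantization matters, here's the back-of-the-envelope memory math for raw float32 vectors versus int8 scalar quantization and 1-bit binary quantization. Vector payload only — HNSW graph links and metadata add overhead on top — and the 768-dimension figure is an assumption, not from the article:

```python
def vector_memory_gb(n_vectors, dims, bytes_per_dim=4.0):
    """Raw vector storage only; ignores index structures and metadata."""
    return n_vectors * dims * bytes_per_dim / 1e9

n, d = 10_000_000, 768
print(f"float32: {vector_memory_gb(n, d):.1f} GB")       # 30.7 GB (4 bytes/dim)
print(f"int8:    {vector_memory_gb(n, d, 1.0):.1f} GB")  # 7.7 GB (scalar quantization)
print(f"binary:  {vector_memory_gb(n, d, 1/8):.1f} GB")  # 1.0 GB (1 bit/dim)
```

At 768 dimensions, 10M raw float32 vectors don't fit comfortably in 16 GB; quantization (or a smaller embedding model) is what makes that instance size work.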

What’s not: Multi-tenancy story is okay but not as polished as Pinecone namespaces. Cluster mode exists but is younger than Milvus’s equivalent.

Rough cost: Self-hosted, you pay for the VM. A 10M-vector workload runs comfortably on a 16GB RAM instance (~$50/mo). Managed Qdrant Cloud starts around $0.06/hr for the smallest cluster.

Use it when: You want open-source, you value performance, and you’re comfortable running infrastructure.


Weaviate

When it wins: You want hybrid search (BM25 + dense) with a lot of knobs, or you want GraphQL, or your use case leans heavily on structured objects with references.

What’s good: Built-in BM25 and hybrid search that work well. Schema-first approach helps teams building structured apps. Module system for integrating embedding generators. Good documentation. Weaviate Cloud is solid.

What’s not: Heavier than Qdrant operationally — more moving parts. GraphQL API is polarizing (love it or hate it). Memory usage is higher than lean competitors.

Rough cost: Self-hosted, plan for ~2x the RAM vs Qdrant for the same workload. Weaviate Cloud starts around $25/mo for sandbox; production clusters are $200+/mo.

Use it when: Hybrid search is a first-class need, or you like the schema/modules approach.


Milvus / Zilliz

When it wins: You have 100M+ vectors and real scale concerns. Milvus is built for big workloads — it separates compute and storage, supports distributed indexing, and handles billion-scale corpora.

What’s good: Genuinely scales. GPU indexing support. Multi-tenancy. Managed Zilliz Cloud if you don’t want to run it yourself. Active development.

What’s not: Operationally heavy. Multiple components (proxy, coord, data node, index node, query node) make self-hosting a real Kubernetes exercise. Overkill below 10M vectors. Learning curve is steep.

Rough cost: Self-hosted is complex — you’re running a K8s application. Zilliz Cloud starts around $100/mo, scales into four-figure territory fast.

Use it when: You genuinely have scale. Below 50M vectors, you’re paying for capacity you won’t use.


pgvector: The Surprise

I put pgvector last because it's the right answer for more teams than expect it to be.

When it wins: You already run Postgres. You have <50M vectors. You want transactional guarantees across vector and non-vector data. You don’t want to operate another database.

What’s good: CREATE EXTENSION vector; and you’re done. Works with every Postgres ecosystem tool. Transactional. Your existing backups cover vectors. HNSW and IVFFlat indexes both supported. With pg_embedding or pgvectorscale (Timescale’s extension), you get performance that rivals dedicated vector DBs.
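The whole setup is a handful of SQL statements. A sketch, shown here as strings to run via psql or any Postgres driver — the table name, the 768 dimension, and the HNSW parameters are illustrative, not recommendations:

```python
# Illustrative pgvector setup; tune m / ef_construction for your recall target.
SETUP_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
    id        bigserial PRIMARY KEY,
    body      text,
    embedding vector(768)
);

-- HNSW index; m and ef_construction trade build time for recall.
CREATE INDEX ON documents
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);
"""

# <=> is pgvector's cosine-distance operator (paired with vector_cosine_ops).
# Ordinary SQL predicates double as metadata filters.
QUERY_SQL = """
SELECT id, body
FROM documents
WHERE body ILIKE %(pattern)s
ORDER BY embedding <=> %(query_vector)s
LIMIT 10;
"""
```

This is the appeal in miniature: the filter clause is just SQL, and the vector search participates in the same transactions and backups as everything else.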

What’s not: Performance ceiling is lower than Qdrant or Milvus for truly huge workloads. Hybrid search requires more manual plumbing. Less convenient for pure-vector workloads than a dedicated DB.

Rough cost: Whatever your Postgres costs. On RDS or Supabase or neon.tech, a working vector workload adds cents, not dollars.

Use it when: You already run Postgres and you’re under 50M vectors. This is our default recommendation for new teams unless they tell us otherwise.

See our full deep-dive: pgvector at Scale: When Postgres Is Enough.


The Decision Framework I Actually Use

Walk the list top-down:

  1. Do you already run Postgres and have <50M vectors? Use pgvector. Stop.
  2. Do you want zero ops and have budget? Pinecone. Stop.
  3. Do you need open-source, self-hosted, performant? Qdrant.
  4. Is hybrid search (BM25 + dense) a first-class need? Weaviate.
  5. Do you have 100M+ vectors with real scale pressure? Milvus.

That covers 90% of decisions. If you’re in the 10% (vectors + metadata graph: Vespa; serverless cheap reads: Turbopuffer; already run Elastic: use its vector support), you know who you are.
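The walk-down above is simple enough to encode. A toy version — the thresholds are the article's rough cutoffs, not hard rules, and Qdrant falls out as the open-source default when no earlier branch fires:

```python
def pick_vector_db(
    runs_postgres: bool,
    n_vectors: int,
    wants_zero_ops: bool,
    needs_hybrid_search: bool,
) -> str:
    """Top-down walk of the decision list; first match wins."""
    if runs_postgres and n_vectors < 50_000_000:
        return "pgvector"
    if wants_zero_ops:
        return "Pinecone"
    if n_vectors >= 100_000_000:
        return "Milvus / Zilliz"
    if needs_hybrid_search:
        return "Weaviate"
    return "Qdrant"

print(pick_vector_db(True, 10_000_000, False, False))  # pgvector
```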


Benchmarks, With Caveats

We maintain an internal benchmark harness against SIFT1M, GloVe, and a proprietary 50M-vector corpus. A rough 2024 snapshot at P95 latency < 50ms and recall@10 ≥ 0.95:

Database               10M vectors (QPS)   50M vectors (QPS)   Memory (50M)
Qdrant                 ~3,800              ~1,400              38 GB
Pinecone (pod p1.x1)   ~2,000              ~800                managed
Weaviate               ~3,200              ~1,200              52 GB
Milvus                 ~3,500              ~1,600              44 GB
pgvector (HNSW)        ~2,800              ~1,000              32 GB

Treat this as directional. Your workload (dimension, distance metric, metadata filters, batch size) will shift the numbers 2x in either direction. Run your own benchmarks before committing.
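When you run your own benchmarks, the quality metric is simple to compute: recall@K is the fraction of the exact top-K neighbors that your ANN index actually returned, with ground truth from a brute-force scan. A minimal sketch on synthetic data:

```python
import numpy as np

def recall_at_k(approx_ids, exact_ids, k=10):
    """Fraction of the true top-k that appears in the ANN result."""
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k

def exact_top_k(query, corpus, k=10):
    """Ground truth by brute force (cosine similarity on normalized vectors)."""
    sims = corpus @ query
    return np.argsort(-sims)[:k].tolist()

rng = np.random.default_rng(0)
corpus = rng.normal(size=(1000, 64))
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)
q = corpus[0]
truth = exact_top_k(q, corpus)
# A perfect index recovers the ground truth; a degraded one drops neighbors.
print(recall_at_k(truth, truth))  # 1.0
```

Feed `approx_ids` from whichever database you're evaluating, using your real corpus, your real filters, and your real query distribution — that's the whole point of the caveat above.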


The Mistakes to Avoid

1. Picking based on blog benchmarks. Every vendor publishes benchmarks that make them look good. Run your own.

2. Forgetting metadata filtering. A vector DB without fast filtered queries will force you to post-filter, which kills recall. Test this with your real filter patterns.

3. Underestimating ingestion cost. Writes are often 10–50x more expensive than reads on managed services. If you’re re-embedding corpora regularly, model the cost carefully.

4. Ignoring embedding dimension. 768-dim vectors use half the memory of 1536-dim (text-embedding-ada-002). For many use cases, a smaller embedding model (e.g., bge-small-en at 384 dim) gives 90% of the quality at 25% of the storage.

5. Treating it as write-once, read-many. Real apps update vectors constantly. Make sure your DB handles updates and deletes without rebuilding the index.
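The dimension arithmetic from point 4, made concrete — raw float32 vector storage for 10M vectors, index overhead excluded:

```python
def footprint_gb(n_vectors, dims, bytes_per_dim=4):
    # float32 embeddings: 4 bytes per dimension, vector payload only
    return n_vectors * dims * bytes_per_dim / 1e9

n = 10_000_000
for model, dims in [("text-embedding-ada-002", 1536),
                    ("a 768-dim model", 768),
                    ("bge-small-en", 384)]:
    print(f"{model:24s} {dims:5d} dims -> {footprint_gb(n, dims):5.1f} GB")
# 1536 dims: 61.4 GB; 768 dims: 30.7 GB; 384 dims: 15.4 GB —
# the 384-dim model is exactly 25% of the 1536-dim footprint.
```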


Further Reading

Evaluating vector databases for a new RAG system? We can help — we’ve migrated fleets between all of these.
