Vector Databases: The Backbone of Modern AI Applications
Why Traditional Databases Can't Handle Embeddings
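A 1536-dimensional embedding vector cannot be efficiently queried with a B-tree index or a SQL WHERE clause. A B-tree orders values along a single key, while "nearest" in embedding space depends on all 1,536 dimensions at once. Exact nearest-neighbor search has to compare the query with every stored vector, which becomes slow at millions of vectors. Vector databases (VectorDBs) are purpose-built for Approximate Nearest Neighbor (ANN) search on high-dimensional vectors. They trade a small amount of recall for much faster queries.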
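For reference, exact nearest-neighbor search is just a linear scan. The pure-Python sketch below scores every stored vector against the query. This is the N × d cost that ANN indexes are built to avoid. The 3-dimensional vectors are made up and stand in for real embeddings. `cosine_similarity` and `exact_top_k` are illustrative names, not a library API.

```python
import heapq
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def exact_top_k(query: list[float], vectors: dict[str, list[float]], k: int) -> list[tuple[str, float]]:
    """Brute-force k-NN: score every vector (N * d work), keep the k best."""
    scored = ((doc_id, cosine_similarity(query, vec)) for doc_id, vec in vectors.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy 3-d "embeddings"; real ones would have 1536 dimensions.
vectors = {
    "cats": [0.9, 0.1, 0.0],
    "dogs": [0.8, 0.3, 0.1],
    "stocks": [0.0, 0.2, 0.9],
}
print(exact_top_k([1.0, 0.2, 0.0], vectors, k=2))
```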
```
┌─────────────────────────────────────────────────────────────┐
│ Vector Database Landscape 2025                              │
├────────────────┬────────────┬───────────┬───────────────────┤
│ Database       │ Category   │ ANN Algo  │ Best For          │
├────────────────┼────────────┼───────────┼───────────────────┤
│ Pinecone       │ Cloud      │ Custom    │ Managed, scale    │
│ Weaviate       │ OSS+Cloud  │ HNSW      │ Hybrid search     │
│ Qdrant         │ OSS+Cloud  │ HNSW      │ High performance  │
│ Chroma         │ OSS        │ HNSW      │ Local/prototype   │
│ Milvus         │ OSS+Cloud  │ HNSW/IVF  │ Enterprise scale  │
│ pgvector       │ PostgreSQL │ HNSW/IVF  │ Existing Postgres │
│ Redis VSS      │ OSS+Cloud  │ HNSW/Flat │ Low-latency cache │
└────────────────┴────────────┴───────────┴───────────────────┘
```
Core ANN Algorithms
HNSW (Hierarchical Navigable Small World)
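HNSW is the most widely used ANN algorithm. It builds a multi-layer graph. The upper layers are sparse, which allows fast, long-range navigation. The lower layers are dense, which allows precise local search. Every node appears in layer 0. Only a random, exponentially shrinking fraction of nodes is also promoted to each higher layer.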
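The claim that most nodes live only in layer 0 follows from how HNSW assigns levels. Each inserted node draws level = floor(-ln(U) * mL), where U is uniform in (0, 1] and mL = 1/ln(M). Here M is the per-node link count, the same parameter that appears as pgvector's `m` later in this article. With M = 16, only about 1 node in 16 reaches layer 1 or above.

The sketch below simulates that draw. `sample_level` is an illustrative name, and the seed and sample size are arbitrary.

```python
import math
import random
from collections import Counter


def sample_level(m: int, rng: random.Random) -> int:
    """Draw an HNSW level: floor(-ln(U) * mL) with mL = 1 / ln(M)."""
    u = 1.0 - rng.random()  # uniform in (0, 1], avoids log(0)
    return math.floor(-math.log(u) / math.log(m))


rng = random.Random(42)
levels = Counter(sample_level(16, rng) for _ in range(100_000))
for level in sorted(levels):
    print(f"layer {level}: {levels[level] / 100_000:.2%} of nodes have this as their top layer")
```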
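HNSW Layer Structure:

```
Layer 2:  [A]───────────────────[D]
           │                     │
Layer 1:  [A]─[B]──────[C]──────[D]
           │   │        │        │
Layer 0:  [A]─[B]─[B2]─[C]─[C2]─[D]─[D2]─[E]

(most nodes appear only in layer 0)
```

Search begins at an entry point in the top layer. At each step it greedily moves to whichever neighbor is closest to the query. When no neighbor is closer, the search descends to the same node in the next layer and repeats. In layer 0, a best-first beam search keeps ef_search candidates and returns the final top-k results.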
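To make the search procedure concrete, here is a toy sketch in pure Python. It hard-codes a three-layer graph that mirrors the diagram. It descends greedily through layers 2 and 1, then runs a beam search of width `ef` in layer 0.

This is a simplification. The 2-D coordinates and links are invented for illustration. Real HNSW creates links during insertion, and that step is omitted here. Names such as `hnsw_search` are hypothetical and do not belong to any library.

```python
import heapq
import math

# Invented 2-D coordinates; real nodes would be high-dimensional embeddings.
POINTS = {
    "A": (0.0, 0.0), "B": (1.0, 0.0), "B2": (1.5, 0.5), "C": (2.0, 0.0),
    "C2": (2.5, 0.5), "D": (3.0, 0.0), "D2": (3.5, 0.5), "E": (4.0, 0.0),
}

# Hand-built adjacency lists per layer (index 0 = bottom), mirroring the diagram.
LAYERS = [
    {"A": ["B"], "B": ["A", "B2"], "B2": ["B", "C"], "C": ["B2", "C2"],
     "C2": ["C", "D"], "D": ["C2", "D2"], "D2": ["D", "E"], "E": ["D2"]},
    {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]},
    {"A": ["D"], "D": ["A"]},
]
ENTRY_POINT = "A"  # a node present in the top layer


def dist(node: str, query: tuple[float, float]) -> float:
    return math.dist(POINTS[node], query)


def greedy_search_layer(query, entry, layer):
    """Hop to the closest neighbor until none is closer (a local minimum)."""
    current = entry
    while True:
        # current goes first so ties keep us in place instead of oscillating
        best = min([current] + layer.get(current, []), key=lambda n: dist(n, query))
        if best == current:
            return current
        current = best


def beam_search(query, entry, layer, ef):
    """Best-first search that keeps the ef closest nodes seen so far."""
    visited = {entry}
    candidates = [(dist(entry, query), entry)]  # min-heap: nodes to expand
    results = [(-dist(entry, query), entry)]    # max-heap via negation: best ef
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -results[0][0]:  # nearest candidate is worse than worst result
            break
        for nb in layer.get(node, []):
            if nb in visited:
                continue
            visited.add(nb)
            d_nb = dist(nb, query)
            if len(results) < ef or d_nb < -results[0][0]:
                heapq.heappush(candidates, (d_nb, nb))
                heapq.heappush(results, (-d_nb, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # drop the current worst
    return sorted((-neg_d, n) for neg_d, n in results)


def hnsw_search(query, k=3, ef=4):
    entry = ENTRY_POINT
    for layer in reversed(LAYERS[1:]):  # layer 2, then layer 1
        entry = greedy_search_layer(query, entry, layer)
    return [n for _, n in beam_search(query, entry, LAYERS[0], ef)[:k]]


print(hnsw_search((3.4, 0.4)))
```

Raising `ef` widens the layer-0 beam. This is the same trade-off that `hnsw.ef_search` controls in pgvector: more candidates are examined, which gives higher recall but slower queries.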
Working with Pinecone
```python
from pinecone import Pinecone, ServerlessSpec
from openai import OpenAI
import os

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Create the index once; its dimension must match the embedding model
# (text-embedding-3-small returns 1536-dimensional vectors)
if "knowledge-base" not in pc.list_indexes().names():
    pc.create_index(
        name="knowledge-base",
        dimension=1536,
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )

index = pc.Index("knowledge-base")

# Upsert vectors with rich metadata
def upsert_documents(documents: list[dict]):
    """
    documents: [{"id": str, "text": str, "category": str, "date": str}]
    """
    texts = [d["text"] for d in documents]
    response = openai_client.embeddings.create(
        model="text-embedding-3-small", input=texts
    )

    vectors = [
        {
            "id": doc["id"],
            "values": resp.embedding,
            "metadata": {
                "text": doc["text"],
                "category": doc["category"],
                "date": doc["date"],
            },
        }
        for doc, resp in zip(documents, response.data)
    ]
    index.upsert(vectors=vectors, namespace="production")

# Query with an optional metadata filter
def search(query: str, category_filter: str | None = None, top_k: int = 5):
    query_embedding = openai_client.embeddings.create(
        model="text-embedding-3-small", input=[query]
    ).data[0].embedding

    filter_dict = {"category": {"$eq": category_filter}} if category_filter else None

    results = index.query(
        vector=query_embedding,
        top_k=top_k,
        filter=filter_dict,
        include_metadata=True,
        namespace="production",
    )
    return results.matches
```
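`upsert_documents` embeds and upserts everything in a single request. That does not scale to larger corpora, for two reasons:

- OpenAI's embeddings endpoint caps the number of inputs per call.
- Pinecone recommends upsert batches of up to 1,000 records, within a 2 MB request limit.

A minimal chunking sketch follows. It assumes a hypothetical `chunked` helper and an illustrative batch size of 100. Vectors that carry large `text` metadata will hit the 2 MB limit sooner.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


# Usage with the upsert_documents function above (batch size is illustrative):
# for batch in chunked(documents, 100):
#     upsert_documents(batch)
```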
Working with Qdrant (High Performance)
```python
from qdrant_client import QdrantClient
from qdrant_client.models import (
    VectorParams, Distance, PointStruct,
    Filter, FieldCondition, MatchValue,
)

client = QdrantClient(url="http://localhost:6333")

# Create the collection only if it does not exist yet
# (recreate_collection is deprecated and would silently drop existing data)
if not client.collection_exists("articles"):
    client.create_collection(
        collection_name="articles",
        vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
    )

# Index the payload fields used in filters so filtering stays fast
client.create_payload_index(
    collection_name="articles",
    field_name="category",
    field_schema="keyword",
)
client.create_payload_index(
    collection_name="articles",
    field_name="published_date",
    field_schema="float",  # stored as a Unix timestamp
)

# Batch upsert
def batch_upsert(docs: list[dict], batch_size: int = 256):
    """docs: [{"id": int | UUID str, "embedding": list[float], "text": str,
    "category": str, "timestamp": float}]"""
    for i in range(0, len(docs), batch_size):
        batch = docs[i:i + batch_size]
        points = [
            PointStruct(
                id=doc["id"],
                vector=doc["embedding"],
                payload={
                    "text": doc["text"],
                    "category": doc["category"],
                    "published_date": doc["timestamp"],
                },
            )
            for doc in batch
        ]
        client.upsert(collection_name="articles", points=points)

# Search with a payload filter
def search_articles(query_embedding: list[float]):
    # query_points is the newer Query API; older clients used client.search()
    response = client.query_points(
        collection_name="articles",
        query=query_embedding,
        query_filter=Filter(
            must=[FieldCondition(key="category", match=MatchValue(value="technology"))]
        ),
        limit=10,
        score_threshold=0.75,  # drop hits with cosine similarity below 0.75
    )
    return response.points
```
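Qdrant accepts only unsigned integers or UUIDs as point IDs. Documents keyed by strings, such as file paths, URLs or slugs, therefore need a stable mapping.

One common approach uses Python's standard `uuid` module to derive a deterministic UUIDv5 from each string. Re-upserting the same document then overwrites the same point. The namespace choice and the `point_id` name below are illustrative assumptions.

```python
import uuid

# Any fixed namespace works; it only has to stay the same across runs.
DOC_NAMESPACE = uuid.NAMESPACE_URL


def point_id(doc_key: str) -> str:
    """Map an arbitrary string key to a stable UUID string for Qdrant."""
    return str(uuid.uuid5(DOC_NAMESPACE, doc_key))


print(point_id("blog/vector-databases"))
```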
pgvector — Vector Search Inside PostgreSQL
If you already run PostgreSQL, pgvector adds native vector support with no new infrastructure:
```sql
-- Install extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Create table with vector column
CREATE TABLE documents (
    id BIGSERIAL PRIMARY KEY,
    content TEXT NOT NULL,
    category VARCHAR(100),
    embedding vector(1536),  -- 1536-dimensional vector
    created_at TIMESTAMPTZ DEFAULT NOW()
);

-- HNSW index: better speed/recall trade-off, but slower builds and more memory
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

-- Or an IVFFlat index: faster builds and less memory, but lower recall.
-- Build it after loading data; pgvector suggests lists = rows / 1000
-- for up to 1M rows and sqrt(rows) above that.
-- CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops)
-- WITH (lists = 100);

-- Semantic search query; $1 is the query embedding, passed as a parameter
SELECT
    id,
    content,
    category,
    1 - (embedding <=> $1::vector) AS similarity
FROM documents
WHERE category = 'AI'
  AND 1 - (embedding <=> $1::vector) > 0.7
ORDER BY embedding <=> $1::vector
LIMIT 10;
-- <=> is cosine distance (so 1 - distance = cosine similarity)
-- <#> is negative inner product
-- <-> is L2 distance
-- WHERE filters are applied after the approximate index scan, so fewer
-- than 10 rows may come back; raising hnsw.ef_search helps
```
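The three operators are closely related. The pure-Python sketch below reproduces what `<=>`, `<#>` and `<->` compute for two vectors. The function names are illustrative and are not pgvector's.

It also shows that for unit-length vectors, squared L2 distance equals twice the cosine distance. OpenAI embeddings are normalized to length 1, so for them all three operators rank results the same way.

```python
import math


def cosine_distance(a, b):  # what pgvector's <=> returns
    dot = sum(x * y for x, y in zip(a, b))
    return 1 - dot / (math.hypot(*a) * math.hypot(*b))


def negative_inner_product(a, b):  # what pgvector's <#> returns
    return -sum(x * y for x, y in zip(a, b))


def l2_distance(a, b):  # what pgvector's <-> returns
    return math.dist(a, b)


a = [0.6, 0.8]  # unit length
b = [1.0, 0.0]  # unit length
print(cosine_distance(a, b), negative_inner_product(a, b), l2_distance(a, b) ** 2)
```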
```python
# Python integration with asyncpg
import os

import asyncpg


async def search_documents(query_embedding: list[float], category: str | None = None):
    # One connection per call keeps the example short; prefer asyncpg.create_pool() in production
    conn = await asyncpg.connect(os.environ["DATABASE_URL"])
    try:
        # SET LOCAL lasts only until the end of the current transaction,
        # so it must run inside one (outside a transaction it has no effect)
        async with conn.transaction():
            await conn.execute("SET LOCAL hnsw.ef_search = 128")

            query = """
                SELECT id, content, category,
                       1 - (embedding <=> $1::vector) AS similarity
                FROM documents
                WHERE ($2::text IS NULL OR category = $2)
                  AND 1 - (embedding <=> $1::vector) > 0.7
                ORDER BY embedding <=> $1::vector
                LIMIT 10
            """

            # Send the vector in pgvector's text format, e.g. "[0.1,0.2,...]"
            vector_str = "[" + ",".join(map(str, query_embedding)) + "]"

            rows = await conn.fetch(query, vector_str, category)
        return [dict(row) for row in rows]
    finally:
        await conn.close()
```
Chroma — For Local Development and Prototyping
```python
import os

import chromadb
from chromadb.utils import embedding_functions

client = chromadb.PersistentClient(path="./chroma_db")

# Use OpenAI embeddings automatically
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key=os.environ["OPENAI_API_KEY"],
    model_name="text-embedding-3-small",
)

collection = client.get_or_create_collection(
    name="knowledge_base",
    embedding_function=openai_ef,
    metadata={"hnsw:space": "cosine"},  # Chroma's default space is l2
)

# Add documents; Chroma embeds them automatically.
# upsert (rather than add) keeps re-runs against the persistent store idempotent.
collection.upsert(
    documents=["RAG combines retrieval with generation",
               "Vector databases store embeddings"],
    metadatas=[{"source": "blog"}, {"source": "docs"}],
    ids=["doc1", "doc2"],
)

# Query
results = collection.query(
    query_texts=["What is retrieval augmented generation?"],
    n_results=3,
    where={"source": "blog"},  # metadata filter
)
```
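With `hnsw:space` set to `cosine`, Chroma reports cosine *distances* (1 - cosine similarity), not similarities. `collection.query` also returns one inner list per query text.

The small helper below turns that into (id, similarity) pairs. It is hypothetical, and it runs on a hand-written dict with the same shape as a query result rather than on a live query.

```python
def to_similarities(results: dict, query_index: int = 0) -> list[tuple[str, float]]:
    """Pair each returned id with 1 - cosine distance, for one query text."""
    ids = results["ids"][query_index]
    distances = results["distances"][query_index]
    return [(doc_id, 1.0 - d) for doc_id, d in zip(ids, distances)]


# Hand-written stand-in for a collection.query(...) result (values invented)
sample = {"ids": [["doc1"]], "distances": [[0.25]]}
print(to_similarities(sample))
```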
Choosing the Right Vector Database
Decision Tree:

```
Already on PostgreSQL?
    YES → pgvector (zero new infrastructure)
    NO  → continue...

Need a managed cloud service?
    YES → Pinecone (simplest) or Weaviate Cloud
    NO  → self-hosted Qdrant or Milvus

Need hybrid search (BM25 + vector)?
    YES → Weaviate (BM25 built in)
    NO  → Qdrant or Pinecone

Prototyping or local development?
    YES → Chroma (pip install chromadb, done)
    NO  → Qdrant (Docker: docker run -p 6333:6333 qdrant/qdrant)

Scale above 100M vectors?
    YES → Milvus or Pinecone (enterprise)
    NO  → Qdrant handles 10M+ vectors comfortably
```
Performance Benchmarks
```
Approximate nearest neighbor search (1M vectors, 1536 dims; per-query latency, recall@10):
  HNSW (ef_search=128):  ~2 ms,    recall = 0.97
  HNSW (ef_search=64):   ~1 ms,    recall = 0.94
  IVFFlat (nprobe=10):   ~4 ms,    recall = 0.90
  Exact search (FLAT):   ~200 ms,  recall = 1.00
```
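Recall@10 is the fraction of the true 10 nearest neighbors, as found by exact search, that the approximate index also returns. Benchmarks typically average it over a set of test queries.

A minimal sketch of the metric for one query follows. The ID lists are hypothetical.

```python
def recall_at_k(approx_ids: list, exact_ids: list, k: int = 10) -> float:
    """Share of the exact top-k neighbors that also appear in the ANN top-k."""
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k


exact = list(range(10))                   # ground truth from exact (FLAT) search
approx = [0, 1, 2, 3, 4, 5, 6, 7, 8, 42]  # the ANN index missed neighbor 9
print(recall_at_k(approx, exact))
```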
```
Memory usage per vector:
  float32 (1536 dims):  1536 × 4 bytes = 6 KB per vector
  1M vectors:           ~6 GB RAM (before index/graph overhead)
  Quantized int8:       1.5 KB per vector (~4x compression, typically ~1% quality loss)
```
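The int8 figure comes from storing one byte per dimension instead of four. Below is a minimal pure-Python sketch of per-vector min-max scalar quantization. It shows both the 4x size reduction and how little the cosine similarity between the original and restored vectors moves.

This is one common scheme, not necessarily the one any particular database uses. It maps values to unsigned 0..255; int8 variants only shift that range. It also stores two floats of per-vector metadata. `quantize` and `dequantize` are illustrative names.

```python
import math
import random


def quantize(vec: list[float]) -> tuple[bytes, float, float]:
    """Map each float to 0..255 using the vector's own min and max."""
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / 255 or 1.0  # avoid division by zero for constant vectors
    codes = bytes(round((x - lo) / scale) for x in vec)
    return codes, lo, scale


def dequantize(codes: bytes, lo: float, scale: float) -> list[float]:
    return [lo + c * scale for c in codes]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


rng = random.Random(0)
vec = [rng.gauss(0, 1) for _ in range(1536)]  # random stand-in for an embedding
codes, lo, scale = quantize(vec)
restored = dequantize(codes, lo, scale)

print(len(vec) * 4, "bytes as float32 vs", len(codes), "bytes quantized")
print(f"cosine(original, restored) = {cosine(vec, restored):.5f}")
```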