Vector Databases: The Backbone of Modern AI Applications

Why Traditional Databases Can't Handle Embeddings

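A B-tree index and a SQL WHERE clause answer equality and range questions over ordered values. A 1536-dimensional embedding has no useful ordering, so a B-tree cannot narrow down which stored vectors lie closest to a query, and without a specialized index the only option is to compare the query against every vector. Vector databases (VectorDBs) are purpose-built to avoid that scan: they index high-dimensional vectors for Approximate Nearest Neighbor (ANN) search, trading a small amount of recall for much lower latency.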
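
To make the cost concrete, here is a minimal sketch of exact nearest-neighbor search as a full scan. The function names and the three toy vectors are illustrative assumptions, not part of any library; the point is that every query touches every stored vector, which is the work an ANN index is designed to skip.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def exact_search(query: list[float], vectors: dict[str, list[float]], top_k: int = 2):
    """Score the query against every stored vector: O(N * d) work per query."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Hypothetical 3-dimensional "embeddings" standing in for 1536-dimensional ones.
toy_vectors = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}

top = exact_search([1.0, 0.0, 0.0], toy_vectors)
print([doc_id for doc_id, _ in top])
```

With a few thousand vectors this scan is fine; at millions of vectors and 1536 dimensions it becomes the bottleneck.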

┌──────────────────────────────────────────────────────────────┐
│               Vector Database Landscape 2025                 │
├────────────────┬────────────┬───────────┬────────────────────┤
│  Database      │  Category  │  ANN Algo │  Best For          │
├────────────────┼────────────┼───────────┼────────────────────┤
│  Pinecone      │  Cloud     │  Custom   │  Managed, scale    │
│  Weaviate      │  OSS+Cloud │  HNSW     │  Hybrid search     │
│  Qdrant        │  OSS+Cloud │  HNSW     │  High performance  │
│  Chroma        │  OSS       │  HNSW     │  Local/prototype   │
│  Milvus        │  OSS+Cloud │  HNSW/IVF │  Enterprise scale  │
│  pgvector      │  PostgreSQL│  HNSW/IVF │  Existing Postgres │
│  Redis VSS     │  OSS+Cloud │  HNSW/Flat│  Low-latency cache │
└────────────────┴────────────┴───────────┴────────────────────┘

Core ANN Algorithms

HNSW (Hierarchical Navigable Small World)

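HNSW is the most widely used ANN algorithm among the databases listed above. It builds a multi-layer graph in which the upper layers are sparse, so a search can cross the dataset in a few long hops, and the bottom layer contains every vector, so the search can finish precisely.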

HNSW Layer Structure:

Layer 2:  [A] ──────────────────────── [D]
                                         │
Layer 1:  [A] ──── [B] ──── [C] ──── [D]
                    │         │
Layer 0:  [A]─[B]─[B2]─[C]─[C2]─[D]─[D2]─[E]
          (most nodes are only in layer 0)

Search: Start at the top layer, greedily move to the neighbor closest
        to the query, descend one layer when no neighbor is closer,
        and repeat until layer 0.
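
The descent can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a real HNSW implementation: the node names mirror the diagram, the 1-D coordinates are made up, and the search keeps a single candidate per layer, whereas production HNSW keeps a candidate list (sized by `ef_search`) at the bottom layer.

```python
import math

# Hypothetical 1-D coordinates for the nodes in the diagram.
POINTS = {
    "A": (0.0,), "B": (2.0,), "B2": (3.0,), "C": (4.0,),
    "C2": (5.0,), "D": (6.0,), "D2": (7.0,), "E": (8.0,),
}

def chain(*names):
    """Build an undirected adjacency dict linking consecutive names."""
    graph = {name: [] for name in names}
    for left, right in zip(names, names[1:]):
        graph[left].append(right)
        graph[right].append(left)
    return graph

# Layers ordered from top (sparse) to bottom (dense), as in the diagram.
LAYERS = [
    chain("A", "D"),
    chain("A", "B", "C", "D"),
    chain("A", "B", "B2", "C", "C2", "D", "D2", "E"),
]

def greedy_walk(graph, current, query):
    """Move to the closest neighbor until no neighbor is closer than current."""
    while True:
        best = min(graph[current], key=lambda n: math.dist(POINTS[n], query), default=current)
        if math.dist(POINTS[best], query) >= math.dist(POINTS[current], query):
            return current
        current = best

def layered_search(query, entry="A"):
    """Run the greedy walk on each layer, reusing the result as the next entry point."""
    current = entry
    for graph in LAYERS:
        current = greedy_walk(graph, current, query)
    return current

nearest = layered_search((4.9,))
print(nearest)
```

For the query 4.9 the walk jumps from A to D on the top layer, steps to C on the middle layer, and lands on C2 on the bottom layer, touching only a handful of nodes instead of all eight.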

Working with Pinecone

python
import os
from typing import Optional

from openai import OpenAI
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Create the index if it does not exist yet.
# dimension=1536 matches the output size of text-embedding-3-small.
if "knowledge-base" not in pc.list_indexes().names():
    pc.create_index(
        name="knowledge-base",
        dimension=1536,
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )

index = pc.Index("knowledge-base")

# Upsert vectors with rich metadata
def upsert_documents(documents: list[dict]) -> None:
    """
    documents: [{"id": str, "text": str, "category": str, "date": str}]
    """
    texts = [d["text"] for d in documents]
    response = openai_client.embeddings.create(
        model="text-embedding-3-small", input=texts
    )

    # response.data preserves input order, so zip pairs each doc with its embedding
    vectors = [
        {
            "id": doc["id"],
            "values": resp.embedding,
            "metadata": {
                "text": doc["text"],
                "category": doc["category"],
                "date": doc["date"],
            },
        }
        for doc, resp in zip(documents, response.data)
    ]
    index.upsert(vectors=vectors, namespace="production")

# Query with metadata filters
def search(query: str, category_filter: Optional[str] = None, top_k: int = 5):
    query_embedding = openai_client.embeddings.create(
        model="text-embedding-3-small", input=[query]
    ).data[0].embedding

    # Only build a filter when a category was requested
    filter_dict = {"category": {"$eq": category_filter}} if category_filter else None

    results = index.query(
        vector=query_embedding,
        top_k=top_k,
        filter=filter_dict,
        include_metadata=True,
        namespace="production",
    )
    return results.matches

Working with Qdrant (High Performance)

python
from qdrant_client import QdrantClient
from qdrant_client.models import (
    VectorParams, Distance, PointStruct,
    Filter, FieldCondition, MatchValue,
)

client = QdrantClient(url="http://localhost:6333")

# Create the collection only if it is missing.
# (recreate_collection would drop any existing data first.)
if not client.collection_exists("articles"):
    client.create_collection(
        collection_name="articles",
        vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
    )

# Enable payload indexing for fast filtering
client.create_payload_index(
    collection_name="articles",
    field_name="category",
    field_schema="keyword",
)
client.create_payload_index(
    collection_name="articles",
    field_name="published_date",
    field_schema="float",
)

# Batch upsert
# Each doc needs: id (unsigned int or UUID string), embedding, text, category, timestamp
def batch_upsert(docs: list[dict], batch_size: int = 256) -> None:
    for i in range(0, len(docs), batch_size):
        batch = docs[i:i + batch_size]
        points = [
            PointStruct(
                id=doc["id"],
                vector=doc["embedding"],
                payload={
                    "text": doc["text"],
                    "category": doc["category"],
                    "published_date": doc["timestamp"],
                },
            )
            for doc in batch
        ]
        client.upsert(collection_name="articles", points=points)

# Search with filter.
# query_embedding must come from the same model as the stored vectors (1536 dims here).
# Note: recent qdrant-client releases deprecate search() in favor of query_points().
def search_articles(query_embedding: list[float], category: str = "technology"):
    return client.search(
        collection_name="articles",
        query_vector=query_embedding,
        query_filter=Filter(
            must=[
                FieldCondition(key="category", match=MatchValue(value=category)),
            ]
        ),
        limit=10,
        score_threshold=0.75,  # drop hits whose cosine similarity is below 0.75
    )

pgvector — Vector Search Inside PostgreSQL

If you already run PostgreSQL, pgvector adds native vector support with no new infrastructure:

sql
-- Install extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Create table with vector column
CREATE TABLE documents (
    id BIGSERIAL PRIMARY KEY,
    content TEXT NOT NULL,
    category VARCHAR(100),
    embedding vector(1536),  -- 1536-dimensional vector
    created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Create HNSW index (fast queries, higher build cost)
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

-- Or IVFFlat index (faster builds and less memory, lower query speed/recall).
-- Build it after the table has data. pgvector's guidance is lists = rows / 1000
-- for up to 1M rows, so lists = 100 suits roughly 100K rows.
-- CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops)
-- WITH (lists = 100);

-- Semantic search query ($1 is the query embedding)
SELECT
    id,
    content,
    category,
    1 - (embedding <=> $1::vector) AS similarity
FROM documents
WHERE category = 'AI'
  AND 1 - (embedding <=> $1::vector) > 0.7
ORDER BY embedding <=> $1::vector
LIMIT 10;
-- <=> is the cosine distance operator (similarity = 1 - distance)
-- <#> is negative inner product
-- <-> is L2 (Euclidean) distance
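
For readers who want to check the three operators by hand, the sketch below computes the same quantities in plain Python. The function names are illustrative and are not part of pgvector; only the formulas correspond to the operators in the SQL comments.

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """Same quantity as pgvector's <=> operator: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def negative_inner_product(a: list[float], b: list[float]) -> float:
    """Same quantity as <#>: the inner product with its sign flipped."""
    return -sum(x * y for x, y in zip(a, b))

def l2_distance(a: list[float], b: list[float]) -> float:
    """Same quantity as <->: Euclidean distance."""
    return math.dist(a, b)

same = ([3.0, 4.0], [3.0, 4.0])
orthogonal = ([1.0, 0.0], [0.0, 1.0])

print(cosine_distance(*same), negative_inner_product(*same), l2_distance(*same))
print(cosine_distance(*orthogonal), l2_distance(*orthogonal))
```

All three return smaller values for closer vectors, which is why the query sorts in ascending order and converts the cosine distance back to a similarity with `1 - distance`.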

python
# Python integration with asyncpg
import os
from typing import Optional

import asyncpg

async def search_documents(query_embedding: list[float], category: Optional[str] = None):
    conn = await asyncpg.connect(os.environ["DATABASE_URL"])
    try:
        # SET LOCAL only takes effect inside a transaction block
        async with conn.transaction():
            await conn.execute("SET LOCAL hnsw.ef_search = 128;")

            query = """
                SELECT id, content, category,
                       1 - (embedding <=> $1::vector) as similarity
                FROM documents
                WHERE ($2::text IS NULL OR category = $2)
                  AND 1 - (embedding <=> $1::vector) > 0.7
                ORDER BY embedding <=> $1::vector
                LIMIT 10
            """

            # Convert to pgvector's text format: "[0.1,0.2,...]"
            vector_str = "[" + ",".join(map(str, query_embedding)) + "]"

            rows = await conn.fetch(query, vector_str, category)
        return [dict(row) for row in rows]
    finally:
        # Always release the connection, even if the query fails
        await conn.close()

Chroma — For Local Development and Prototyping

python
import os

import chromadb
from chromadb.utils import embedding_functions

# Persist the database to a local directory
client = chromadb.PersistentClient(path="./chroma_db")

# Use OpenAI embeddings automatically
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key=os.environ["OPENAI_API_KEY"],
    model_name="text-embedding-3-small",
)

collection = client.get_or_create_collection(
    name="knowledge_base",
    embedding_function=openai_ef,
    metadata={"hnsw:space": "cosine"},  # use cosine distance for the HNSW index
)

# Add documents; Chroma calls the embedding function for you
collection.add(
    documents=["RAG combines retrieval with generation",
               "Vector databases store embeddings"],
    metadatas=[{"source": "blog"}, {"source": "docs"}],
    ids=["doc1", "doc2"],
)

# Query
results = collection.query(
    query_texts=["What is retrieval augmented generation?"],
    n_results=3,
    where={"source": "blog"},  # metadata filter
)

Choosing the Right Vector Database

Decision Tree:

 Already on PostgreSQL?
    YES → Use pgvector (zero new infra)
    NO → Continue...

 Need managed cloud service?
    YES → Pinecone (simplest) or Weaviate Cloud
    NO → Self-hosted Qdrant or Milvus

 Need hybrid search (BM25 + vector)?
    YES → Weaviate (built-in BM25 module)
    NO → Qdrant or Pinecone

 Prototyping/local dev?
    YES → Chroma (pip install chromadb, done)
    NO → Qdrant (Docker: docker run -p 6333:6333 qdrant/qdrant)

 Scale >100M vectors?
    YES → Milvus or Pinecone (enterprise)
    NO → Qdrant handles 10M+ comfortably
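
One possible reading of the decision tree, written as a small Python helper. The function name, parameter names, and dictionary keys are hypothetical; the suggestions are copied from the tree above. It assumes that an existing PostgreSQL deployment short-circuits the remaining questions, and that the other questions are answered independently.

```python
def suggest_databases(
    on_postgres: bool,
    managed: bool,
    hybrid_search: bool,
    prototyping: bool,
    over_100m_vectors: bool,
) -> dict[str, str]:
    """Return the decision tree's suggestion for each question."""
    if on_postgres:
        # The first question ends the walk: no new infrastructure needed.
        return {"postgres": "pgvector"}
    return {
        "hosting": "Pinecone or Weaviate Cloud" if managed else "self-hosted Qdrant or Milvus",
        "hybrid_search": "Weaviate" if hybrid_search else "Qdrant or Pinecone",
        "stage": "Chroma" if prototyping else "Qdrant",
        "scale": "Milvus or Pinecone" if over_100m_vectors else "Qdrant",
    }

answers = suggest_databases(
    on_postgres=False,
    managed=False,
    hybrid_search=False,
    prototyping=False,
    over_100m_vectors=False,
)
print(answers)
```

When the same name appears under several keys, as Qdrant does here, that is a reasonable default; conflicting answers mean the requirements need to be ranked before choosing.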

Performance Benchmarks

Approximate Nearest Neighbor Search (1M vectors, 1536 dims, recall@10):

  HNSW (ef_search=128): ~2ms, recall=0.97
  HNSW (ef_search=64):  ~1ms, recall=0.94  
  IVFFlat (nprobe=10):  ~4ms, recall=0.90
  Exact search (FLAT):  ~200ms, recall=1.00
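
The recall figures above are recall@10: the fraction of the true 10 nearest neighbors (from exact search) that the approximate search also returned. A minimal sketch of that calculation, with made-up result IDs for illustration:

```python
def recall_at_k(approx_ids: list[int], exact_ids: list[int], k: int = 10) -> float:
    """Fraction of the true top-k neighbors that the approximate search found."""
    true_top = set(exact_ids[:k])
    found = set(approx_ids[:k])
    return len(true_top & found) / k

# Hypothetical result lists: the approximate search missed one true neighbor (10)
# and returned an unrelated vector (42) in its place.
exact = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
approx = [1, 2, 3, 4, 5, 6, 7, 8, 9, 42]

score = recall_at_k(approx, exact)
print(score)
```

In practice the score is averaged over many queries, which is how a figure such as 0.97 arises.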

Memory usage per vector (raw vector data, excluding index overhead):
  float32 (1536 dims): 6 KB per vector (~6 GB RAM per 1M vectors)
  Quantized int8:      1.5 KB per vector (~1.5 GB per 1M vectors; 4x compression, ~1% quality loss)
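
The memory figures follow from simple arithmetic, and the int8 figure assumes each float32 component is replaced by a single byte. The sketch below shows both the arithmetic and one simple way to do that replacement (min-max scalar quantization). The function names are illustrative, and real databases may use different quantization schemes.

```python
def raw_vector_bytes(dims: int, bytes_per_component: int) -> int:
    """Bytes needed for the raw components of one vector."""
    return dims * bytes_per_component

def collection_gb(num_vectors: int, dims: int, bytes_per_component: int) -> float:
    """Raw vector storage in GB (10^9 bytes), ignoring index overhead."""
    return num_vectors * raw_vector_bytes(dims, bytes_per_component) / 1e9

def quantize_int8(vec: list[float]):
    """Map each component onto 256 evenly spaced levels stored as -128..127."""
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / 255 or 1.0  # avoid division by zero for constant vectors
    codes = [round((x - lo) / scale) - 128 for x in vec]
    return codes, lo, scale

def dequantize_int8(codes: list[int], lo: float, scale: float) -> list[float]:
    """Approximate reconstruction of the original components."""
    return [(c + 128) * scale + lo for c in codes]

print(raw_vector_bytes(1536, 4), collection_gb(1_000_000, 1536, 4))  # float32
print(raw_vector_bytes(1536, 1), collection_gb(1_000_000, 1536, 1))  # int8

original = [0.12, -0.48, 0.33, 0.05]
codes, lo, scale = quantize_int8(original)
restored = dequantize_int8(codes, lo, scale)
max_error = max(abs(a - b) for a, b in zip(original, restored))
```

Each restored component is off by at most half a quantization step (`scale / 2`), which is the source of the small quality loss.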


Sumit Kumar Pandey
