🧠 Types of Indexes in Vector Databases
03 Jan, 2026

📌 Detailed Explanation (RAG-focused)

🔍 What is an Index in a Vector Database?

A vector index is a data structure that enables fast similarity search over high-dimensional vectors (embeddings).

In RAG systems, indexes decide:

⚡ How fast retrieval is
🎯 How accurate the context is
📈 How well the system scales

Embeddings give meaning — indexes give speed.

🧱 1. Flat Index (Brute-Force / Exact Search)

📌 How it works

Stores all vectors in raw form
Compares query vector with every vector
Uses cosine / L2 / dot product
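
A minimal sketch of the brute-force idea above, using only the Python standard library. The function names (`flat_search`, `l2`, `cosine_distance`) and the toy vectors are illustrative assumptions, not any library's API.

```python
import math

def l2(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def flat_search(vectors, query, k, distance=l2):
    """Score every stored vector against the query and keep the k closest."""
    scored = [(distance(v, query), i) for i, v in enumerate(vectors)]
    scored.sort()  # smallest distance first
    return [(i, d) for d, i in scored[:k]]

# Toy data (hypothetical): four 2-D "embeddings".
vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [5.0, 5.0]]
hits = flat_search(vectors, [0.9, 0.1], k=2)
print([i for i, _ in hits])  # ids of the two closest vectors: [1, 0]
```

Every query touches every stored vector, which is why the result is exact and why the cost grows linearly with the collection size.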

✅ Pros

Exact results (100% recall)
No approximation
Simple

❌ Cons

Slow at scale: every query scans all N vectors
O(N·d) work per query (N vectors of dimension d)

🧠 Used when

Small datasets (up to roughly 50k–100k vectors)
Benchmarking
Validation

🛠 Used in

FAISS IndexFlatL2
Testing RAG pipelines

🧱 2. IVF (Inverted File Index)

📌 How it works

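Clusters vectors with k-means; each cluster is represented by a centroid
Each centroid → inverted list of the vectors assigned to it
Query searches only the nprobe clusters whose centroids are nearest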
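
The steps above can be sketched in plain Python. This is a toy version under stated assumptions: the `IVFIndex` class, the tiny `kmeans` helper and the random 2-D data are all hypothetical, and real libraries train and store the lists far more efficiently. The parameter names `nlist` and `nprobe` follow the FAISS convention.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(centroids, v):
    """Index of the centroid closest to v."""
    return min(range(len(centroids)), key=lambda c: sq_dist(centroids[c], v))

def kmeans(vectors, nlist, iters=10, seed=0):
    """Very small k-means: random initial centroids, then a few refinement passes."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, nlist)]
    for _ in range(iters):
        buckets = [[] for _ in range(nlist)]
        for v in vectors:
            buckets[nearest(centroids, v)].append(v)
        for c, members in enumerate(buckets):
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

class IVFIndex:
    def __init__(self, vectors, nlist):
        self.vectors = vectors
        self.centroids = kmeans(vectors, nlist)
        # One inverted list of vector ids per centroid.
        self.lists = [[] for _ in range(nlist)]
        for i, v in enumerate(vectors):
            self.lists[nearest(self.centroids, v)].append(i)

    def search(self, query, k, nprobe=1):
        # Rank clusters by centroid distance and scan only the closest nprobe lists.
        order = sorted(range(len(self.centroids)),
                       key=lambda c: sq_dist(self.centroids[c], query))
        candidates = [i for c in order[:nprobe] for i in self.lists[c]]
        candidates.sort(key=lambda i: (sq_dist(self.vectors[i], query), i))
        return candidates[:k]

rng = random.Random(42)
data = [[rng.random(), rng.random()] for _ in range(200)]
index = IVFIndex(data, nlist=8)
query = [0.5, 0.5]
fast = index.search(query, k=5, nprobe=2)   # scans 2 of the 8 lists
exact = index.search(query, k=5, nprobe=8)  # scans every list, same as brute force
```

Raising `nprobe` trades speed for recall: at `nprobe == nlist` the search degenerates into a flat scan.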

🧠 Analogy

📚 Library shelves — you search only the relevant shelf, not the entire library.

✅ Pros

Much faster than flat
Scales to millions

❌ Cons

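Approximate: neighbors that sit in unprobed clusters are missed
Needs tuning (nlist = number of clusters, nprobe = clusters scanned per query)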

🛠 Used in

FAISS (IndexIVFFlat)
Milvus
Large-scale RAG

🧱 3. HNSW (Hierarchical Navigable Small World) ⭐ GOLD STANDARD

📌 How it works

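Builds a multi-layer proximity graph; every vector lives in the bottom layer, and fewer and fewer appear in each higher layer
Top layers = sparse, so greedy search makes fast long-range jumps
Bottom layer = all vectors, where the search refines to accurate neighbors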
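
To make the layered search concrete, here is a simplified sketch. It is not a full HNSW implementation: the `LayeredGraph` class is hypothetical, and each layer is built by brute-force nearest-neighbor linking (an assumption made for brevity), whereas real HNSW inserts vectors incrementally with neighbor-selection heuristics. The search side (greedy descent through the upper layers, then a best-first search of width `ef` on the bottom layer) follows the same shape as the real algorithm.

```python
import heapq
import math
import random

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class LayeredGraph:
    """Simplified HNSW-style index: layered neighbor graphs plus greedy search."""

    def __init__(self, vectors, M=6, seed=0):
        rng = random.Random(seed)
        self.vectors = vectors
        # Geometric level assignment: each higher layer keeps roughly 1/M of the nodes.
        levels = [int(-math.log(1.0 - rng.random()) / math.log(M)) for _ in vectors]
        self.entry = max(range(len(vectors)), key=lambda i: levels[i])
        self.layers = []
        for layer in range(levels[self.entry] + 1):
            members = [i for i, lv in enumerate(levels) if lv >= layer]
            graph = {i: set() for i in members}
            for i in members:
                # Assumption: brute-force linking to the M nearest members of this layer.
                near = sorted((j for j in members if j != i),
                              key=lambda j: dist(vectors[i], vectors[j]))[:M]
                for j in near:
                    graph[i].add(j)
                    graph[j].add(i)
            self.layers.append(graph)

    def _greedy(self, graph, start, query):
        """Walk to a neighbor whenever it is closer to the query; stop at a local minimum."""
        current, best = start, dist(self.vectors[start], query)
        improved = True
        while improved:
            improved = False
            for j in graph[current]:
                d = dist(self.vectors[j], query)
                if d < best:
                    current, best, improved = j, d, True
        return current

    def search(self, query, k, ef=20):
        current = self.entry
        for graph in reversed(self.layers[1:]):   # upper layers: coarse, long jumps
            current = self._greedy(graph, current, query)
        d0 = dist(self.vectors[current], query)
        visited = {current}
        candidates = [(d0, current)]   # min-heap of nodes still to expand
        results = [(-d0, current)]     # max-heap (negated) holding the best ef so far
        graph = self.layers[0]         # bottom layer contains every vector
        while candidates:
            d, node = heapq.heappop(candidates)
            if len(results) >= ef and d > -results[0][0]:
                break                  # nothing left that can improve the result set
            for j in graph[node]:
                if j in visited:
                    continue
                visited.add(j)
                dj = dist(self.vectors[j], query)
                if len(results) < ef or dj < -results[0][0]:
                    heapq.heappush(candidates, (dj, j))
                    heapq.heappush(results, (-dj, j))
                    if len(results) > ef:
                        heapq.heappop(results)
        return sorted((-nd, i) for nd, i in results)[:k]

rng = random.Random(1)
data = [[rng.random() for _ in range(8)] for _ in range(300)]
index = LayeredGraph(data, M=6)
hits = index.search([0.5] * 8, k=5)   # list of (distance, id), closest first
```

The two knobs here mirror the usual HNSW trade-offs: a larger `M` means more links (more memory, better recall), and a larger `ef` means a wider bottom-layer search (slower, more accurate).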

🧠 Analogy

🛣️ Google Maps: highways → city roads → local streets

✅ Pros

Extremely fast
High accuracy
Best for real-time RAG

❌ Cons

Higher memory usage (graph links are stored on top of the full vectors)
Slower index build (each insert runs its own graph search)

🛠 Used in

FAISS (IndexHNSWFlat)
Qdrant
Weaviate
Pinecone (conceptually)

👉 Default choice for production RAG

🧱 4. Product Quantization (PQ)

📌 How it works

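Splits each vector into m sub-vectors
Each sub-vector quantized to the nearest centroid of a per-sub-space codebook (learned with k-means)
Stores only the m centroid IDs as the compressed representation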
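
A small sketch of the split, quantize and store steps above, in plain Python. The `ProductQuantizer` class, the toy `kmeans` helper and the random data are hypothetical; the sizes (`m=4` sub-vectors, `ksub=16` centroids each) are chosen only to keep the example quick.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=10, seed=0):
    """Tiny k-means used to learn one codebook per sub-space."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            c = min(range(k), key=lambda j: sq_dist(centroids[j], p))
            buckets[c].append(p)
        for j, members in enumerate(buckets):
            if members:  # keep the old centroid if a cluster went empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

class ProductQuantizer:
    def __init__(self, vectors, m, ksub):
        dim = len(vectors[0])
        assert dim % m == 0, "dimension must be divisible by m"
        self.m, self.dsub = m, dim // m
        # One independent codebook for each of the m sub-spaces.
        self.codebooks = [kmeans([self._sub(v, s) for v in vectors], ksub, seed=s)
                          for s in range(m)]

    def _sub(self, v, s):
        return v[s * self.dsub:(s + 1) * self.dsub]

    def encode(self, v):
        """Replace each sub-vector with the id of its nearest codebook centroid."""
        return [min(range(len(cb)), key=lambda j: sq_dist(cb[j], self._sub(v, s)))
                for s, cb in enumerate(self.codebooks)]

    def decode(self, code):
        """Approximate reconstruction: concatenate the chosen centroids."""
        out = []
        for cb, j in zip(self.codebooks, code):
            out.extend(cb[j])
        return out

    def adc(self, query, code):
        """Asymmetric distance: exact query sub-vectors vs. quantized stored sub-vectors."""
        return sum(sq_dist(self._sub(query, s), cb[j])
                   for s, (cb, j) in enumerate(zip(self.codebooks, code)))

rng = random.Random(7)
data = [[rng.random() for _ in range(8)] for _ in range(200)]
pq = ProductQuantizer(data, m=4, ksub=16)
code = pq.encode(data[0])   # 4 small integers instead of 8 floats
approx = pq.decode(code)    # lossy reconstruction of data[0]
```

The stored code is all that remains of the original vector, which is where both the memory savings and the accuracy loss come from.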

🧠 Goal

Reduce memory footprint drastically

✅ Pros

Very memory efficient
Enables billion-scale search

❌ Cons

Loses accuracy
Approximate distances

🛠 Used in

FAISS (IndexPQ)
Mobile / edge AI
Huge corpora

🧱 5. IVF + PQ (Hybrid Compression Index)

📌 How it works

IVF narrows search space
PQ compresses vectors inside clusters
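
A back-of-the-envelope estimate shows why this combination is attractive at scale. The helper names and the example sizes (10M vectors, 768 dimensions, 4096 clusters, 96 one-byte codes per vector) are illustrative assumptions, and the 8-byte ID per stored vector is an assumption modeled on how FAISS keeps IDs in its inverted lists; real indexes add some overhead on top.

```python
def flat_bytes(n, dim, bytes_per_float=4):
    """Raw float32 storage for n vectors."""
    return n * dim * bytes_per_float

def ivfpq_bytes(n, dim, nlist, m, nbits=8, id_bytes=8, bytes_per_float=4):
    """Rough IVF + PQ footprint: PQ codes + ids, coarse centroids, PQ codebooks."""
    codes = n * (m * nbits // 8 + id_bytes)              # per-vector code plus id
    coarse_centroids = nlist * dim * bytes_per_float     # one centroid per cluster
    codebooks = (2 ** nbits) * dim * bytes_per_float     # m codebooks x 2^nbits x (dim/m) floats
    return codes + coarse_centroids + codebooks

flat = flat_bytes(10_000_000, 768)                           # 30,720,000,000 bytes
compressed = ivfpq_bytes(10_000_000, 768, nlist=4096, m=96)  # 1,053,369,344 bytes
print(f"{flat / 1e9:.2f} GB raw vs {compressed / 1e9:.2f} GB with IVF + PQ")
```

Under these assumptions the compressed index is about 29x smaller, and almost all of what remains is the per-vector codes and IDs.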

✅ Pros

Massive scale
Low memory
Fast

❌ Cons

Typically lower recall than HNSW, because distances are computed on compressed vectors

🛠 Used in

FAISS (IndexIVFPQ)
Very large enterprise datasets

🧱 6. Annoy (Tree-Based Index)

📌 How it works

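Builds multiple random projection trees; each node splits its points with a random hyperplane
Search across trees, merge the candidate leaves, then rank candidates by true distance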
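
The tree idea can be sketched as follows. This is a simplified illustration, not Annoy's actual code: the function names are hypothetical, the split uses the hyperplane halfway between two randomly sampled points, and the search visits only one leaf per tree, whereas Annoy explores the forest with a shared priority queue.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def side(split, v):
    """True if v falls on the positive side of the splitting hyperplane."""
    normal, offset = split
    return sum(n * x for n, x in zip(normal, v)) - offset > 0

def build_tree(vectors, ids, rng, leaf_size):
    if len(ids) <= leaf_size:
        return ids                      # leaf: a list of vector ids
    # Split with the hyperplane equidistant from two randomly chosen points.
    a, b = (vectors[i] for i in rng.sample(ids, 2))
    normal = [x - y for x, y in zip(a, b)]
    midpoint = [(x + y) / 2 for x, y in zip(a, b)]
    split = (normal, sum(n * m for n, m in zip(normal, midpoint)))
    left = [i for i in ids if not side(split, vectors[i])]
    right = [i for i in ids if side(split, vectors[i])]
    if not left or not right:
        return ids                      # degenerate split: keep a larger leaf
    return (split,
            build_tree(vectors, left, rng, leaf_size),
            build_tree(vectors, right, rng, leaf_size))

def build_forest(vectors, n_trees=8, leaf_size=8, seed=0):
    rng = random.Random(seed)
    ids = list(range(len(vectors)))
    return [build_tree(vectors, ids, rng, leaf_size) for _ in range(n_trees)]

def search(vectors, forest, query, k):
    candidates = set()
    for node in forest:
        while isinstance(node, tuple):  # internal node: follow the query's side
            split, left, right = node
            node = right if side(split, query) else left
        candidates.update(node)         # union of one leaf per tree
    # Re-rank the merged candidates by true distance.
    return sorted(candidates, key=lambda i: (sq_dist(vectors[i], query), i))[:k]

rng = random.Random(3)
data = [[rng.random() for _ in range(4)] for _ in range(300)]
forest = build_forest(data, n_trees=8, leaf_size=8)
neighbors = search(data, forest, data[17], k=5)   # ids, closest first
```

More trees mean more candidate leaves per query, which raises recall at the cost of a larger index and slower queries.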

✅ Pros

Simple
Disk-friendly

❌ Cons

Less accurate than HNSW
Static (items cannot be added after the index is built; updates require a rebuild)

🛠 Used in

Spotify recommendations
Lightweight systems

🧱 7. ScaNN (Google)

📌 How it works

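Combines partitioning + quantization (anisotropic vector quantization), then re-scores the top candidates
Open-source library is optimized for CPU SIMD instructions (x86 with AVX), not TPUs / GPUs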

✅ Pros

Extremely fast at scale

❌ Cons

Complex
Less flexible

🛠 Used in

Google internal systems
Large ML pipelines

📊 Comparison Table (Interview-Ready)

Index Type	Accuracy	Speed	Scale	Best Use
Flat	⭐⭐⭐⭐⭐	❌	❌	Small data
IVF	⭐⭐⭐⭐	⚡⚡	✅	Large corpora
HNSW	⭐⭐⭐⭐⭐	⚡⚡⚡	✅✅	RAG (Best)
PQ	⭐⭐	⚡⚡	✅✅	Memory-limited
IVFPQ	⭐⭐⭐	⚡⚡⚡	✅✅✅	Massive scale
Annoy	⭐⭐⭐	⚡	⚠️	Lightweight
ScaNN	⭐⭐⭐⭐	⚡⚡⚡	✅✅	Google-scale

🧠 Which Index Should You Use in RAG?

✅ Recommended Stack

Embeddings → HNSW → Hybrid Search → MMR → LLM

Simple rule of thumb (thresholds are approximate):

🔹 <100k vectors → Flat / HNSW
🔹 100k–10M → HNSW
🔹 10M+ → IVF + PQ

🎯 One-Line Summary

Vector indexes are the performance engine of RAG, and HNSW is the most common default for fast, accurate semantic retrieval.