🧠 Types of Indexes in Vector Databases (03 Jan, 2026)

📌 Detailed Explanation (RAG-focused)

🔍 What is an Index in a Vector Database?

A vector index is a data structure that enables fast similarity search over high-dimensional vectors (embeddings).

In RAG systems, indexes decide:

  • ⚡ How fast retrieval is

  • 🎯 How accurate the context is

  • 📈 How well the system scales

Embeddings give meaning — indexes give speed.


🧱 1. Flat Index (Brute-Force / Exact Search)

📌 How it works

  • Stores all vectors in raw form

  • Compares query vector with every vector

  • Uses cosine, L2 (Euclidean), or dot-product similarity

✅ Pros

  • 100% accurate

  • No approximation

  • Simple

❌ Cons

  • Extremely slow at scale

  • O(N) query time — cost grows linearly with collection size

🧠 Used when

  • Small datasets (<50k vectors)

  • Benchmarking

  • Validation

🛠 Used in

  • FAISS IndexFlatL2

  • Testing RAG pipelines
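Brute-force search is only a few lines of NumPy. This is an illustrative sketch (function and variable names are my own, not a library API), assuming L2 distance over an in-memory matrix of embeddings:

```python
import numpy as np

def flat_search(db: np.ndarray, query: np.ndarray, k: int = 1) -> np.ndarray:
    """Exact nearest-neighbor search: compare the query against every stored vector (O(N))."""
    dists = np.linalg.norm(db - query, axis=1)  # L2 distance to all N vectors
    return np.argsort(dists)[:k]                # indices of the k closest

rng = np.random.default_rng(0)
db = rng.standard_normal((1000, 64)).astype("float32")
query = db[7]                    # querying with a stored vector must return index 7 first
print(flat_search(db, query, k=3))
```

Because every vector is scanned, the result is exact — which is why flat indexes are the ground truth when benchmarking approximate indexes.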


🧱 2. IVF (Inverted File Index)

📌 How it works

  1. Clusters vectors into centroids

  2. Each centroid keeps an inverted list of the vectors assigned to it

  3. Query searches only nearest clusters

🧠 Analogy

📚 Library shelves — you search only the relevant shelf, not the entire library.

✅ Pros

  • Much faster than flat

  • Scales to millions

❌ Cons

  • Approximate

  • Needs tuning (nlist = number of clusters, nprobe = clusters scanned per query)

🛠 Used in

  • FAISS (IVF)

  • Milvus

  • Large-scale RAG
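The three steps above can be sketched in NumPy — a toy version with a few Lloyd (k-means) iterations, not a production implementation; all names are illustrative:

```python
import numpy as np

def build_ivf(db, nlist=8, iters=10, seed=0):
    """Steps 1-2: cluster vectors with a few k-means iterations, then bucket them by centroid."""
    rng = np.random.default_rng(seed)
    centroids = db[rng.choice(len(db), nlist, replace=False)].copy()
    for _ in range(iters):
        assign = np.argmin(((db[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
        for c in range(nlist):
            if (assign == c).any():
                centroids[c] = db[assign == c].mean(axis=0)
    lists = {c: np.where(assign == c)[0] for c in range(nlist)}  # inverted lists
    return centroids, lists

def ivf_search(db, centroids, lists, query, nprobe=2):
    """Step 3: scan only the nprobe clusters whose centroids are nearest to the query."""
    near = np.argsort(((centroids - query) ** 2).sum(-1))[:nprobe]
    cand = np.concatenate([lists[c] for c in near])
    return cand[np.argmin(((db[cand] - query) ** 2).sum(-1))]

rng = np.random.default_rng(1)
db = rng.standard_normal((1000, 32)).astype("float32")
centroids, lists = build_ivf(db)
q = rng.standard_normal(32).astype("float32")
print(ivf_search(db, centroids, lists, q, nprobe=2))
```

Note the accuracy/speed dial: with nprobe equal to nlist, every list is scanned and the search degenerates back to exact brute force; with nprobe = 1 it is fastest but may miss the true neighbor.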


🧱 3. HNSW (Hierarchical Navigable Small World) ⭐ GOLD STANDARD

📌 How it works

  • Builds a multi-layer graph

  • Top layers = fast jumps

  • Bottom layer = accurate neighbors

🧠 Analogy

🛣️ Google Maps: highways → city roads → local streets

✅ Pros

  • Extremely fast

  • High accuracy

  • Best for real-time RAG

❌ Cons

  • Higher memory usage

  • Slower index build

🛠 Used in

  • FAISS

  • Qdrant

  • Weaviate

  • Pinecone (proprietary engine, built on similar graph-based ideas)

👉 Default choice for production RAG
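The core mechanic — a greedy walk on a proximity graph — can be sketched in a few lines. This is a single-layer simplification (real HNSW starts the walk on a sparse top layer and refines it on denser layers below, and builds the graph incrementally); names are illustrative:

```python
import numpy as np

def build_graph(db, M=8):
    """Connect every vector to its M nearest neighbors (brute force here, for the sketch)."""
    d = ((db[:, None] - db[None]) ** 2).sum(-1)
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :M]     # one neighbor list per vector

def greedy_search(db, graph, query, entry=0):
    """Hop to whichever neighbor is closer to the query; stop at a local minimum."""
    cur = entry
    cur_d = float(((db[cur] - query) ** 2).sum())
    while True:
        neigh = graph[cur]
        nd = ((db[neigh] - query) ** 2).sum(-1)
        best = int(np.argmin(nd))
        if nd[best] >= cur_d:
            return cur, cur_d               # no neighbor is closer: local optimum
        cur, cur_d = int(neigh[best]), float(nd[best])

rng = np.random.default_rng(2)
db = rng.standard_normal((300, 32)).astype("float32")
graph = build_graph(db)
q = rng.standard_normal(32).astype("float32")
idx, dist = greedy_search(db, graph, q)
```

The hierarchy is what makes real HNSW log-like in practice: top layers take the "highway" jumps, the bottom layer does the precise local search.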


🧱 4. Product Quantization (PQ)

📌 How it works

  • Splits vectors into sub-vectors

  • Each sub-vector quantized

  • Stores compressed representation

🧠 Goal

Reduce memory footprint drastically

✅ Pros

  • Very memory efficient

  • Enables billion-scale search

❌ Cons

  • Loses accuracy

  • Approximate distances

🛠 Used in

  • FAISS PQ

  • Mobile / edge AI

  • Huge corpora
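A minimal sketch of PQ's three steps — split, quantize, store codes — plus the lookup-table trick used at query time (asymmetric distance computation). Toy NumPy code, illustrative names only:

```python
import numpy as np

def train_pq(db, m=4, ksub=16, iters=10, seed=0):
    """Learn one small codebook (ksub centroids) per sub-vector block."""
    rng = np.random.default_rng(seed)
    d = db.shape[1] // m
    books = []
    for j in range(m):
        sub = db[:, j*d:(j+1)*d]
        cent = sub[rng.choice(len(sub), ksub, replace=False)].copy()
        for _ in range(iters):                 # a few k-means iterations per subspace
            assign = np.argmin(((sub[:, None] - cent[None]) ** 2).sum(-1), axis=1)
            for c in range(ksub):
                if (assign == c).any():
                    cent[c] = sub[assign == c].mean(axis=0)
        books.append(cent)
    return books

def pq_encode(db, books):
    """Replace each sub-vector with the id of its nearest codebook centroid (1 byte each here)."""
    m, d = len(books), db.shape[1] // len(books)
    codes = np.empty((len(db), m), dtype=np.uint8)
    for j in range(m):
        sub = db[:, j*d:(j+1)*d]
        codes[:, j] = np.argmin(((sub[:, None] - books[j][None]) ** 2).sum(-1), axis=1)
    return codes

def pq_distances(query, codes, books):
    """Asymmetric distances: precompute query-to-centroid tables, then sum table lookups."""
    m, d = len(books), len(query) // len(books)
    luts = [((books[j] - query[j*d:(j+1)*d]) ** 2).sum(-1) for j in range(m)]
    return sum(luts[j][codes[:, j]] for j in range(m))

rng = np.random.default_rng(0)
db = rng.standard_normal((500, 32)).astype("float32")
books = train_pq(db)
codes = pq_encode(db, books)
q = rng.standard_normal(32).astype("float32")
dists = pq_distances(q, codes, books)
```

Here each 32-dim float32 vector (128 bytes) compresses to 4 one-byte codes — a 32× memory reduction, at the cost of distances being computed against reconstructed centroids rather than the original vectors.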


🧱 5. IVF + PQ (Hybrid Compression Index)

📌 How it works

  • IVF narrows search space

  • PQ compresses vectors inside clusters

✅ Pros

  • Massive scale

  • Low memory

  • Fast

❌ Cons

  • Lower accuracy than HNSW

🛠 Used in

  • FAISS (IVFPQ)

  • Very large enterprise datasets
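The composition can be sketched compactly. This toy version cuts corners that FAISS's IVFPQ does not (centroids and codebooks are just sampled points, no k-means refinement, no residual encoding) — it only shows the flow: coarse cells narrow the search, PQ codes rank candidates cheaply. All names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
db = rng.standard_normal((2000, 32)).astype("float32")
nlist, m, d, ksub = 16, 4, 8, 32    # coarse cells; PQ: 4 sub-vectors of dim 8, 32 centroids each

# IVF layer: assign every vector to its nearest coarse centroid
coarse = db[rng.choice(len(db), nlist, replace=False)]
assign = np.argmin(((db[:, None] - coarse[None]) ** 2).sum(-1), axis=1)

# PQ layer: one codebook per sub-vector block; store only the 1-byte codes
books = [db[rng.choice(len(db), ksub, replace=False), j*d:(j+1)*d] for j in range(m)]
codes = np.stack([np.argmin(((db[:, j*d:(j+1)*d][:, None] - books[j][None]) ** 2).sum(-1), axis=1)
                  for j in range(m)], axis=1).astype(np.uint8)

def ivfpq_search(query, nprobe=4):
    """Scan only nprobe coarse cells, ranking their members by PQ lookup-table distance."""
    cells = np.argsort(((coarse - query) ** 2).sum(-1))[:nprobe]
    cand = np.where(np.isin(assign, cells))[0]
    luts = [((books[j] - query[j*d:(j+1)*d]) ** 2).sum(-1) for j in range(m)]
    approx = sum(luts[j][codes[cand, j]] for j in range(m))
    return cand[np.argmin(approx)]

print(ivfpq_search(db[0]))
```

In FAISS this corresponds to `IndexIVFPQ`, where the same two dials reappear: nprobe trades speed for recall, and m/ksub trade memory for code fidelity.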


🧱 6. Annoy (Tree-Based Index)

📌 How it works

  • Builds multiple random projection trees

  • Search across trees

✅ Pros

  • Simple

  • Disk-friendly

❌ Cons

  • Less accurate than HNSW

  • Static — items cannot be added after the index is built (requires a full rebuild)

🛠 Used in

  • Spotify recommendations

  • Lightweight systems
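A minimal random-projection tree looks like this: split the points by a hyperplane between two randomly chosen points, recurse until leaves are small, and at query time descend to the query's leaf. Annoy builds many such trees and merges their candidates; this sketch (illustrative names, not the Annoy API) shows one tree plus a small forest:

```python
import numpy as np

def build_tree(db, idx, rng, leaf_size=10):
    """Split points by a random hyperplane (through two sampled points) until leaves are small."""
    if len(idx) <= leaf_size:
        return idx                           # leaf: a small bucket of candidate vectors
    a, b = db[rng.choice(idx, 2, replace=False)]
    normal, mid = a - b, (a + b) / 2         # hyperplane between two random points
    side = (db[idx] - mid) @ normal > 0
    if side.all() or (~side).all():          # degenerate split: keep as a leaf
        return idx
    return (normal, mid,
            build_tree(db, idx[side], rng, leaf_size),
            build_tree(db, idx[~side], rng, leaf_size))

def query_tree(node, db, q):
    """Descend to the leaf on the query's side of each hyperplane, then scan that bucket."""
    while isinstance(node, tuple):
        normal, mid, left, right = node
        node = left if (q - mid) @ normal > 0 else right
    return node[np.argmin(((db[node] - q) ** 2).sum(-1))]

rng = np.random.default_rng(4)
db = rng.standard_normal((400, 16)).astype("float32")
forest = [build_tree(db, np.arange(len(db)), rng) for _ in range(5)]
q = rng.standard_normal(16).astype("float32")
hits = [query_tree(t, db, q) for t in forest]
best = min(hits, key=lambda i: ((db[i] - q) ** 2).sum())
```

Because each tree is a read-only structure, the forest memory-maps well from disk — but it also explains the "static" limitation: adding a vector means rebuilding the trees.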


🧱 7. ScaNN (Google)

📌 How it works

  • Combines partitioning + quantization

  • Heavily optimized for modern hardware (SIMD-accelerated scoring)

✅ Pros

  • Extremely fast at scale

❌ Cons

  • Complex

  • Less flexible

🛠 Used in

  • Google internal systems

  • Large ML pipelines


📊 Comparison Table (Interview-Ready)

| Index Type | Accuracy | Speed | Scale | Best Use |
|---|---|---|---|---|
| Flat | ⭐⭐⭐⭐⭐ | Slow (O(N)) | Small | Small data |
| IVF | ⭐⭐⭐⭐ | ⚡⚡ | Millions | Large corpora |
| HNSW | ⭐⭐⭐⭐⭐ | ⚡⚡⚡ | ✅✅ | RAG (best) |
| PQ | ⭐⭐ | ⚡⚡ | ✅✅ | Memory-limited |
| IVFPQ | ⭐⭐⭐ | ⚡⚡⚡ | ✅✅✅ | Massive scale |
| Annoy | ⭐⭐⭐ | ⚡ | ⚠️ Static | Lightweight systems |
| ScaNN | ⭐⭐⭐⭐ | ⚡⚡⚡ | ✅✅ | Google-scale |

🧠 Which Index Should You Use in RAG?

✅ Recommended Stack

Embeddings → HNSW → Hybrid Search → MMR → LLM

Simple Rule:

  • 🔹 <100k vectors → Flat / HNSW

  • 🔹 100k–10M → HNSW

  • 🔹 10M+ → IVF + PQ
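The rule of thumb above is mechanical enough to encode directly — a hypothetical helper (the thresholds are this article's heuristics, not hard limits):

```python
def pick_index(n_vectors: int) -> str:
    """Map a corpus size to the article's recommended index type."""
    if n_vectors < 100_000:
        return "Flat or HNSW"      # small enough that even exact search is viable
    if n_vectors <= 10_000_000:
        return "HNSW"              # fast, accurate, fits in memory at this scale
    return "IVF + PQ"              # compression becomes necessary

print(pick_index(2_000_000))  # → HNSW
```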


🎯 One-Line Summary

Vector indexes are the performance engine of RAG — and HNSW is the industry’s most trusted index for fast, accurate semantic retrieval.

