Maximal Marginal Relevance (MMR) is a retrieval strategy used to select results that are:
- ✅ Highly relevant to the query
- ✅ Minimally redundant with each other
In short:
MMR = Relevance + Diversity (controlled balance)
It avoids the problem where all retrieved results say the same thing in different words.
Traditional similarity search retrieves the top-k most similar documents.
❌ Problem:
- They often overlap heavily
- Context becomes repetitive
- LLM answers become narrow or biased
Query: “Benefits of exercise”

Without MMR:
- Doc 1: Cardio improves heart health
- Doc 2: Cardio improves heart health (rephrased)
- Doc 3: Cardio improves heart health (again)

With MMR:
- Cardio health
- Mental health benefits
- Weight management
- Bone strength

👉 Same relevance, more coverage
MMR selects documents one by one using this logic:
MMR(d) = λ × Relevance(d, Query) − (1 − λ) × MaxSimilarity(d, SelectedDocs)
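To see how the trade-off plays out, consider a small worked example with illustrative (made-up) similarity scores and λ = 0.5:

- Doc A: Relevance = 0.82, MaxSimilarity to already selected docs = 0.90 → MMR = 0.5 × 0.82 − 0.5 × 0.90 = −0.04
- Doc B: Relevance = 0.75, MaxSimilarity = 0.30 → MMR = 0.5 × 0.75 − 0.5 × 0.30 = 0.225

Doc B is picked next even though it is slightly less relevant, because it adds new information rather than repeating what is already selected.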
| Component | Meaning |
|---|---|
| Relevance | How close the doc is to the query |
| Redundancy penalty | How similar it is to already selected docs |
| λ (lambda) | Controls balance |
| λ Value | Behavior |
|---|---|
| 0.9 | Mostly relevance (less diversity) |
| 0.5 | Balanced relevance + diversity ⭐ |
| 0.1 | Mostly diversity |
👉 In RAG systems, λ = 0.3–0.7 is commonly used.
1. Retrieve the top N candidates using embeddings
2. Select the most relevant document first
3. For each subsequent selection:
   - Score relevance to the query
   - Penalize similarity with already chosen docs
4. Repeat until k documents are selected (see the sketch below)
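Below is a minimal NumPy sketch of this greedy loop. It is illustrative only: the function name `mmr_select` and the assumption that embeddings are unit-normalised (so a dot product equals cosine similarity) are choices made here, not part of any particular library.

```python
import numpy as np

def mmr_select(query_vec, doc_vecs, k=4, lam=0.5):
    """Greedy MMR selection over unit-normalised embedding vectors.

    query_vec: (dim,) array; doc_vecs: (n_docs, dim) array.
    Returns the indices of the k selected documents.
    """
    relevance = doc_vecs @ query_vec        # cosine similarity of each doc to the query
    doc_sims = doc_vecs @ doc_vecs.T        # pairwise doc-doc similarities

    selected = [int(np.argmax(relevance))]  # step 2: most relevant document first
    candidates = set(range(len(doc_vecs))) - set(selected)

    while len(selected) < k and candidates:
        best_idx, best_score = None, -np.inf
        for i in candidates:
            redundancy = doc_sims[i, selected].max()             # max similarity to chosen docs
            score = lam * relevance[i] - (1 - lam) * redundancy  # the MMR formula above
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)           # step 3: keep the best-scoring candidate
        candidates.remove(best_idx)
    return selected
```

In practice the candidate pool N is larger than the final k (for example, fetch 20 candidates and keep 4), so the diversity penalty has something meaningful to choose between.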
In Retrieval-Augmented Generation:

Without MMR:
- Context is repetitive
- LLM hallucination risk increases
- Narrow perspective

With MMR:
- Diverse evidence
- Broader context
- Better grounded answers
📌 That’s why LangChain, LlamaIndex, and FAISS all support MMR.
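For example, in LangChain an MMR retriever can be requested directly from a vector store. This is a rough sketch, assuming an OpenAI embeddings model and a toy corpus; exact imports and defaults may differ between LangChain versions.

```python
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings  # any embeddings model can be swapped in

# Toy corpus; in practice these would be your chunked documents.
texts = [
    "Cardio improves heart health.",
    "Aerobic exercise strengthens the heart.",
    "Exercise reduces anxiety and improves mood.",
    "Resistance training increases bone density.",
]

vectorstore = FAISS.from_texts(texts, OpenAIEmbeddings())

# search_type="mmr" switches retrieval from plain top-k to MMR.
# fetch_k = size of the candidate pool, k = final results, lambda_mult = λ.
retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 2, "fetch_k": 4, "lambda_mult": 0.5},
)

docs = retriever.invoke("Benefits of exercise")
```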
- Prevent repeated answers
- Multi-angle responses
- Diverse result pages
- Reduced echo effect
- Multiple viewpoints
- Safer reasoning
- Variety without losing relevance
Imagine asking 5 doctors about a disease:
- ❌ All from the same hospital, with the same opinion
- ✅ Doctors from different specializations

MMR ensures you don’t hear the same voice 5 times.
- Slightly slower than pure similarity search
- Needs tuning of λ
- Over-diversity may dilute focus if λ is too low
MMR smartly balances relevance and diversity to give richer, non-repetitive retrieval results — essential for high-quality RAG systems.