It’s a document re-ranking technique applied after retrieval to ensure that the final set of documents given to the LLM is:
Relevant to the query, and
Diverse, avoiding redundancy.
If you just take the top-k results from a retriever (like FAISS, Pinecone, Chroma), you might get:
Many very similar documents (same sentences or near-duplicates).
Poor coverage of the broader context.
This wastes context window space and reduces answer quality.
MMR tries to balance Relevance vs Diversity.
Formula (simplified):

MMR(d) = λ · Sim(d, Q) − (1 − λ) · max_{d′ ∈ S} Sim(d, d′)

The next document picked is the candidate d ∈ C \ S with the highest MMR(d) score.

Q = Query
C = Candidate documents
S = Already selected docs
Sim(·,·) = Similarity score (e.g., cosine similarity between embeddings)
λ (lambda) = Trade-off factor (0 → diversity focus, 1 → relevance focus)
👉 Intuition: Select documents that are relevant to the query but not too similar to each other.
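The selection rule above can be sketched in plain numpy. This is a minimal, illustrative implementation — `mmr`, `cosine_sim`, and the toy vectors are made-up names for this sketch, not part of any library:

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between vector `a` and each row of matrix `b`."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return b @ a

def mmr(query_vec, doc_vecs, k=3, lam=0.7):
    """Greedily select k document indices, balancing relevance vs diversity."""
    relevance = cosine_sim(query_vec, doc_vecs)   # Sim(d, Q) for every candidate
    selected = [int(np.argmax(relevance))]        # seed with the most relevant doc
    candidates = [i for i in range(len(doc_vecs)) if i != selected[0]]
    while len(selected) < k and candidates:
        best_score, best_idx = -np.inf, None
        for i in candidates:
            # Redundancy = similarity to the closest already-selected doc
            redundancy = max(cosine_sim(doc_vecs[i], doc_vecs[selected]))
            score = lam * relevance[i] - (1 - lam) * redundancy
            if score > best_score:
                best_score, best_idx = score, i
        selected.append(best_idx)
        candidates.remove(best_idx)
    return selected
```

With a low `lam`, the second pick favors a diverse document even when a near-duplicate of the first pick scores slightly higher on relevance; with a high `lam`, the near-duplicate wins.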
Query: “What are the side effects of Rosuvastatin?”
Without MMR:
Doc1: Side effects list (headache, fatigue, nausea)
Doc2: Same list again
Doc3: Same list repeated from another source
With MMR:
Doc1: Side effects list (headache, fatigue, nausea)
Doc2: Rare side effects (liver issues, muscle pain)
Doc3: Drug interaction warnings
➡️ The LLM now has broader, richer context.
In LangChain, you can use MMR by setting the `search_type` when creating a retriever from a vector store:

```python
retriever = vectorstore.as_retriever(
    search_type="mmr",  # Use Maximal Marginal Relevance
    search_kwargs={"k": 5, "lambda_mult": 0.7},
)
docs = retriever.get_relevant_documents("What are the side effects of Rosuvastatin?")
```
`k` → number of documents to return
`lambda_mult` → balance between relevance (closer to 1) and diversity (closer to 0)
✅ Reduces redundancy (no repeated docs)
✅ Ensures coverage of different perspectives
✅ Better use of limited context window
✅ Improves factual grounding of answers