It’s a document re-ranking technique applied after retrieval to ensure that the final set of documents given to the LLM is:
Relevant to the query, and
Diverse, avoiding redundancy.
If you just take the top-k results from a retriever (like FAISS, Pinecone, Chroma), you might get:
Many very similar documents (same sentences or near-duplicates).
Poor coverage of the broader context.
This wastes context window space and reduces answer quality.
MMR tries to balance Relevance vs Diversity.
Formula (simplified):

MMR(d) = λ · Sim(d, Q) − (1 − λ) · max_{d′ ∈ S} Sim(d, d′)

The next document picked is the candidate d ∈ C \ S with the highest MMR(d) score.

Q = Query
C = Candidate documents
S = Already selected docs
Sim(·,·) = Similarity score (e.g., cosine similarity between embeddings)
λ (lambda) = Trade-off factor (0 → diversity focus, 1 → relevance focus)
👉 Intuition: Select documents that are relevant to the query but not too similar to each other.
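The selection rule above can be sketched in plain numpy. This is a minimal, illustrative implementation — `mmr`, `cosine_sim`, and the toy vectors are made-up names for this sketch, not part of any library:

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between vector `a` and each row of matrix `b`."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return b @ a

def mmr(query_vec, doc_vecs, k=3, lam=0.7):
    """Greedily select k document indices, balancing relevance vs diversity."""
    relevance = cosine_sim(query_vec, doc_vecs)   # Sim(d, Q) for every candidate
    selected = [int(np.argmax(relevance))]        # seed with the most relevant doc
    candidates = [i for i in range(len(doc_vecs)) if i != selected[0]]
    while len(selected) < k and candidates:
        best_score, best_idx = -np.inf, None
        for i in candidates:
            # Redundancy = similarity to the closest already-selected doc
            redundancy = max(cosine_sim(doc_vecs[i], doc_vecs[selected]))
            score = lam * relevance[i] - (1 - lam) * redundancy
            if score > best_score:
                best_score, best_idx = score, i
        selected.append(best_idx)
        candidates.remove(best_idx)
    return selected
```

With a low `lam`, the second pick favors a diverse document even when a near-duplicate of the first pick scores slightly higher on relevance; with a high `lam`, the near-duplicate wins.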
Query: “What are the side effects of Rosuvastatin?”
Without MMR:
Doc1: Side effects list (headache, fatigue, nausea)
Doc2: Same list again
Doc3: Same list repeated from another source
With MMR:
Doc1: Side effects list (headache, fatigue, nausea)
Doc2: Rare side effects (liver issues, muscle pain)
Doc3: Drug interaction warnings
➡️ The LLM now has broader, richer context.
In LangChain, you can use MMR by setting the `search_type` when creating a retriever from a vector store:

```python
retriever = vectorstore.as_retriever(
    search_type="mmr",  # Use Maximal Marginal Relevance
    search_kwargs={"k": 5, "lambda_mult": 0.7},
)
docs = retriever.get_relevant_documents("What are the side effects of Rosuvastatin?")
```
`k` → number of documents to return
`lambda_mult` → balance between relevance (closer to 1) and diversity (closer to 0)
✅ Reduces redundancy (no repeated docs)
✅ Ensures coverage of different perspectives
✅ Better use of limited context window
✅ Improves factual grounding of answers