MMR in RAG 22 Aug, 2025

MMR = Maximal Marginal Relevance

It’s a document re-ranking technique applied after retrieval so that the final set of documents given to the LLM is:

  1. Relevant to the query, and

  2. Diverse, avoiding redundancy.


⚡ Problem Without MMR

If you just take the top-k results from a vector store (like FAISS, Pinecone, Chroma), you might get:

  • Many very similar documents (same sentences or near-duplicates).

  • Less coverage of the full context.

This wastes context window space and reduces answer quality.
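
To make the problem concrete, here is a minimal sketch of plain top-k retrieval. The 3-D "embeddings" and document names are invented for illustration (real embeddings have hundreds of dimensions), and `cosine` is a hypothetical helper, not a library function.

```python
import math
from itertools import combinations

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors, made up for this example only.
query = [1.0, 0.5, 0.5]
docs = {
    "common_effects_a": [1.0, 0.1, 0.1],
    "common_effects_b": [1.0, 0.1, 0.0],   # near-duplicate of _a
    "common_effects_c": [1.0, 0.0, 0.2],   # near-duplicate of _a
    "rare_effects": [0.3, 1.0, 0.0],
    "drug_interactions": [0.2, 0.0, 1.0],
}

# Plain top-k: rank by similarity to the query only.
ranked = sorted(docs, key=lambda name: cosine(query, docs[name]), reverse=True)
top3 = ranked[:3]
print(top3)

# Pairwise similarity inside the result set shows how redundant it is.
for a, b in combinations(top3, 2):
    print(a, b, round(cosine(docs[a], docs[b]), 3))
```

On this toy data all three slots go to the "common effects" documents, and every pair in the result is almost identical, so the rare-effects and interaction documents never reach the LLM.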


⚡ How MMR Works

MMR tries to balance Relevance vs Diversity.

Formula (simplified):

$$
\text{MMR} = \arg\max_{D_i \in C \setminus S} \Big[ \lambda \cdot \text{Sim}(D_i, Q) - (1 - \lambda) \cdot \max_{D_j \in S} \text{Sim}(D_i, D_j) \Big]
$$

  • Q = Query

  • C = Candidate documents

  • S = Already selected docs

  • Sim(·,·) = Similarity score (e.g., cosine similarity between embeddings)

  • λ (lambda) = Trade-off factor (0 → diverse focus, 1 → relevance focus)

👉 Intuition: Select documents that are relevant to the query but not too similar to each other.
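
The formula is applied greedily: pick one document, add it to S, and repeat until k documents are selected. Below is a minimal pure-Python sketch of that loop. The function name `mmr_select`, the `cosine` helper, and the toy 3-D vectors are assumptions made for illustration; this is not LangChain's implementation.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mmr_select(query_vec, doc_vecs, k, lam=0.5):
    """Greedily pick up to k document indices by the MMR score."""
    candidates = list(range(len(doc_vecs)))   # C \ S
    selected = []                             # S
    while candidates and len(selected) < k:
        best_idx, best_score = None, -math.inf
        for i in candidates:
            relevance = cosine(doc_vecs[i], query_vec)
            # Highest similarity to anything already selected;
            # 0.0 on the first pick because S is still empty.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            score = lam * relevance - (1 - lam) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        candidates.remove(best_idx)
    return selected

# Toy vectors, made up for this example only.
query = [1.0, 0.5, 0.5]
docs = [
    [1.0, 0.1, 0.1],   # 0: common side effects
    [1.0, 0.1, 0.0],   # 1: near-duplicate of 0
    [1.0, 0.0, 0.2],   # 2: near-duplicate of 0
    [0.3, 1.0, 0.0],   # 3: rare side effects
    [0.2, 0.0, 1.0],   # 4: drug interactions
]

print(mmr_select(query, docs, k=3, lam=1.0))  # relevance only
print(mmr_select(query, docs, k=3, lam=0.5))  # relevance and diversity weighted equally
```

With lam=1.0 the redundancy term vanishes and the result matches plain top-k: the three near-identical documents. With lam=0.5 the near-duplicates are penalised for resembling document 0, so documents 4 and 3 take the remaining two slots.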


🔹 Example

Query: “What are the side effects of Rosuvastatin?”

Without MMR:

  • Doc1: Side effects list (headache, fatigue, nausea)

  • Doc2: Same list again

  • Doc3: Same list repeated from another source

With MMR:

  • Doc1: Side effects list (headache, fatigue, nausea)

  • Doc2: Rare side effects (liver issues, muscle pain)

  • Doc3: Drug interaction warnings

➡️ The LLM now has broader, richer context.
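
As a worked step, suppose Doc1 (the common side effects list) is already selected and λ = 0.7. The similarity values below are hypothetical, chosen only to show the arithmetic. Candidate D_a is the repeated list, with Sim(D_a, Q) = 0.90 and Sim(D_a, Doc1) = 0.95. Candidate D_b is the rare side effects document, with Sim(D_b, Q) = 0.80 and Sim(D_b, Doc1) = 0.40.

```latex
% Hypothetical values: \lambda = 0.7, S = \{\text{Doc1}\}
\begin{aligned}
\text{score}(D_a) &= 0.7 \cdot 0.90 - 0.3 \cdot 0.95 = 0.630 - 0.285 = 0.345 \\
\text{score}(D_b) &= 0.7 \cdot 0.80 - 0.3 \cdot 0.40 = 0.560 - 0.120 = 0.440
\end{aligned}
```

D_b wins even though it is less relevant to the query, because the duplicate pays a larger penalty for overlapping with Doc1.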


🔹 In LangChain

You can use MMR by setting the search_type when calling a retriever:

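# vectorstore: an existing LangChain vector store (e.g. FAISS or Chroma) built elsewhere
retriever = vectorstore.as_retriever(
    search_type="mmr",   # Use Maximal Marginal Relevance
    search_kwargs={"k": 5, "fetch_k": 20, "lambda_mult": 0.7},
)

# invoke() replaces get_relevant_documents(), which is deprecated in recent LangChain versions
docs = retriever.invoke("What are the side effects of Rosuvastatin?")

  • k → number of documents to return

  • fetch_k → number of candidates fetched from the vector store before MMR picks k of them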

  • lambda_mult → balance between relevance (close to 1) vs diversity (close to 0)


🔹 Benefits of MMR in RAG

  • ✅ Reduces redundancy (fewer near-duplicate docs)

  • ✅ Encourages coverage of different aspects of the query

  • ✅ Better use of limited context window

  • ✅ Can improve factual grounding, since the LLM sees more distinct evidence