Hybrid search combines lexical (keyword-based) search and semantic (vector-based) search to retrieve documents.
It answers:
“What documents match the words and the meaning of the query?”
1. Keyword search (BM25 / TF-IDF) retrieves exact term matches.
2. Vector search retrieves semantically similar content.
3. The two result lists' scores are combined (e.g., weighted sum or reciprocal rank fusion).
4. The top-N documents are returned.
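The merge step can be sketched with Reciprocal Rank Fusion, one common way to combine the two ranked lists. The doc IDs and hit lists below are hypothetical; in practice they would come from a BM25 index and a vector store:

```python
# Sketch: merging keyword and vector results with Reciprocal Rank Fusion (RRF).

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Combine several ranked lists of doc IDs into one fused ranking."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking):
            # Each list contributes 1 / (k + rank); k dampens top-rank dominance.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc3", "doc1", "doc7"]   # e.g. BM25 results
vector_hits  = ["doc1", "doc5", "doc3"]   # e.g. cosine-similarity results
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

Documents that appear high in both lists ("doc1" here) float to the top, which is why rank fusion needs no score normalization across the two retrievers.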
- Keyword search → precise but brittle
- Vector search → semantic but approximate
Hybrid search fixes:

- Synonym mismatch
- Domain-specific terminology
- Rare or technical keywords
Strengths:

- Best recall across diverse corpora
- Handles exact terms + meaning
- Industry standard for enterprise RAG
Limitations:

- Does not handle redundancy
- Ranking quality is still approximate
- Top results may overlap heavily
Hybrid search is primarily a candidate generator.
Its job is recall, not precision.
MMR (Maximal Marginal Relevance) is a diversity-aware selection algorithm applied after retrieval.
It answers:
“Which documents are relevant without repeating the same idea?”
MMR selects documents iteratively by:

- Maximizing relevance to the query
- Minimizing similarity to already selected documents
It balances relevance against diversity using a tunable parameter λ: values near 1 favor relevance, values near 0 favor diversity.
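The selection loop can be sketched in a few lines of Python. The cosine helper and the toy 2-D vectors are illustrative, not a library API; λ is set to 0.4 so the diversity term visibly kicks in:

```python
# Sketch of the MMR selection loop over plain embedding vectors.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def mmr(query_vec, doc_vecs, lam=0.7, top_k=2):
    """Pick top_k indices: lam weights relevance, (1 - lam) penalizes redundancy."""
    selected, candidates = [], list(range(len(doc_vecs)))
    while candidates and len(selected) < top_k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Similarity to the closest already-selected document.
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                             default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

query = [1.0, 0.0]
docs = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]]  # two near-duplicates + one distinct
picked = mmr(query, docs, lam=0.4, top_k=2)
```

Pure similarity would return the two near-duplicates; MMR instead picks the best match plus the distinct document, trading a little relevance for coverage.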
Pure similarity retrieval often returns:

- Near-duplicate chunks
- Rephrased versions of the same paragraph
MMR fixes:

- Redundant context
- Narrow perspective
Strengths:

- Reduces repetition
- Improves coverage
- Very effective for RAG context building
Limitations:

- Slight computational overhead
- Operates on embedding similarity, so it does not deeply model semantics
- Diversity may reduce focus if overused
MMR is a context diversification step.
Its job is breadth, not precision ranking.
Reranking is a precision optimization step that reorders retrieved documents using a stronger relevance model.
It answers:
“Which documents are actually the best answers to this query?”
1. Take the top-N retrieved documents.
2. Score each (query, document) pair with a stronger relevance model: a cross-encoder, a fine-tuned transformer, or an LLM.
3. Reorder by score and select the top-K.
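A minimal sketch of this rerank-and-truncate step. `overlap_score` is a deliberately crude stand-in so the example runs without a model; a real system would score each pair with a cross-encoder (e.g., sentence-transformers' `CrossEncoder.predict` on `(query, doc)` pairs):

```python
# Sketch: score every candidate against the query, reorder, keep the top-K.

def overlap_score(query: str, doc: str) -> float:
    """Stand-in relevance score: fraction of query terms present in the doc."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0

def rerank(query: str, candidates: list[str], top_k: int = 2) -> list[str]:
    """Reorder candidates by relevance to the query and truncate to top_k."""
    scored = sorted(candidates, key=lambda d: overlap_score(query, d), reverse=True)
    return scored[:top_k]

query = "reset a forgotten password"
candidates = [
    "billing and invoices overview",
    "how to reset your password if it is forgotten",
    "password policy for new accounts",
]
best = rerank(query, candidates)
```

Because scoring is quadratic in effort (every candidate is re-read against the query), this stage is applied only to the small set that survives retrieval.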
Embedding similarity:

- Is geometric
- Misses nuance
- Cannot model deep intent
Reranking fixes:

- Subtle intent mismatch
- Contextual errors
- Weak top-K quality
Strengths:

- Highest precision
- Strong semantic understanding
- Major reduction in hallucination
Limitations:

- Slower
- Costly
- Practical only on small candidate sets
Reranking is a quality gate before the LLM.
Its job is precision, not recall.
| Aspect | Hybrid Search | MMR | Reranking |
|---|---|---|---|
| Primary goal | Recall | Diversity | Precision |
| Applied when | During retrieval | After retrieval | After retrieval |
| Uses | Keywords + vectors | Similarity + diversity | Deep models |
| Handles redundancy | ❌ | ✅ | ❌ |
| Handles intent | ⚠️ Partial | ❌ | ✅ |
| Computational cost | Low | Medium | High |
In production-grade RAG, these are not alternatives — they are layers.
1. Hybrid search → broad candidate recall
2. MMR → remove redundancy & expand coverage
3. Reranking → select best final context
4. LLM generation
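Wired together, the layers form a single pipeline. The sketch below uses trivial stand-in functions (all names are illustrative, not a real library) purely to show the data flow from recall to breadth to precision:

```python
# Sketch of the layered retrieval pipeline with stand-in stage functions.

def hybrid_search(query, corpus, top_n):
    """Stand-in recall stage: keep docs sharing any query term."""
    terms = set(query.lower().split())
    return [d for d in corpus if terms & set(d.lower().split())][:top_n]

def diversify(docs, top_k):
    """Stand-in for MMR: drop exact duplicates, preserve order."""
    return list(dict.fromkeys(docs))[:top_k]

def rerank(query, docs, top_k):
    """Stand-in precision stage: sort by query-term overlap."""
    terms = set(query.lower().split())
    return sorted(docs, key=lambda d: len(terms & set(d.lower().split())),
                  reverse=True)[:top_k]

def build_context(query, corpus):
    candidates = hybrid_search(query, corpus, top_n=20)  # recall
    diverse = diversify(candidates, top_k=10)            # breadth
    return rerank(query, diverse, top_k=3)               # precision

corpus = [
    "rotate api keys safely",
    "rotate api keys safely",      # duplicate chunk
    "api rate limits explained",
    "team billing faq",
]
context = build_context("rotate api keys", corpus)
```

Note the narrowing funnel (top_n > diverse top_k > final top_k): each stage hands a smaller, cleaner set to the next, so the expensive reranker only ever sees a handful of documents.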
Use hybrid search when:

- The corpus is large and diverse
- Exact terms matter
- You need strong recall

Use MMR when:

- Retrieved chunks are repetitive
- You want multi-angle answers
- The context window is limited

Use reranking when:

- Answer quality matters more than latency
- Queries are complex
- Hallucination risk must be minimized
- Hybrid search finds candidates
- MMR removes repetition
- Reranking chooses the best answers
Hybrid search maximizes recall, MMR maximizes diversity, and reranking maximizes precision.
Together, they form the retrieval intelligence layer of a high-quality RAG system.