🧠 Big Picture First (Mental Model)
Imagine you are talking to an AI assistant 📚
You have 100 documents, but the AI can only read 10 properly.
So you need a filter before sending documents to the LLM.
There are two popular filters:
🗣️ LLM Chain Filter → “Ask an LLM to decide relevance”
📐 Embedding Filter → “Use math (vectors) to measure similarity”
LLM Chain Filter uses an LLM itself to decide whether a document is relevant or not.
Think of it as:
🧑‍🏫 “Hey GPT, read this document and tell me:
Is it useful for answering the user’s question?”
📌 It reads text, understands meaning, context, and intent.
1. User asks a question
   👉 “What are side effects of Atorvastatin?”
2. Retriever fetches many documents
   - Blogs
   - Medical PDFs
   - Reviews
3. LLM Chain Filter sends each document + question to the LLM
4. LLM answers: ✔ Relevant / ❌ Not relevant
5. Only approved documents move forward 🚀
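To make the per-document check concrete, here is a rough sketch of the idea, assuming a chat model `llm` and a list of LangChain `Document` objects in `docs` (these names are illustrative, and the real LLMChainFilter prompt differs, but the yes/no logic is the same):

```python
# Illustrative sketch only: one yes/no relevance question per document.
def is_relevant(llm, question: str, document_text: str) -> bool:
    prompt = (
        f"Question: {question}\n"
        f"Document: {document_text}\n"
        "Is this document useful for answering the question? Answer YES or NO."
    )
    answer = llm.invoke(prompt)              # one LLM call per document
    return "YES" in answer.content.upper()

question = "What are side effects of Atorvastatin?"
kept_docs = [d for d in docs if is_relevant(llm, question, d.page_content)]
```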
📄 Doc 1: “Atorvastatin side effects include muscle pain…”
📄 Doc 2: “History of Roman Empire”
📄 Doc 3: “Fenofibrate vs Statin comparison”
✅ Doc 1 → YES (directly useful)
❌ Doc 2 → NO (irrelevant)
⚠️ Doc 3 → YES (contextually helpful)
➡️ LLM understands nuance, not just keywords.
In LangChain code:

```python
from langchain.retrievers.document_compressors import LLMChainFilter

# Build the filter from any LLM / chat model
llm_filter = LLMChainFilter.from_llm(llm)

compressed_docs = llm_filter.compress_documents(
    documents=docs,
    query="Side effects of statins"
)
```
✅ Strengths
- 🌟 Very intelligent: understands context, handles synonyms, works well for complex medical, legal, and financial text
- 🧠 Human-like judgment: can reason and ignore misleading keywords

⚠️ Weaknesses
- 💰 Expensive: every document = one LLM call
- 🐢 Slow: not ideal for 1,000+ docs
- 🔁 Non-deterministic: slight variation in answers
- 📛 Overkill: simple keyword queries don’t need it

🎯 Best use cases
✔ Medical Q&A
✔ Legal documents
✔ Policy interpretation
✔ Complex RAG pipelines
✔ Small–medium datasets
Embedding Filter uses vector similarity (math) to decide relevance.
Think of it as:
📏 “How close is this document to the user query in meaning?”
No LLM reasoning — only numerical similarity.
1. Convert the query → vector
2. Convert the documents → vectors
3. Measure cosine similarity between them
4. Keep documents above a similarity threshold

📐 Higher similarity = more relevant
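Step 3, cosine similarity, is just cos(q, d) = (q · d) / (|q| |d|). A minimal NumPy sketch with made-up toy vectors (real embeddings have hundreds of dimensions, but the math is identical):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # cos(a, b) = (a · b) / (|a| * |b|)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query_vec = np.array([0.9, 0.1, 0.3])   # toy embedding of the query
doc_vec = np.array([0.8, 0.2, 0.4])     # toy embedding of a document

score = cosine_similarity(query_vec, doc_vec)  # ≈ 0.98, would pass a 0.75 threshold
```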
Example query: “Statin muscle pain”
| Document | Similarity |
|---|---|
| Atorvastatin side effects | 0.91 ✅ |
| Fenofibrate comparison | 0.78 ✅ |
| Heart anatomy basics | 0.45 ❌ |
| Roman history | 0.02 ❌ |
➡️ Only top matches survive 🔥
In LangChain code:

```python
from langchain.retrievers.document_compressors import EmbeddingsFilter

# Keep only documents whose similarity to the query clears the threshold
embeddings_filter = EmbeddingsFilter(
    embeddings=embedding_model,
    similarity_threshold=0.75
)

compressed_docs = embeddings_filter.compress_documents(
    documents=docs,
    query="Statin side effects"
)
```
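The 0.75 threshold above is a starting point, not a magic number: raise it and you keep fewer but more on-topic documents, lower it and more documents survive along with more noise. In practice you tune it against your own data, which is exactly the “threshold sensitive” weakness listed below.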
✅ Strengths
- ⚡ Very fast
- 💸 Cheap
- 📈 Scales well
- 🔁 Deterministic results
- 🧮 Perfect for large datasets

⚠️ Weaknesses
- 🤖 No reasoning: can’t understand intent deeply
- 🎯 Threshold sensitive: the wrong threshold = missed info
- 🧩 Semantic confusion: similar words ≠ useful content

🎯 Best use cases
✔ Large document collections
✔ E-commerce search
✔ Product reviews
✔ FAQ systems
✔ Fast RAG pipelines
| Feature | LLM Chain Filter 🗣️ | Embedding Filter 📐 |
|---|---|---|
| Intelligence | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| Speed | 🐢 Slow | ⚡ Fast |
| Cost | 💰 High | 💸 Low |
| Reasoning | Yes | No |
| Scalability | Medium | High |
| Best for | Complex meaning | Large datasets |
Retriever
↓
Embedding Filter (fast pruning)
↓
LLM Chain Filter (deep reasoning)
↓
Final Answer
📌 This pattern is LangChain’s Contextual Compression Retriever, exactly what you are implementing in your project 👏
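A minimal sketch of that two-stage setup in LangChain, assuming you already have an `llm`, an `embedding_model`, and a `vectorstore` (e.g. FAISS or Chroma) built elsewhere; the names, the 0.75 threshold, and k=20 are illustrative:

```python
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import (
    DocumentCompressorPipeline,
    EmbeddingsFilter,
    LLMChainFilter,
)

# Stage 1: cheap vector-similarity pruning
embeddings_filter = EmbeddingsFilter(
    embeddings=embedding_model,
    similarity_threshold=0.75,
)

# Stage 2: LLM reasoning, applied only to the survivors of stage 1
llm_filter = LLMChainFilter.from_llm(llm)

pipeline = DocumentCompressorPipeline(transformers=[embeddings_filter, llm_filter])

compression_retriever = ContextualCompressionRetriever(
    base_compressor=pipeline,
    base_retriever=vectorstore.as_retriever(search_kwargs={"k": 20}),
)

final_docs = compression_retriever.invoke("What are side effects of Atorvastatin?")
```

The order matters: the cheap Embedding Filter runs first so the expensive LLM Chain Filter only sees a handful of documents.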
Since you are working on:
🛒 Product reviews
🧠 LLM-based assistants
📊 Large scraped datasets
👉 Best choice for you:
1. Embedding Filter first (fast, cheap pruning of the large scraped set)
2. LLM Chain Filter optionally, as a second pass on the few documents that survive