HyDE | Window Search | Self-Query Retriever | RAG Fusion | Contextual Compression Retrieval
03 Jan, 2026

1️⃣ HyDE (Hypothetical Document Embeddings)

🔹 What is HyDE?

HyDE is a retrieval technique where the system first imagines a perfect answer to the query and then uses that imagined answer to retrieve real documents.

Instead of embedding the question, you embed a hypothetical answer.


🔹 Why HyDE Exists

User questions are often:

  • Short

  • Vague

  • Poorly aligned with document language

Documents, however, are:

  • Long

  • Structured

  • Answer-oriented

HyDE bridges this mismatch.


🔹 How HyDE Works (Intuitively)

  1. User asks a question

  2. LLM generates a hypothetical ideal answer

  3. That answer is embedded

  4. Vector search is performed using this embedding

  5. Real documents matching that “ideal answer” are retrieved
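
A minimal sketch of this flow in Python. The `llm`, `embed`, and `vector_search` helpers below are hypothetical stand-ins, not a specific library:

```python
# HyDE sketch. All three helpers are stand-ins: wire them to your
# own LLM client, embedding model, and vector store.

def llm(prompt: str) -> str:
    return ""  # stand-in: replace with a real LLM call

def embed(text: str) -> list[float]:
    return []  # stand-in: replace with a real embedding call

def vector_search(vector: list[float], k: int) -> list[str]:
    return []  # stand-in: replace with a real nearest-neighbour lookup

def hyde_retrieve(question: str, k: int = 5) -> list[str]:
    # Steps 1-2: generate a hypothetical ideal answer
    hypothetical = llm(f"Write a short passage answering: {question}")
    # Step 3: embed the hypothetical answer, not the question
    query_vector = embed(hypothetical)
    # Steps 4-5: retrieve real documents near the imagined answer
    return vector_search(query_vector, k)
```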


🔹 Real-World Analogy

You don’t search a library using a question like:

“How does kidney failure progress?”

You search using a paragraph describing kidney failure.

HyDE auto-writes that paragraph for you.


🔹 Strengths

  • Works extremely well for vague queries

  • Improves recall dramatically

  • Especially useful in scientific / medical / technical RAG


🔹 Weaknesses

  • Depends on LLM quality

  • Can introduce bias

  • Extra generation cost


🔹 Best Use Case

  • Research RAG

  • Medical / legal QA

  • Exploratory queries


2️⃣ Window Search (Sliding Window Retrieval)

🔹 What is Window Search?

Window search retrieves neighboring chunks around a matched chunk to preserve context.

Instead of retrieving isolated chunks, you retrieve a window of context.


🔹 Why Window Search Exists

Chunking breaks:

  • Narrative flow

  • Logical continuity

  • Cause-effect relationships

Window search fixes this.


🔹 How It Works

  1. Retrieve a relevant chunk

  2. Also fetch:

    • Previous chunk

    • Next chunk

  3. Combine them into one context window
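
A minimal sketch, assuming chunks are stored in document order so neighbours can be looked up by index (`chunks` and `search` are illustrative stand-ins):

```python
# Window search sketch. chunks holds the document split in order;
# search is a stand-in returning indices of the top-k matches.

chunks: list[str] = []  # stand-in: your ordered document chunks

def search(query: str, k: int) -> list[int]:
    return []  # stand-in: vector search returning chunk indices

def window_retrieve(query: str, k: int = 3, window: int = 1) -> list[str]:
    contexts = []
    for i in search(query, k):
        lo = max(0, i - window)                   # previous chunk(s)
        hi = min(len(chunks), i + window + 1)     # next chunk(s)
        contexts.append(" ".join(chunks[lo:hi]))  # one stitched window
    return contexts
```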


🔹 Intuition

Reading only one paragraph from a book rarely gives full meaning.
Window search gives you the paragraph before and after.


🔹 Strengths

  • Preserves context

  • Improves answer coherence

  • Simple and effective


🔹 Weaknesses

  • Adds tokens

  • May include irrelevant text


🔹 Best Use Case

  • PDFs

  • Policies

  • Books

  • Manuals


3️⃣ Self-Query Retriever

🔹 What is Self-Query Retrieval?

A self-query retriever allows the LLM to:

  • Understand the user’s intent

  • Extract structured filters

  • Generate the retrieval query itself

The LLM becomes the query planner, not just the answer generator.


🔹 Why It Exists

Users mix:

  • Natural language

  • Implicit constraints

  • Metadata requirements

Example:

“Show me beginner-level Python tutorials after 2022”

This includes:

  • Topic

  • Difficulty

  • Time filter
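
A self-query step might turn that example into a structured plan like this (the field names and filter syntax are illustrative, not a fixed schema):

```python
plan = {
    "semantic_query": "Python tutorials",  # topic
    "filters": {
        "difficulty": "beginner",          # difficulty
        "year": {"gt": 2022},              # time filter: after 2022
    },
}
```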


🔹 How It Works

  1. LLM parses user query

  2. Extracts:

    • Semantic query

    • Metadata filters

  3. Executes filtered retrieval
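
A minimal sketch of the full loop, assuming a hypothetical `llm_json` helper that returns parsed JSON and a metadata-aware `filtered_search`:

```python
# Self-query sketch. Both helpers are stand-ins for your LLM client
# and a vector store that supports metadata filtering.

def llm_json(prompt: str) -> dict:
    return {"semantic_query": "", "filters": {}}  # stand-in: LLM call + JSON parse

def filtered_search(query: str, filters: dict, k: int) -> list[str]:
    return []  # stand-in: vector search restricted by metadata filters

def self_query_retrieve(user_query: str, k: int = 5) -> list[str]:
    # Steps 1-2: let the LLM split the query into semantics + filters
    plan = llm_json(
        "Return JSON with 'semantic_query' and 'filters' "
        f"(topic, difficulty, year, ...) for: {user_query}"
    )
    # Step 3: run retrieval with the filters applied
    return filtered_search(plan["semantic_query"], plan["filters"], k)
```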


🔹 Intuition

It’s like an intelligent librarian who understands what you really want and applies filters automatically.


🔹 Strengths

  • Very powerful for enterprise RAG

  • Handles structured + unstructured data

  • Reduces irrelevant results


🔹 Weaknesses

  • Depends on clean metadata

  • Needs schema alignment


🔹 Best Use Case

  • Product catalogs

  • Course platforms

  • Enterprise document search


4️⃣ Contextual Compression Retrieval

(Often miswritten as “Contractual”; the correct term is Contextual)

🔹 What is Contextual Compression?

Contextual compression shrinks retrieved documents to keep only the parts relevant to the query.

Retrieval stays the same — context gets compressed.


🔹 Why It Exists

Even good retrieval often returns:

  • Long documents

  • Partially relevant sections

LLMs have:

  • Token limits

  • Cost constraints


🔹 How It Works

  1. Retrieve documents

  2. Use:

    • LLM

    • Extractor

    • Reranker

  3. Remove irrelevant sections

  4. Pass only high-signal text to LLM
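
A minimal sketch using the LLM itself as the extractor; a reranker or embedding-similarity filter would slot into the same place (`retrieve` and `llm` are stand-ins):

```python
# Contextual compression sketch. retrieve is your existing retriever,
# unchanged; llm is a stand-in extractor that keeps relevant sentences.

def retrieve(query: str, k: int) -> list[str]:
    return []  # stand-in: whatever retrieval you already run

def llm(prompt: str) -> str:
    return ""  # stand-in: replace with a real LLM call

def compressed_retrieve(query: str, k: int = 5) -> list[str]:
    compressed = []
    for doc in retrieve(query, k):  # step 1: retrieve as usual
        # steps 2-3: keep only the sentences relevant to the query
        kept = llm(
            f"Query: {query}\nDocument: {doc}\n"
            "Copy, verbatim, only the sentences relevant to the query."
        )
        if kept.strip():
            compressed.append(kept)
    return compressed  # step 4: only high-signal text reaches the LLM
```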


🔹 Intuition

Instead of handing someone a full book, you highlight only the important sentences.


🔹 Strengths

  • Saves tokens

  • Reduces hallucination

  • Improves precision


🔹 Weaknesses

  • Additional compute

  • Over-compression risk


🔹 Best Use Case

  • Long documents

  • Token-sensitive RAG

  • High-cost LLMs


5️⃣ RAG Fusion (Multi-Query Fusion)

🔹 What is RAG Fusion?

RAG Fusion improves retrieval by:

  • Generating multiple query variants

  • Retrieving for each

  • Merging and deduplicating results

One question → many perspectives → better recall


🔹 Why RAG Fusion Exists

Single queries suffer from:

  • Wording bias

  • Vocabulary mismatch

  • Narrow perspective


🔹 How It Works

  1. LLM rewrites query into multiple forms

  2. Each query retrieves documents

  3. Results are merged

  4. Reranked or deduplicated
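
A minimal sketch. The merge step here uses reciprocal rank fusion (RRF), a common choice for this step, though any dedup/rerank scheme works; `llm_variants` and `search` are stand-ins:

```python
# RAG Fusion sketch with reciprocal rank fusion (RRF) as the merger.
from collections import defaultdict

def llm_variants(query: str, n: int) -> list[str]:
    return []  # stand-in: ask your LLM for n rewrites of the query

def search(query: str, k: int) -> list[str]:
    return []  # stand-in: retriever returning docs in rank order

def rag_fusion(query: str, n: int = 4, k: int = 10) -> list[str]:
    scores: dict[str, float] = defaultdict(float)
    for variant in [query] + llm_variants(query, n):     # steps 1-2
        for rank, doc in enumerate(search(variant, k)):
            # steps 3-4: a document found by many variants, at high
            # ranks, accumulates the most score (60 is the usual constant)
            scores[doc] += 1.0 / (60 + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```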


🔹 Intuition

You ask:

“How to manage diabetes?”

But also search:

  • “Diabetes treatment guidelines”

  • “Blood sugar control methods”

  • “Lifestyle changes for diabetes”

More doors → more relevant knowledge.


🔹 Strengths

  • Massive recall improvement

  • Reduces retrieval blind spots

  • Excellent for research RAG


🔹 Weaknesses

  • Higher cost

  • More latency

  • Needs reranking


🔹 Best Use Case

  • Knowledge-heavy systems

  • Scientific QA

  • Open-domain RAG


🧠 How These Fit Together (Big Picture)

Technique | Solves
HyDE | Poor query embeddings
Window Search | Lost context
Self-Query | Hidden constraints
Contextual Compression | Token overload
RAG Fusion | Narrow recall

They are not competitors — they are complementary tools.


🎯 Final Intuition Summary

  • HyDE → “Imagine the answer first”

  • Window Search → “Read around the paragraph”

  • Self-Query → “Let the system understand filters”

  • Contextual Compression → “Keep only what matters”

  • RAG Fusion → “Search from multiple angles”


🏁 One-Line Takeaway

Advanced RAG retrieval is not about finding more data — it’s about finding the right data, in the right shape, with the right context.