Lazy retrieval = retrieval happens only when it is actually needed (at execution time), not before.
In simple words:
The retriever does NOT run when you build the chain.
It runs only when you call chain.invoke().
chain = (
    {"context": retriever | format_docs,
     "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
At this point:
❌ Retriever is NOT executed
❌ No documents are fetched
❌ No vector search happens
You're just defining a pipeline blueprint.
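The same idea can be shown without LangChain at all. Below is a minimal sketch in plain Python, with hypothetical stand-ins (`fake_retriever`, `format_docs`, `fake_llm`) for the real components; a call counter proves that building the chain does no retrieval work:

```python
calls = {"retriever": 0}

def fake_retriever(query):
    # hypothetical stand-in for a vector-store retriever
    calls["retriever"] += 1
    return [f"doc about {query}"]

def format_docs(docs):
    return "\n".join(docs)

def fake_llm(prompt):
    # hypothetical stand-in for an LLM call
    return f"Answer based on: {prompt}"

def build_chain():
    # Composing the steps just returns a function: the blueprint.
    def chain(query):
        context = format_docs(fake_retriever(query))
        return fake_llm(f"Context: {context}\nQuestion: {query}")
    return chain

chain = build_chain()
assert calls["retriever"] == 0   # defining the chain fetched nothing

answer = chain("lazy retrieval")
assert calls["retriever"] == 1   # retrieval ran only at invoke time
```

The key design point: composition produces a new callable, and the callable's body only runs when someone calls it.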
response = chain.invoke(query)
Now LangChain:
Takes the query
Sends it to the retriever
Retrieves the documents
Formats them
Passes them to the prompt
Calls the LLM
That is lazy execution.
If you do this:
retrieved_docs = retriever.invoke(query)
Now retrieval runs immediately.
Even if:
You never call the LLM
You never use the chain
You discard the result
That is called eager retrieval.
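A quick sketch of the eager case, again with a hypothetical retriever and a call counter so the immediate execution is visible:

```python
search_count = 0

def retriever_invoke(query):
    # a real vector search would happen here
    global search_count
    search_count += 1
    return [f"doc about {query}"]

retrieved_docs = retriever_invoke("some query")   # executes right now
assert search_count == 1   # work already done: LLM never called, chain unused
```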
Think of a food-delivery app (lazy):
You open the app → nothing cooks yet
You place an order → food starts cooking
Execution happens only when required.
Now think of a buffet (eager):
You cook everything in advance
Even if nobody orders
Wasteful if unused.
Lazy execution means:
Functions are deferred
Computation happens on demand
Improves efficiency
Enables composability
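Python has this idea built in: a generator pipeline is deferred, computes values only on demand, and composes freely. A small sketch, tracking which values have actually been computed:

```python
produced = []

def numbers():
    # Records which values have actually been computed.
    for n in range(6):
        produced.append(n)
        yield n

# Building the pipeline is deferred: nothing has been produced yet.
pipeline = (n * n for n in numbers() if n % 2 == 0)
assert produced == []

# Computation happens on demand, one value at a time.
first = next(pipeline)
assert first == 0 and produced == [0]
```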
LangChain LCEL works lazily by default.
Laziness is what makes the following possible:
Dynamic queries
Streaming responses
Parallel retrievers
Conditional chains
Tool routing
Multi-step agents
If retrieval were eager, you would lose this flexibility.
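Conditional chains illustrate the point concretely. In this hypothetical router, the retrieval step runs only on the branch that needs documents; an eager retriever would have searched for every query:

```python
retrievals = 0

def retrieve(query):
    global retrievals
    retrievals += 1
    return [f"doc about {query}"]

def chain(query):
    if query.startswith("smalltalk:"):
        return "No documents needed."     # retrieval skipped entirely
    docs = retrieve(query)                # runs only on this branch
    return f"Answer grounded in {len(docs)} document(s)"

chain("smalltalk:hi")
assert retrievals == 0    # no vector search for the chit-chat route
chain("what is lazy retrieval?")
assert retrievals == 1
```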
Lazy retrieval = Retrieval runs only when the chain is executed, not when it is defined.
Since you're preparing seriously for AI Engineer roles, this concept is important because:
Lazy execution is used in:
LangChain
Spark
Dask
TensorFlow graphs
SQL query planners
It’s a core distributed systems idea.
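The shared idea across those systems can be sketched as a toy deferred-computation graph, in the spirit of a Spark DAG or a Dask task graph (names and structure here are illustrative, not any library's API): building the graph records operations, and nothing runs until `.compute()` is called.

```python
class Lazy:
    """A node in a deferred-computation graph."""

    def __init__(self, fn, *deps):
        self.fn = fn
        self.deps = deps

    def compute(self):
        # Recursively evaluate dependencies, then apply this node's function.
        args = [d.compute() if isinstance(d, Lazy) else d for d in self.deps]
        return self.fn(*args)

data = Lazy(lambda: [1, 2, 3, 4])
doubled = Lazy(lambda xs: [x * 2 for x in xs], data)
total = Lazy(sum, doubled)

# So far only a plan exists, like a SQL query plan before execution.
result = total.compute()   # the whole pipeline executes here
assert result == 20
```

This "plan first, execute later" shape is exactly what lets such engines optimize, parallelize, or skip work before anything runs.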