LangGraph is a library from LangChain that helps you build stateful, multi-step applications using graph-based workflows.
Think of it like building a flowchart of how your AI app should behave — with clear paths for different logic.
LangGraph = LangChain + Graph + Memory + Control flow.
Here's a typical RAG pipeline using LangGraph:
Each node is a step in the RAG process.
Input Node – User question comes in.
Retriever Node – Use a retriever (e.g., FAISS, Chroma) to fetch relevant chunks from your knowledge base (a setup sketch follows this list).
LLM Generation Node – Send the retrieved documents + question to the LLM to generate the final answer.
Output Node – Return the answer to the user.
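For the retriever step, any vector store will do; here is a minimal sketch using FAISS and OpenAI embeddings purely as stand-ins for whatever store and embedding model you actually use (the sample texts are made up, and faiss-cpu must be installed):

from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

# Toy knowledge base: a few plain-text chunks
texts = [
    "RAG stands for Retrieval-Augmented Generation.",
    "LangGraph builds stateful, graph-based LLM workflows.",
]

# Embed the chunks, index them, and expose the index as a retriever
vectorstore = FAISS.from_texts(texts, OpenAIEmbeddings())
retriever = vectorstore.as_retriever()

The vectorstore built here is the one the full example later in this post assumes already exists.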
Optional nodes can include:
Condense question (for follow-ups; sketched after this list)
Re-ranking of documents
Memory for previous chats
Validation of LLM outputs
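As an illustration, a condense-question node is just another function that rewrites a follow-up into a standalone question before retrieval. A rough sketch (the "history" key and the prompt wording are my own assumptions, not part of any LangGraph API):

from langchain_openai import ChatOpenAI

def condense_question(state):
    # Rewrite a follow-up question into a standalone question using prior chat history
    llm = ChatOpenAI()
    prompt = (
        f"Chat history:\n{state.get('history', '')}\n\n"
        f"Follow-up question: {state['question']}\n\n"
        "Rewrite the follow-up as a standalone question."
    )
    return {"question": llm.invoke(prompt).content}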
You can define custom logic for how your RAG system behaves.
You can add retry mechanisms, loops, or condition-based routing (e.g., “If retrieval fails, ask a clarifying question”) – see the routing sketch after the example below.
You can manage conversation memory, so it feels like a chatbot that remembers context – see the checkpointer sketch at the end.
[User Input]
↓
[Condense Question (optional)]
↓
[Retrieve Documents]
↓
[Generate Answer using LLM]
↓
[Return Response to User]
You define each of these steps as a node (a plain Python function that reads and updates the shared state), register the nodes on a StateGraph with add_node, and connect them using edges.
from typing import TypedDict, List

from langchain_core.documents import Document
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, END

# The shared state that flows between nodes
class RAGState(TypedDict):
    question: str
    docs: List[Document]
    answer: str

retriever = vectorstore.as_retriever()  # vectorstore built earlier (FAISS, Chroma, etc.)

def retrieve_docs(state: RAGState):
    # Fetch chunks relevant to the user's question
    return {"docs": retriever.invoke(state["question"])}

def generate_answer(state: RAGState):
    llm = ChatOpenAI()
    docs = state["docs"]
    question = state["question"]
    # Combine docs and question
    prompt = f"Answer based on docs: {docs}\n\nQuestion: {question}"
    answer = llm.invoke(prompt)
    return {"answer": answer.content}

# Define graph
graph = StateGraph(RAGState)
graph.add_node("retrieve", retrieve_docs)
graph.add_node("generate", generate_answer)
graph.set_entry_point("retrieve")
graph.add_edge("retrieve", "generate")
graph.add_edge("generate", END)

# Run graph
graph_executor = graph.compile()
result = graph_executor.invoke({"question": "What is RAG?"})
print(result["answer"])
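Building on the graph above, the “if retrieval fails, ask a clarifying question” routing mentioned earlier could look roughly like this. It is only a sketch: the ask_clarification node and its wording are hypothetical, and the conditional edge takes the place of the fixed retrieve → generate edge.

def ask_clarification(state):
    # Hypothetical fallback: ask the user to rephrase instead of answering
    return {"answer": "I couldn't find anything relevant. Could you rephrase your question?"}

def route_after_retrieval(state):
    # Branch on whether retrieval returned any documents
    return "generate" if state["docs"] else "ask_clarification"

graph.add_node("ask_clarification", ask_clarification)
# Replaces graph.add_edge("retrieve", "generate") from the example above
graph.add_conditional_edges("retrieve", route_after_retrieval)
graph.add_edge("ask_clarification", END)

After changing the wiring, call graph.compile() again to get a fresh executor.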
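And for the conversation memory mentioned earlier, the usual approach is to compile the graph with a checkpointer and give each conversation its own thread_id. A minimal sketch using LangGraph's in-memory MemorySaver:

from langgraph.checkpoint.memory import MemorySaver

# Persist graph state between invocations; each conversation gets its own thread_id
memory_executor = graph.compile(checkpointer=MemorySaver())
config = {"configurable": {"thread_id": "user-123"}}
result = memory_executor.invoke({"question": "What is RAG?"}, config=config)

A real chatbot would also keep a list of past messages in the state so nodes like condense_question can use them.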
Modular and reusable.
Great for production apps needing control flow and memory.
Makes debugging and testing easier.
Scales well with complex logic like agentic behavior.