MCP (Model Context Protocol) is an emerging concept in AI infrastructure, particularly in vector databases and retrieval-augmented generation (RAG) systems. It is a standardized protocol designed to define and manage the context sent to a language model during inference (prompting), formalizing the way external tools, such as vector databases or memory systems, interact with large language models (LLMs).
Think of it as an API contract or schema that helps systems decide (sketched in code after this list):
- What information is relevant to retrieve?
- How should that information be structured?
- What metadata and source context should be included?
- How should context be ranked, scored, or pruned before sending it to the model?
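To make that contract concrete, here is a minimal Python sketch of what such a schema could look like. Every class and field name here is an illustrative assumption, not part of any published spec:

```python
from dataclasses import dataclass, field

@dataclass
class ContextBlock:
    """One unit of retrievable knowledge plus the metadata the protocol tracks."""
    content: str                # the paragraph, code snippet, etc.
    source: str                 # where the block came from (doc ID, URL)
    timestamp: str              # ISO-8601 creation/update time
    content_type: str = "text"  # "text", "code", "image", ...
    relevance: float = 0.0      # similarity/recency score, filled in at search time
    trust: float = 1.0          # how much the pipeline trusts this source

@dataclass
class ContextRequest:
    """What a caller asks the context layer for."""
    query: str
    max_blocks: int = 5       # how many blocks to return at most
    token_budget: int = 2048  # hard cap on total context tokens
    allowed_types: list[str] = field(default_factory=lambda: ["text"])
```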
LLMs have token limits (context windows), and injecting irrelevant or redundant data can hurt on several fronts (a simple budget guard is sketched after this list):
- Decrease accuracy
- Increase costs
- Waste tokens
- Confuse the model
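To guard the budget, a context layer can greedily keep ranked blocks until the window is full. A minimal sketch, assuming OpenAI's tiktoken tokenizer (any tokenizer would do):

```python
import tiktoken  # assumption: OpenAI's open-source tokenizer

def trim_to_budget(blocks: list[str], token_budget: int) -> list[str]:
    """Greedily keep already-ranked blocks until the token budget is exhausted."""
    enc = tiktoken.get_encoding("cl100k_base")  # encoding used by GPT-4-era models
    kept, used = [], 0
    for block in blocks:
        cost = len(enc.encode(block))
        if used + cost > token_budget:
            break  # anything past this point would overflow the context window
        kept.append(block)
        used += cost
    return kept
```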
MCP aims to optimize the context pipeline by standardizing the interface between (one possible contract is sketched below):
- Vector search tools (e.g., Weaviate, Pinecone, Qdrant)
- Memory modules
- Prompt templates
- LLMs (OpenAI, Claude, Mistral, etc.)
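One way to read "standardizing the interface" is a shared retriever contract that every vector-store backend implements. A hypothetical sketch, reusing the ContextBlock type from earlier (the subclasses are stubs that would wrap the real Weaviate and Qdrant client SDKs):

```python
from abc import ABC, abstractmethod

class Retriever(ABC):
    """Common contract for any vector-search backend (hypothetical)."""

    @abstractmethod
    def search(self, query_embedding: list[float], top_k: int) -> list[ContextBlock]:
        """Return the top_k blocks most similar to the query embedding."""

class WeaviateRetriever(Retriever):
    def search(self, query_embedding: list[float], top_k: int) -> list[ContextBlock]:
        ...  # call the Weaviate client and map hits to ContextBlocks

class QdrantRetriever(Retriever):
    def search(self, query_embedding: list[float], top_k: int) -> list[ContextBlock]:
        ...  # call the Qdrant client and map hits to ContextBlocks
```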
| Component | Description |
|---|---|
| Context Block | A unit of retrievable or generated knowledge (e.g., a paragraph or code snippet) |
| Metadata | Information such as source, timestamp, type (text, image, code), and trust score |
| Scoring/Ranking | Mechanisms for relevance (semantic similarity, recency, etc.) |
| Token Budgeting | Allocating a fixed number of tokens to the most relevant blocks |
| Compression | Transforming content to fit more knowledge into fewer tokens |
| Formatting | Structuring retrieved context (markdown, JSON, citations, etc.) |
| Filtering & Deduping | Removing redundant or irrelevant content before passing it to the model |
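Composed together, a toy assembly pass over these components might look like the following. The stage order (rank, dedupe, budget) and the rough four-characters-per-token estimate are assumptions for illustration:

```python
def assemble_context(blocks: list[ContextBlock], token_budget: int) -> list[ContextBlock]:
    """Scoring/Ranking, then Filtering & Deduping, then Token Budgeting."""
    # Scoring/Ranking: sort by the relevance score filled in at search time
    ranked = sorted(blocks, key=lambda b: b.relevance, reverse=True)

    # Filtering & Deduping: drop exact-text duplicates (semantic dedup is harder)
    seen, unique = set(), []
    for b in ranked:
        if b.content not in seen:
            seen.add(b.content)
            unique.append(b)

    # Token Budgeting: a rough 4-characters-per-token estimate stands in for a tokenizer
    kept, used = [], 0
    for b in unique:
        cost = len(b.content) // 4 + 1
        if used + cost <= token_budget:
            kept.append(b)
            used += cost
    return kept
```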
[User Query] → [Embedder] → [Vector DB Search] → [Model Context Protocol (MCP)] → [Prompt Template] → [LLM]
The MCP layer (a formatter sketch follows this list):
- Prepares and filters the results from the DB
- Packages them cleanly (e.g., "Top 5 answers with source and timestamp")
- Ensures the assembled context fits the prompt template and the model's context window
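Packaging could be as simple as this hypothetical formatter, which renders the ranked blocks with numbered source-and-timestamp citations:

```python
def format_context(blocks: list[ContextBlock]) -> str:
    """Render ranked blocks as a citation-style context section for the prompt."""
    lines = ["### Retrieved context (ranked)"]
    for i, b in enumerate(blocks, start=1):
        lines.append(f"[{i}] {b.source} ({b.timestamp})")
        lines.append(b.content)
    return "\n".join(lines)
```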
Use case: Building a chatbot with enterprise docs
📝 You have hundreds of documents (HR, Legal, IT)
🧠 They are embedded and stored in a vector DB
💬 When a user asks a question (an end-to-end sketch follows this list):
1. The system runs a semantic search over the embedded documents.
2. MCP determines the top chunks to return.
3. Stale and duplicate chunks are filtered out.
4. The selected context is structured for input into the LLM.
5. The LLM responds with the most informed answer.
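Tying the steps together, a minimal end-to-end sketch. It assumes the OpenAI Python SDK, a hypothetical embed callable that turns text into a vector, and the Retriever, assemble_context, and format_context sketches from above:

```python
from openai import OpenAI  # assumption: the official OpenAI SDK; any LLM client works

client = OpenAI()

def answer(query: str, retriever: Retriever, embed) -> str:
    """Embed the query, search, run the MCP-style assembly, then ask the LLM."""
    hits = retriever.search(embed(query), top_k=20)      # semantic search
    context = assemble_context(hits, token_budget=2048)  # rank, dedupe, budget
    prompt = f"{format_context(context)}\n\nQuestion: {query}"
    resp = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```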
The benefits:
🎯 Improved response relevance
🧠 Token-efficient context usage
🔌 Easier integration with multiple tools and databases
📦 Structured, auditable context delivery
🔍 Fine-grained control over memory, search, and formatting