Before touching env/config:
🎯 Use-case clarity
Informational Q&A?
Fact extraction?
Comparative reasoning?
Multi-step decision support?
📏 Accuracy vs creativity
Strict grounding?
Allow synthesis?
⚠️ Legal & ethical
Scraping permissions
robots.txt compliance
PII handling
➡️ Output:
Clear scope of what the chatbot must and must not answer.
Define what talks to what.
Core layers (non-negotiable in industry):
Data Ingestion Layer
Knowledge Store Layer
Intelligence Layer
Orchestration / Agent Layer
Interface Layer (chat)
➡️ Decide early:
Stateless vs stateful chatbot
Single-domain vs multi-domain scraping
Human-in-the-loop or fully autonomous
Every other component depends on these choices.
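The core layers above can be sketched as explicit Python interfaces (four of the five — the chat interface is UI-dependent). This is a sketch only; the class names and method signatures are illustrative assumptions, not a prescribed API:

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class IngestionLayer(Protocol):
    """Data Ingestion: fetch raw content from a source."""
    def fetch(self, source: str) -> str: ...

@runtime_checkable
class KnowledgeStore(Protocol):
    """Knowledge Store: persist chunks and search them."""
    def add(self, doc_id: str, text: str, metadata: dict) -> None: ...
    def search(self, query: str, k: int) -> list[dict]: ...

@runtime_checkable
class Intelligence(Protocol):
    """Intelligence: embedding + generation capabilities."""
    def embed(self, text: str) -> list[float]: ...
    def generate(self, prompt: str) -> str: ...

@runtime_checkable
class Agent(Protocol):
    """Orchestration / Agent: drives the other layers to answer."""
    def answer(self, question: str) -> str: ...
```

Any concrete implementation (Chroma, FAISS, a local LLM) can then be swapped behind these boundaries without touching the rest of the system.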
Industry practices:
Separate:
Runtime variables
Secrets
Feature flags
No hardcoding anywhere
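A minimal sketch of that three-way separation — secrets from the environment only, runtime variables in one typed object, feature flags as toggles (all names and defaults here are illustrative):

```python
import os
from dataclasses import dataclass

# Secrets: read ONLY from the environment, never hardcoded or committed.
# "LLM_API_KEY" is an illustrative variable name.
def get_secret(name: str) -> str:
    value = os.environ.get(name)
    if value is None:
        raise RuntimeError(f"missing secret: {name}")
    return value

# Runtime variables: plain, non-sensitive settings.
@dataclass(frozen=True)
class Runtime:
    vector_db_path: str = "data/vectors"  # illustrative default
    log_level: str = "INFO"

# Feature flags: change behavior without redeploying code.
FLAGS = {"agentic_mode": False, "reranking": True}
```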
Typical responsibility:
Model paths
Embedding model
Vector DB config
Chunk size & overlap
Agent permissions
➡️ Think of this as your “control room”
This is not just a config file.
It defines:
Scraping rules (depth, frequency, filters)
ETL policies
Embedding strategy
Retrieval strategy
Agent behavior
Industry trick:
Config should be editable without touching core logic.
➡️ If a config change breaks the system → the architecture is wrong.
In production, teams don’t “load models”, they register capabilities.
This layer defines:
LLM role (generator / planner / extractor)
Embedding model role
Tool access (search, summarize, re-rank)
Think:
“Which intelligence does this system have?”
➡️ Later, Agentic RAG depends heavily on this separation.
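"Registering capabilities" can be as small as a decorator that records each model or tool under an explicit role. A minimal sketch (registry shape and names are assumptions; the bodies are stubs standing in for real model calls):

```python
CAPABILITIES: dict[str, dict] = {}

def register(name: str, role: str):
    """Register a model or tool under an explicit role."""
    def decorator(fn):
        CAPABILITIES[name] = {"role": role, "fn": fn}
        return fn
    return decorator

@register("summarize", role="tool")
def summarize(text: str) -> str:
    return text[:100]  # placeholder implementation

@register("generator", role="llm")
def generate(prompt: str) -> str:
    return "stub answer"  # placeholder for a real LLM call
```

The agent layer can then ask "which intelligence does this system have?" by inspecting `CAPABILITIES` instead of hardcoding model calls.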
Scraping is NOT one step.
Sub-phases:
Discovery (what to scrape)
Fetching (HTML, PDFs, APIs)
Validation (is content usable?)
Versioning (content changes)
Key question:
“If the site changes tomorrow, will my pipeline break silently?”
➡️ Most failures happen here in real systems.
This is where 90% of RAG quality is decided.
Content cleaning
De-duplication
Semantic chunking
Metadata enrichment
Metadata is not optional:
Source
Timestamp
Domain
Content type
Authority level
➡️ Agentic RAG heavily uses metadata for reasoning.
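A sketch of a chunk record carrying the mandatory metadata fields listed above, plus hash-based de-duplication (field names and the normalization scheme are illustrative assumptions):

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class Chunk:
    text: str
    source: str        # URL or file path
    timestamp: str     # when the content was fetched
    domain: str
    content_type: str  # e.g. "html", "pdf"
    authority: int     # trust level, usable as a retrieval filter

def dedupe(chunks: list[Chunk]) -> list[Chunk]:
    """De-duplication by hash of whitespace-normalized, lowercased text."""
    seen, out = set(), []
    for c in chunks:
        key = hashlib.md5(" ".join(c.text.split()).lower().encode()).hexdigest()
        if key not in seen:
            seen.add(key)
            out.append(c)
    return out
```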
Define embedding policy:
Chunk size rationale
Overlap logic
Semantic vs structural chunking
Re-embedding strategy when content updates
Industry insight:
Poor chunking cannot be fixed by a better model.
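The chunk size and overlap rationale can be made concrete with a fixed-size sliding window — the simplest baseline, shown here as a sketch; real systems often chunk on semantic or structural boundaries instead:

```python
def chunk_text(text: str, size: int = 512, overlap: int = 64) -> list[str]:
    """Fixed-size sliding window. Overlap preserves context across
    chunk boundaries so retrieval does not cut sentences in half."""
    assert 0 <= overlap < size, "overlap must be smaller than chunk size"
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]
```

Whatever the strategy, it should live in config (size, overlap, mode) so that re-chunking after a content update does not require code changes.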
This is where Vanilla RAG ends and intelligence begins.
Define:
Retrieval type (similarity / hybrid)
Filtering via metadata
Re-ranking policy
Confidence thresholds
Ask:
“What happens when retrieval finds nothing relevant?”
➡️ Industrial systems plan for retrieval failure.
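Planning for retrieval failure can be as simple as a confidence threshold plus an explicit "nothing relevant" status, so the caller never mistakes weak hits for grounding. A sketch (the score field, threshold, and status strings are assumptions):

```python
def retrieve(hits: list[dict], min_score: float = 0.35, k: int = 5) -> dict:
    """Filter raw search hits; report explicitly when nothing clears
    the confidence threshold instead of passing weak context onward."""
    kept = sorted(hits, key=lambda h: h["score"], reverse=True)
    kept = [h for h in kept if h["score"] >= min_score][:k]
    if not kept:
        return {"status": "no_relevant_context", "hits": []}
    return {"status": "ok", "hits": kept}
```

Downstream code branches on `status`, which is what lets the system answer "I don't know" instead of hallucinating.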
You must decide:
Answer generation vs information extraction
Citation required or not
Hallucination tolerance (usually zero)
Industry pattern:
LLM is the LAST step
Everything before it reduces uncertainty
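That "LLM last" pattern shows up in how the final prompt is assembled: only vetted context goes in, and citations are enforced in the instructions. A sketch (the prompt wording and hit fields are illustrative assumptions):

```python
def build_prompt(question: str, hits: list[dict]) -> str:
    """The LLM only ever sees vetted, numbered context; everything
    upstream has already reduced the uncertainty it must handle."""
    context = "\n".join(
        f"[{i}] {h['text']} (source: {h['source']})"
        for i, h in enumerate(hits, start=1)
    )
    return (
        "Answer using ONLY the context below. Cite passages as [n]. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```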
This is added after Vanilla RAG is stable.
Agent responsibilities:
Decide when to retrieve
Decide which tool to use
Perform multi-step reasoning
Self-verify answers
Key design question:
“Is the agent allowed to scrape again?”
Industry rule:
Agents must be permission-bounded
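Permission-bounding can be enforced mechanically: the agent holds a grant list, and any tool outside it raises rather than executes. A minimal sketch (tool names and the lambda stubs are illustrative):

```python
class BoundedAgent:
    """An agent that can only call tools it was explicitly granted."""
    def __init__(self, tools: dict, granted: set[str]):
        self.tools = tools
        self.granted = granted

    def call(self, name: str, *args):
        if name not in self.granted:
            raise PermissionError(f"agent is not permitted to use '{name}'")
        return self.tools[name](*args)

# Example: retrieval is allowed, re-scraping is not.
tools = {"search": lambda q: f"results for {q}", "scrape": lambda url: "html"}
agent = BoundedAgent(tools, granted={"search"})
```

Answering "is the agent allowed to scrape again?" then becomes a one-line config decision, not a behavior buried in prompts.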
Even local systems need:
Query logs
Retrieval accuracy tracking
Hallucination detection
Drift detection
Ask:
“How will I know this system got worse?”
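One concrete answer to that question: log every query with its retrieval outcome, and watch a simple metric like hit rate over time. A sketch (the record fields are assumptions; real systems would persist these records):

```python
import time

def log_query(log: list, query: str, n_hits: int, answered: bool) -> None:
    """Append a structured record for every query — even local systems need this."""
    log.append({"ts": time.time(), "query": query,
                "n_hits": n_hits, "answered": answered})

def hit_rate(log: list) -> float:
    """Fraction of queries where retrieval found anything.
    A falling hit rate is an early signal the system got worse."""
    return sum(1 for r in log if r["n_hits"] > 0) / len(log) if log else 0.0
```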
Industrial systems assume failure.
Plan for:
Broken scraper
Empty vector store
Model crash
Partial answers
Fallbacks matter more than features.
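The failure modes above can be handled as an explicit fallback chain: each stage degrades to something honest instead of crashing or guessing. A sketch (the messages and callables are illustrative; `retrieve` and `generate` stand in for the real pipeline stages):

```python
def answer(question: str, retrieve, generate) -> str:
    """Degrade gracefully: a confident wrong answer is worse than a fallback."""
    try:
        hits = retrieve(question)
    except Exception:
        return "Retrieval is currently unavailable."     # broken scraper / empty store
    if not hits:
        return "I could not find relevant information."  # retrieval failure
    try:
        return generate(question, hits)
    except Exception:
        # model crash → partial answer: surface the closest raw passage
        return "Model unavailable; closest passage:\n" + hits[0]["text"]
```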
Problem Framing
↓
System Architecture Contract
↓
Env & Secrets
↓
Config Brain
↓
Model & Tool Registry
↓
Data Acquisition
↓
ETL & Knowledge Normalization
↓
Embedding Strategy
↓
Retrieval System
↓
Vanilla RAG
↓
Agentic RAG
↓
Observability & Governance