In RAG, matrices appear mainly in the embedding and retrieval stages, since embedded text is stored and compared as vectors and matrices.
Each text chunk (Document.page_content) is converted into a vector (e.g., 768-dim for BERT, 1536-dim for OpenAI embeddings).
When you store multiple vectors, they form an embedding matrix:
Shape: (#documents/chunks × embedding_dim)
Example: 10,000 chunks → matrix of size (10000 × 1536)
👉 This is the core mathematical structure that powers retrieval.
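As a minimal sketch of that structure (NumPy only; embed() here is a placeholder standing in for a real embedding model such as a BERT or OpenAI encoder):

```python
import numpy as np

EMBEDDING_DIM = 1536  # e.g., OpenAI text embeddings

def embed(text: str) -> np.ndarray:
    # Placeholder: a real implementation would call an embedding model.
    # Seeding from the text just makes the stand-in deterministic.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(EMBEDDING_DIM)

chunks = ["First chunk of text...", "Second chunk...", "Third chunk..."]

# Stack one vector per chunk into an (n_chunks × embedding_dim) matrix.
embedding_matrix = np.stack([embed(c) for c in chunks])
print(embedding_matrix.shape)  # (3, 1536)
```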
Retrieval works by comparing a query vector (1 × embedding_dim) with the embedding matrix.
Mathematically:
Q = query vector (1 × d)
D = document matrix (n × d)
Result = Q Dᵀ → similarity scores (1 × n)
👉 This gives you a score vector showing which documents are closest to the query.
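In NumPy this is a single matrix product (the random vectors below are stand-ins for real embeddings):

```python
import numpy as np

def cosine_scores(Q: np.ndarray, D: np.ndarray) -> np.ndarray:
    """Q: (1 × d) query vector, D: (n × d) document matrix.
    Returns a (1 × n) vector of cosine similarities."""
    Q = Q / np.linalg.norm(Q, axis=1, keepdims=True)  # normalize rows
    D = D / np.linalg.norm(D, axis=1, keepdims=True)
    return Q @ D.T  # (1 × d) @ (d × n) → (1 × n)

D = np.random.randn(1_000, 1536)  # 1,000 chunks (stand-in embeddings)
Q = np.random.randn(1, 1536)      # one query
scores = cosine_scores(Q, D)      # shape (1, 1000)
top_k = np.argsort(scores[0])[::-1][:5]  # indices of the 5 closest chunks
```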
Sometimes you calculate pairwise similarities between documents or between queries and documents.
This forms a similarity/distance matrix:
Shape: (n × n) for document-to-document
Shape: (m × n) for m queries vs. n documents
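Both shapes fall out of the same normalized matrix product; a sketch with stand-in embeddings:

```python
import numpy as np

D = np.random.randn(100, 1536)   # stand-in embeddings for 100 chunks
D = D / np.linalg.norm(D, axis=1, keepdims=True)

doc_sim = D @ D.T                # (100 × 100) document-to-document matrix

Q = np.random.randn(3, 1536)     # 3 queries
Q = Q / np.linalg.norm(Q, axis=1, keepdims=True)
query_doc_sim = Q @ D.T          # (3 × 100) queries-vs-documents matrix
```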
Advanced RAG setups may use SVD (Singular Value Decomposition) or PCA (Principal Component Analysis) to reduce embedding dimensionality and make retrieval faster. These rely on linear algebra operations on the embedding matrix.
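For instance, a truncated SVD on the centered embedding matrix keeps only the top k directions. This is a sketch rather than a tuned pipeline, and k = 256 is an arbitrary choice:

```python
import numpy as np

D = np.random.randn(1_000, 1536)         # stand-in embedding matrix
mean = D.mean(axis=0)

# Truncated SVD on the centered matrix (equivalent to PCA here).
U, S, Vt = np.linalg.svd(D - mean, full_matrices=False)
k = 256                                  # arbitrary reduced dimension
D_reduced = (D - mean) @ Vt[:k].T        # (1,000 × 256)

# Queries must be projected into the same reduced space before comparison.
q = np.random.randn(1, 1536)
q_reduced = (q - mean) @ Vt[:k].T        # (1 × 256)
```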
Once relevant docs are retrieved, they are fed into the LLM.
Inside the LLM, attention matrices determine how tokens (from query + retrieved docs) relate to each other.
While attention is not part of retrieval, it is still a matrix operation that underpins generation.
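To make the shape concrete, here is standard scaled dot-product attention written out in NumPy (random weights as stand-ins for learned parameters):

```python
import numpy as np

def attention(X, Wq, Wk, Wv):
    """Single-head self-attention over token embeddings X (seq_len × d)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])   # (seq_len × seq_len)
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w = w / w.sum(axis=1, keepdims=True)     # row-wise softmax
    return w @ V, w                          # output and attention matrix

d = 64
X = np.random.randn(10, d)                 # 10 tokens (query + retrieved docs)
Wq, Wk, Wv = (np.random.randn(d, d) for _ in range(3))
out, attn = attention(X, Wq, Wk, Wv)
print(attn.shape)                          # (10, 10) attention matrix
```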
In summary, matrices appear in RAG in two main ways (plus optional extras):
Embedding Matrix → Stores vector representations of chunks.
Similarity Matrices → Used for retrieval via dot products / cosine similarity.
Optional → Dimensionality reduction (PCA/SVD) and attention matrices inside the LLM.