Matrices in RAG (Retrieval-Augmented Generation) 22 Aug, 2025

In RAG, matrices mainly appear in the embedding and retrieval stage, since embeddings are inherently stored and compared as vectors and matrices.


🔑 Where Matrices Come into Play in RAG

1. Embedding Matrix

  • Each text chunk (Document.page_content) is converted into a vector (e.g., 768-dim for BERT, 1536-dim for OpenAI embeddings).

  • When you store multiple vectors, they form an embedding matrix:

    • Shape: (#documents/chunks × embedding_dim)

    • Example: 10,000 chunks → matrix of size (10000 × 1536)

👉 This is the core mathematical structure that powers retrieval.
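The shape described above can be sketched in a few lines of NumPy. The chunk count and dimension match the example (10,000 chunks, 1536-dim embeddings); the random vectors are stand-ins for real model outputs.

```python
import numpy as np

# Stand-in for real embeddings: 10,000 chunks, each a 1536-dim vector
# (e.g., the dimensionality of OpenAI's text-embedding models).
n_chunks, dim = 10_000, 1536
embedding_matrix = np.random.rand(n_chunks, dim).astype(np.float32)

print(embedding_matrix.shape)  # (10000, 1536)
```

In a real pipeline each row would come from an embedding model call, but the matrix shape and the operations on it are the same.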


2. Similarity Calculation (Dot Products / Cosine Similarity)

  • Retrieval works by comparing a query vector (1 × embedding_dim) with the embedding matrix.

  • Mathematically:

    Similarity Scores = Q · Dᵀ
    • Q = Query vector (1 × d)

    • D = Document matrix (n × d)

    • Result = similarity scores (1 × n)

👉 This gives you a score vector showing which documents are closest to the query.
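A minimal sketch of that score computation, using toy sizes (5 documents, 4 dimensions) and random vectors in place of real embeddings. Rows are L2-normalized first so the dot product Q · Dᵀ equals cosine similarity:

```python
import numpy as np

# Toy sizes: 5 documents, 4-dim embeddings (real systems use hundreds
# or thousands of dimensions; the values here are illustrative).
D = np.random.rand(5, 4)          # document matrix, shape (n × d)
q = np.random.rand(1, 4)          # query vector, shape (1 × d)

# L2-normalize rows so the dot product equals cosine similarity.
D_norm = D / np.linalg.norm(D, axis=1, keepdims=True)
q_norm = q / np.linalg.norm(q)

scores = q_norm @ D_norm.T        # shape (1 × n): one score per document
best = int(np.argmax(scores))     # index of the closest document
print(scores.shape)               # (1, 5)
```

Vector stores such as FAISS perform essentially this operation, just with indexing tricks that avoid scanning every row.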


3. Distance Matrices

  • Sometimes you calculate pairwise similarities between documents or between queries and documents.

  • This forms a similarity/distance matrix:

    • Shape: (n × n) for document-to-document

    • Shape: (m × n) for multiple queries vs. documents
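Both shapes fall out of the same matrix product. A sketch with illustrative sizes (6 documents, 3 queries, 8 dimensions), again with normalized random vectors standing in for embeddings:

```python
import numpy as np

# Illustrative sizes: 6 documents, 3 queries, 8-dim embeddings.
D = np.random.rand(6, 8)
Q = np.random.rand(3, 8)

# Normalize rows so dot products are cosine similarities.
D = D / np.linalg.norm(D, axis=1, keepdims=True)
Q = Q / np.linalg.norm(Q, axis=1, keepdims=True)

doc_to_doc = D @ D.T      # (n × n): each document vs. every other document
query_to_doc = Q @ D.T    # (m × n): every query vs. every document

print(doc_to_doc.shape, query_to_doc.shape)  # (6, 6) (3, 6)
```

The diagonal of the (n × n) matrix is all ones (each document compared with itself), which is useful as a sanity check.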


4. Matrix Factorization (Optional in RAG Enhancements)

  • Advanced RAG setups may use:

    • SVD (Singular Value Decomposition)

    • PCA (Principal Component Analysis)

  • These techniques reduce embedding dimensionality, making retrieval faster.

  • These rely on linear algebra operations on embedding matrices.
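A sketch of SVD-based reduction, using a smaller corpus (500 chunks, 256 dims, reduced to 64) so it runs quickly; the same code works on a (10000 × 1536) matrix. Centering the matrix first makes the SVD projection equivalent to PCA:

```python
import numpy as np

# Hypothetical corpus: 500 chunks with 256-dim embeddings, reduced to 64 dims.
E = np.random.rand(500, 256)

# Center the matrix; SVD of a centered matrix is equivalent to PCA.
E_centered = E - E.mean(axis=0)
U, S, Vt = np.linalg.svd(E_centered, full_matrices=False)

k = 64
E_reduced = E_centered @ Vt[:k].T   # project onto the top-k principal directions
print(E_reduced.shape)              # (500, 64)
```

The query vector must be centered and projected with the same `Vt[:k]` before comparing it against the reduced matrix, or the similarity scores are meaningless.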


5. Attention Matrices (Inside LLM)

  • Once relevant docs are retrieved, they are fed into the LLM.

  • Inside the LLM, attention matrices determine how tokens (from query + retrieved docs) relate to each other.

  • While not part of retrieval, attention is still a matrix operation that drives generation.
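The attention matrix itself is just softmax(QKᵀ/√d). A minimal sketch of scaled dot-product attention (the token counts and head dimension below are illustrative, not tied to any particular model):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # (seq_q × seq_k) attention matrix
    # Row-wise softmax (subtracting the max for numerical stability).
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

Q = np.random.rand(4, 8)   # 4 query tokens, 8-dim head
K = np.random.rand(6, 8)   # 6 key tokens (query + retrieved context)
V = np.random.rand(6, 8)

out, attn = scaled_dot_product_attention(Q, K, V)
print(attn.shape)          # (4, 6); each row sums to 1
```

Each row of `attn` shows how much one query token attends to every token in the combined query-plus-context sequence, which is how the retrieved documents influence generation.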


🎯 In Summary

In RAG, matrices appear in three main ways:

  1. Embedding Matrix → Stores vector representations of chunks.

  2. Similarity Matrices → Used for retrieval via dot products / cosine similarity.

  3. Optional → Dimensionality reduction (PCA/SVD) and Attention matrices inside LLM.