RAG (Retrieval-Augmented Generation)

Definition

A technique that augments LLM generation by retrieving relevant context from a vector store and adding it to the prompt before the model generates a response. For local LLMs with limited context windows, RAG makes large codebases workable: the code is split into chunks, each chunk is stored as an embedding vector, and only the chunks most semantically similar to the query are retrieved. The model sees the code that matters for the task without exceeding its context limit.
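
The retrieve-then-generate flow described above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the `embed` function is a toy word-count stand-in for a real embedding model, and `VectorStore` and `build_prompt` are hypothetical names introduced only for this example.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding (assumption): lowercase word counts instead of a real model."""
    return Counter(re.findall(r"[a-z_]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorStore:
    """Hypothetical in-memory store holding (vector, chunk) pairs."""

    def __init__(self):
        self.items = []

    def add(self, chunk):
        self.items.append((embed(chunk), chunk))

    def search(self, query, k=2):
        # Rank every stored chunk by similarity to the query, keep the top k.
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [chunk for _, chunk in ranked[:k]]


def build_prompt(store, question, k=2):
    """Retrieve first, then place the retrieved chunks ahead of the question."""
    context = "\n---\n".join(store.search(question, k))
    return f"Context:\n{context}\n\nQuestion: {question}"


store = VectorStore()
store.add("def parse_config(path): read the yaml config file and return a dict")
store.add("def send_email(to, body): deliver a message over smtp")
store.add("def load_config_defaults(): return default config values")

prompt = build_prompt(store, "how is the config file parsed", k=1)
print(prompt)
```

The resulting prompt would then be passed to the LLM. With a real embedding model the ranking reflects meaning rather than shared words, but the control flow stays the same: embed, search, assemble the prompt, generate.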

Examples in the Wild

  • Example 1: Storing millions of lines of code as vectors to enable semantic search
  • Example 2: Retrieving relevant code snippets before generating fixes
  • Example 3: Loading large repositories without overflowing the context window
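
As a rough sketch of Example 3, the retrieved chunks can be packed into a fixed token budget so the prompt never exceeds the context window. The function name `pack_context` is hypothetical, and the default token counter (whitespace splitting) is a crude assumption; a real setup would use the model's own tokenizer.

```python
def pack_context(ranked_chunks, max_tokens, count_tokens=lambda s: len(s.split())):
    """Greedily keep the highest-ranked chunks that fit within max_tokens.

    ranked_chunks must already be sorted from most to least relevant.
    count_tokens is an assumed stand-in for a real tokenizer.
    """
    selected, used = [], 0
    for chunk in ranked_chunks:
        cost = count_tokens(chunk)
        if used + cost > max_tokens:
            continue  # too large for the remaining budget; a smaller chunk may still fit
        selected.append(chunk)
        used += cost
    return selected, used


chunks = [
    "def parse_config(path): return yaml.safe_load(open(path))",
    "class ConfigError(Exception): raised when a required key is missing from the file",
    "DEFAULT_PATH = 'config.yaml'",
]
selected, used = pack_context(chunks, max_tokens=8)
print(selected, used)
```

Skipping an oversized chunk instead of stopping at it lets lower-ranked but smaller chunks use the remaining budget; stopping at the first overflow is an equally reasonable policy if strict rank order matters more.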