Expanding Knowledge with Retrieval-Augmented Generation (RAG)

Learn about retrieval-augmented generation (RAG), a technique that enables an LLM to access private or up-to-date information.

We know that a large language model, for all its brilliance, has two fundamental limitations: its knowledge is frozen in time, and it is ignorant of your private, specific data. By default, any question we ask an LLM is like a “closed-book exam.” It must answer from memory. This closed-book constraint is the primary source of hallucinations, and it explains why the model cannot answer questions about recent events or proprietary information.

What if we could change the rules of the exam? This is the elegant idea behind retrieval-augmented generation (RAG), the single most important pattern in modern LLM application development. Instead of asking the model to answer from memory, we give it access to new data, like in an “open-book exam.” The tool we use to find the right page in the book at the right time is precisely the engine we just explored in our last lesson: semantic search powered by embeddings.

In this lesson, we will explore the complete RAG pipeline, from preparing our library of documents to retrieving the perfect piece of context for any user query.

Retrieval-augmented generation

The core idea is simple. When a user asks a question, we first retrieve the most relevant information from a trusted knowledge source, then augment the prompt by providing that information to the LLM as context, and finally ask the model to generate an answer based only on the provided text. Those three steps, retrieve, augment, and generate, give the technique its name.
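
To make this concrete, here is a minimal sketch of the retrieve-and-augment steps in Python. The sample documents, the `all-MiniLM-L6-v2` embedding model, and the helper functions `retrieve` and `build_prompt` are illustrative choices, not a prescribed implementation; the sketch assumes the `sentence-transformers` and `numpy` packages are installed.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# 1. Prepare a tiny "library" of trusted documents
#    (in a real pipeline, these would be chunks of your own files).
documents = [
    "Our refund policy allows returns within 30 days of purchase.",
    "The support team is available Monday through Friday, 9am-5pm.",
    "Premium subscribers get priority access to new features.",
]

# 2. Embed every document once, up front.
embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vectors = embedder.encode(documents, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most semantically similar to the query."""
    query_vector = embedder.encode([query], normalize_embeddings=True)[0]
    # Dot product equals cosine similarity on unit-normalized vectors.
    scores = doc_vectors @ query_vector
    top_k = np.argsort(scores)[::-1][:k]
    return [documents[i] for i in top_k]

def build_prompt(query: str) -> str:
    """Augment the user's question with the retrieved context."""
    context = "\n".join(retrieve(query))
    return (
        "Answer the question using ONLY the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

# 3. In a full pipeline, this augmented prompt is then sent to an LLM
#    for the final generation step.
print(build_prompt("Can I return a product I bought two weeks ago?"))
```

Notice that the model never needs to have memorized the refund policy: the retrieval step finds the relevant passage at query time, and the prompt instructs the model to ground its answer in that passage alone.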

This approach has several massive ...