
Building an Agent with RAG for Q&A


Learn how to create a Llama Stack agent that uses document retrieval via a vector database to answer user questions with grounded, context-aware responses.

Previously, we set up a knowledge base by registering a vector database and ingesting documents into it using the RAG tool. This allowed us to search our data semantically. Now it's time to put that knowledge base to use.

We’ll create an agent that uses the rag/knowledge_search tool to retrieve relevant chunks from our knowledge base at runtime, combine them with model reasoning, and generate a final response. This pattern is called retrieval-augmented generation, or RAG, and it’s one of the most powerful techniques for improving factual accuracy and grounding in LLM-powered applications.
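Using the llama-stack-client Python SDK, wiring the knowledge_search tool into an agent looks roughly like the sketch below. The server URL, model ID, and vector database ID ("my_documents") are placeholder assumptions, not values from this lesson; substitute the ones you registered earlier.

```python
# Sketch: attaching Llama Stack's builtin RAG tool to an agent.
# The vector DB ID, model ID, and server URL below are placeholder assumptions.

# Tool configuration pointing knowledge_search at the vector DB registered earlier.
rag_tool = {
    "name": "builtin::rag/knowledge_search",
    "args": {"vector_db_ids": ["my_documents"]},  # hypothetical DB ID
}

def ask(question):
    # Requires a running Llama Stack server; call this only when one is available.
    from llama_stack_client import LlamaStackClient, Agent

    client = LlamaStackClient(base_url="http://localhost:8321")  # assumed local server
    agent = Agent(
        client,
        model="meta-llama/Llama-3.3-70B-Instruct",  # placeholder model ID
        instructions="Answer using the retrieved documents; say so if context is missing.",
        tools=[rag_tool],
    )
    # Each turn retrieves relevant chunks at runtime and grounds the answer in them.
    session_id = agent.create_session("rag-qa-session")
    turn = agent.create_turn(
        session_id=session_id,
        messages=[{"role": "user", "content": question}],
        stream=False,
    )
    return turn.output_message.content
```

At runtime, the agent decides when to invoke knowledge_search, injects the retrieved chunks into the model's context, and then generates the grounded response returned by ask().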


Why use RAG agents?

Language models are limited by two things: their training data and their context window. They can't remember new information after training, and they can only process a limited amount of text in a single request.