Scoping and Planning a Production RAG Application
Discover how to apply the 4D LLMOps lifecycle to build a reliable, scalable Retrieval-Augmented Generation (RAG) HR assistant. Learn to scope requirements, select simple tools such as FastAPI for the API layer and PostgreSQL for vector storage, and map engineering tasks to quality gates. Understand how to transform raw Markdown documentation into structured data, implement ingestion and inference pipelines, and prepare for containerized deployment. This lesson sets the foundation for building and testing a production RAG system.
We have spent the last few lessons building a working model of LLMOps: the 4D lifecycle and the reference architecture for production RAG systems. Now we need to apply those ideas to a concrete build. The goal of this course is to ship a RAG application that is reliable, measurable, and maintainable. That requires a small set of tools with clear responsibilities, plus explicit quality gates we can test before deployment.
The RAG application we will build is an HR support assistant for a fictional company, and we will extend it throughout the remainder of the course. In this lesson, we also map the 4D framework directly to the engineering work we will execute.
This mapping gives us a consistent way to decide what to build next, measure progress, and identify when a phase is complete.
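To make the idea of an explicit, testable quality gate more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the course's implementation: the `QualityGate` class, the `failed_gates` helper, the metric names, and every threshold are hypothetical placeholders chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityGate:
    """A named threshold that a measured metric must meet (hypothetical helper)."""

    metric: str
    threshold: float
    higher_is_better: bool = True

    def passes(self, value: float) -> bool:
        # Scores such as hit rate must reach the threshold;
        # costs such as latency must stay at or below it.
        if self.higher_is_better:
            return value >= self.threshold
        return value <= self.threshold


def failed_gates(gates: list[QualityGate], measurements: dict[str, float]) -> list[str]:
    """Return the names of the metrics that miss their gate.

    A metric with no measurement counts as a failure, so a gate
    cannot be passed by silently skipping its evaluation.
    """
    failures = []
    for gate in gates:
        value = measurements.get(gate.metric)
        if value is None or not gate.passes(value):
            failures.append(gate.metric)
    return failures


# Hypothetical gates and measurements, for illustration only.
GATES = [
    QualityGate("retrieval_hit_rate", 0.85),
    QualityGate("answer_faithfulness", 0.90),
    QualityGate("p95_latency_seconds", 2.0, higher_is_better=False),
]

measurements = {
    "retrieval_hit_rate": 0.91,
    "answer_faithfulness": 0.87,
    "p95_latency_seconds": 1.4,
}

print(failed_gates(GATES, measurements))  # ['answer_faithfulness']
```

In this sketch, a phase would count as complete only when `failed_gates` returns an empty list. Treating a missing measurement as a failure is a deliberate design choice: it keeps a gate from passing simply because nobody ran the evaluation.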
An HR support assistant
Assume we are the engineering team at Halluli, a fictional company that does not have a dedicated HR department. All company policies and processes are documented. New hires currently have to search through hundreds of pages of Markdown documentation ...