
The Canonical LLM App Stack

Explore the canonical LLM app stack, which organizes large language model applications into distinct layers with clear responsibilities. Understand each layer, from the user interface and API gateway through orchestration, retrieval, embedding, and model serving to observability. This lesson shows how these components fit together into maintainable, scalable LLM systems, with examples and common tools along the way.

In the previous lesson, you explored prompting strategies, retrieval-augmented generation, and fine-tuning as individual techniques for customizing LLM behavior. Each technique solves a specific problem, but none of them operates in isolation inside a real product. When an enterprise deploys an LLM-powered application, these techniques become components wired together across multiple layers of software. The question shifts from “which technique should I use?” to “how do all of these pieces fit together in a single, maintainable system?”

This is where the canonical LLM app stack comes in. Think of it as a reference blueprint, similar to how web development has the classic three-tier architecture (frontend, backend, database). The canonical LLM app stack is a layered reference architecture that enterprise teams use to structure end-to-end LLM applications. It provides consistency across teams, clearer ownership boundaries, easier debugging, and faster iteration. Without a shared architectural vocabulary, one team might embed retrieval logic inside the UI layer while another buries prompt templates in the model serving code, making the system nearly impossible to maintain.
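To make those ownership boundaries concrete, here is a minimal Python sketch of how the layers might be expressed as separate interfaces. The names (Retriever, ModelClient, Orchestrator) are illustrative, not from any particular framework; the point is that each layer hides its implementation behind a small contract.

```python
from typing import Protocol


class Retriever(Protocol):
    """Retrieval layer: owns how relevant context is found."""
    def retrieve(self, query: str, k: int = 3) -> list[str]: ...


class ModelClient(Protocol):
    """Model serving layer: owns how completions are produced."""
    def generate(self, prompt: str) -> str: ...


class Orchestrator:
    """Orchestration layer: owns prompt assembly and the call sequence.

    It depends only on the interfaces above, so the retrieval and
    model-serving implementations can be swapped without touching this code.
    """
    def __init__(self, retriever: Retriever, model: ModelClient) -> None:
        self.retriever = retriever
        self.model = model

    def answer(self, question: str) -> str:
        context = self.retriever.retrieve(question)
        prompt = (
            "Answer using only the context below.\n\n"
            "Context:\n" + "\n".join(context)
            + f"\n\nQuestion: {question}\nAnswer:"
        )
        return self.model.generate(prompt)
```

Because each layer sits behind an interface like this, one team can own retrieval while another owns model serving, and neither needs to know how the other is implemented.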

To make this concrete, consider a scenario you will follow throughout this lesson. A company builds an internal document Q&A system. An employee types a question about the company’s travel reimbursement policy. The system retrieves the most relevant policy passages, includes them in the prompt, and the model generates an answer grounded in that retrieved text.
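As a rough sketch of that flow, the toy example below uses a keyword-overlap retriever and a stubbed model call in place of a real embedding index and LLM endpoint; the policy snippets and function names are invented for illustration.

```python
# Toy end-to-end flow for the document Q&A scenario.
# The keyword-overlap scoring and canned model reply are stand-ins for a
# real embedding-based retriever and a real LLM endpoint.

POLICY_DOCS = [
    "Travel reimbursement: employees may claim up to $75 per day for meals.",
    "Remote work: employees may work remotely up to three days per week.",
    "Expense reports must be submitted within 30 days of travel.",
]


def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
    """Retrieval layer: rank documents by word overlap with the question."""
    q_words = set(question.lower().split())
    scored = sorted(
        docs,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]


def generate(prompt: str) -> str:
    """Model serving layer (stub): a real system would call an LLM here."""
    return f"[LLM response to a {len(prompt)}-character grounded prompt]"


def answer(question: str) -> str:
    """Orchestration layer: retrieve context, build the prompt, call the model."""
    context = retrieve(question, POLICY_DOCS)
    prompt = (
        "Context:\n" + "\n".join(context)
        + f"\n\nQuestion: {question}\nAnswer:"
    )
    return generate(prompt)


print(answer("What is the daily meal limit for travel reimbursement?"))
```

Even in this toy form, the shape of the stack is visible: the question flows from the interface into orchestration, which calls retrieval and then model serving, exactly the layering the rest of this lesson unpacks.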