
The Canonical LLM App Stack

Explore the canonical LLM app stack, which organizes large language model applications into distinct layers with clear responsibilities. Understand each layer, from the user interface and API gateway through orchestration, retrieval, embedding, and model serving to observability. This lesson shows how these components fit together into maintainable, scalable LLM systems, with examples and common tools along the way.

In the previous lesson, you explored prompting strategies, retrieval-augmented generation, and fine-tuning as individual techniques for customizing LLM behavior. Each technique solves a specific problem, but none of them operates in isolation inside a real product. When an enterprise deploys an LLM-powered application, these techniques become components wired together across multiple layers of software. The question shifts from “which technique should I use?” to “how do all of these pieces fit together in a single, maintainable system?”

This is where the canonical LLM app stack comes in. Think of it as a reference blueprint, similar to how web development has the classic three-tier architecture (frontend, backend, database). The canonical LLM app stack is a layered reference architecture that enterprise teams use to structure end-to-end LLM applications. It provides consistency across teams, clearer ownership boundaries, easier debugging, and faster iteration. Without a shared architectural vocabulary, one team might embed retrieval logic inside the UI layer while another buries prompt templates in the model serving code, making the system nearly impossible to maintain.
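To make those ownership boundaries concrete, here is a minimal Python sketch of how the layers might be expressed as separate interfaces. The names (Retriever, ModelClient, Orchestrator) are illustrative, not from any particular framework; the point is that each layer hides its implementation behind a small contract.

```python
from typing import Protocol


class Retriever(Protocol):
    """Retrieval layer: owns how relevant context is found."""
    def retrieve(self, query: str, k: int = 3) -> list[str]: ...


class ModelClient(Protocol):
    """Model serving layer: owns how completions are produced."""
    def generate(self, prompt: str) -> str: ...


class Orchestrator:
    """Orchestration layer: owns prompt assembly and the call sequence.

    It depends only on the interfaces above, so the retrieval and
    model-serving implementations can be swapped without touching this code.
    """
    def __init__(self, retriever: Retriever, model: ModelClient) -> None:
        self.retriever = retriever
        self.model = model

    def answer(self, question: str) -> str:
        context = self.retriever.retrieve(question)
        prompt = (
            "Answer using only the context below.\n\n"
            "Context:\n" + "\n".join(context)
            + f"\n\nQuestion: {question}\nAnswer:"
        )
        return self.model.generate(prompt)
```

Because each layer sits behind an interface like this, one team can own retrieval while another owns model serving, and neither needs to know how the other is implemented.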

To make this concrete, consider a scenario you will follow throughout this lesson. A company builds an internal document Q&A system. An employee types a question about the company’s travel reimbursement policy. The system retrieves the most relevant policy passages, includes them in the prompt, and the model generates an answer grounded in that retrieved text.
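As a rough sketch of that flow, the toy example below uses a keyword-overlap retriever and a stubbed model call in place of a real embedding index and LLM endpoint; the policy snippets and function names are invented for illustration.

```python
# Toy end-to-end flow for the document Q&A scenario.
# The keyword-overlap scoring and canned model reply are stand-ins for a
# real embedding-based retriever and a real LLM endpoint.

POLICY_DOCS = [
    "Travel reimbursement: employees may claim up to $75 per day for meals.",
    "Remote work: employees may work remotely up to three days per week.",
    "Expense reports must be submitted within 30 days of travel.",
]


def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
    """Retrieval layer: rank documents by word overlap with the question."""
    q_words = set(question.lower().split())
    scored = sorted(
        docs,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]


def generate(prompt: str) -> str:
    """Model serving layer (stub): a real system would call an LLM here."""
    return f"[LLM response to a {len(prompt)}-character grounded prompt]"


def answer(question: str) -> str:
    """Orchestration layer: retrieve context, build the prompt, call the model."""
    context = retrieve(question, POLICY_DOCS)
    prompt = (
        "Context:\n" + "\n".join(context)
        + f"\n\nQuestion: {question}\nAnswer:"
    )
    return generate(prompt)


print(answer("What is the daily meal limit for travel reimbursement?"))
```

Even in this toy form, the shape of the stack is visible: the question flows from the interface into orchestration, which calls retrieval and then model serving, exactly the layering the rest of this lesson unpacks.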