
System Design: LLM-powered Customer Support Bot

Define the requirements and resource estimates for an LLM-powered customer support bot. Learn how functional requirements like RAG-based generation and context-aware dialogue combine with non-functional requirements for low latency and cost efficiency. This foundation prepares learners to design scalable, production-ready conversational AI systems.

Traditional rule-based chatbots fail when queries deviate from predefined scripts, leading to generic responses, poor user experience, and costly escalations.

An LLM-powered customer support bot addresses this gap by leveraging large language models to interpret intent, maintain conversational context across multiple exchanges, and generate responses that feel natural. Modern production architectures go further by combining LLMs with Retrieval-Augmented Generation (RAG), an AI technique where a model retrieves relevant information from external data sources and uses it to generate more accurate responses. RAG grounds every response in company-specific knowledge bases and real-time data rather than relying on the model’s static training data, which dramatically reduces hallucinations and keeps answers accurate. In this chapter, we will design such a system.
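The retrieval half of RAG can be sketched in a few lines. The snippet below is a minimal, self-contained illustration: it uses a toy bag-of-words embedding and cosine similarity in place of a real embedding model and vector database, retrieves the top-k most similar knowledge-base entries, and assembles a grounded prompt for the LLM. The knowledge-base entries and the `build_prompt` helper are hypothetical examples, not part of any specific library.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy embedding: lowercase bag-of-words counts. A production system
    # would use a learned embedding model instead.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank knowledge-base entries by similarity to the query and keep top-k.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    # Ground the LLM's answer by injecting retrieved context into the prompt.
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return (f"Answer using only the context below.\n"
            f"Context:\n{context}\n\nQuestion: {query}")

knowledge_base = [
    "Refunds are processed within 5 business days.",
    "Orders can be tracked from the account dashboard.",
    "Support is available 24/7 via live chat.",
]
print(build_prompt("How long do refunds take?", knowledge_base))
```

The key design point is that the model answers from retrieved context rather than from memorized training data, which is what keeps responses current and verifiable.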

This lesson establishes three things: functional requirements, non-functional requirements, and resource estimation for designing an LLM-powered customer support bot.

Let's start with the functional requirements.

Functional requirements

The following functional requirements define the system’s core behaviour:

  • Dialogue management: The system must maintain multi-turn conversation history so that follow-up questions like “What about the other item?” are interpreted correctly within the ongoing session, not treated as isolated queries.

  • Natural language understanding: The system must interpret user intent, extract key entities such as order IDs and product names, and disambiguate vague queries using contextual cues from the conversation.

  • Response generation: The system must fetch relevant documents from a company knowledge base or vector database (a specialized database that stores data as high-dimensional numerical vectors, or embeddings, enabling fast similarity search rather than exact keyword matching) and generate ...