
System Design: LLM-powered Customer Support Bot

Define the requirements and resource estimates for an LLM-powered customer support bot. Learn how functional requirements like RAG-based generation and context-aware dialogue combine with non-functional requirements for low latency and cost efficiency. This foundation prepares learners to design scalable, production-ready conversational AI systems.

Traditional rule-based chatbots fail when queries deviate from predefined scripts, leading to generic responses, poor user experience, and costly escalations.

An LLM-powered customer support bot addresses this gap by leveraging large language models to interpret intent, maintain conversational context across multiple exchanges, and generate responses that feel natural. Modern production architectures go further by combining LLMs with Retrieval-Augmented Generation (RAG), an AI technique where a model retrieves relevant information from external data sources and uses it to generate more accurate responses. RAG grounds every response in company-specific knowledge bases and real-time data rather than relying on the model’s static training data, which dramatically reduces hallucinations and keeps answers accurate. In this chapter, we will design such a system.
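The retrieval half of RAG can be sketched in a few lines. The snippet below is a minimal, self-contained illustration: it uses a toy bag-of-words embedding and cosine similarity in place of a real embedding model and vector database, retrieves the top-k most similar knowledge-base entries, and assembles a grounded prompt for the LLM. The knowledge-base entries and the `build_prompt` helper are hypothetical examples, not part of any specific library.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy embedding: lowercase bag-of-words counts. A production system
    # would use a learned embedding model instead.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank knowledge-base entries by similarity to the query and keep top-k.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    # Ground the LLM's answer by injecting retrieved context into the prompt.
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return (f"Answer using only the context below.\n"
            f"Context:\n{context}\n\nQuestion: {query}")

knowledge_base = [
    "Refunds are processed within 5 business days.",
    "Orders can be tracked from the account dashboard.",
    "Support is available 24/7 via live chat.",
]
print(build_prompt("How long do refunds take?", knowledge_base))
```

The key design point is that the model answers from retrieved context rather than from memorized training data, which is what keeps responses current and verifiable.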

This lesson establishes three things: functional requirements, non-functional requirements, and resource estimation for designing an LLM-powered customer support bot.

Let's start with the functional requirements.

Functional requirements

The following functional requirements define the system’s core behaviour:

  • Dialogue management: The system must maintain multi-turn conversation history so that follow-up questions like “What about the other item?” are interpreted correctly within the ongoing session, not treated as isolated queries.

  • Natural language understanding: The system must interpret user intent, extract key entities such as order IDs and product names, and disambiguate vague queries using contextual cues from the conversation.

  • Response generation: The system must fetch relevant documents from a company knowledge base or vector database (a specialized database that stores data as high-dimensional numerical vectors, or embeddings, enabling fast similarity search rather than exact keyword matching) and generate ...