Design of an LLM-Powered Customer Support Bot
In the previous lesson, we explored the requirements and resource estimates for an LLM-powered customer support bot. Building on that, we’ll move to the system’s high-level design.
High-level design of an LLM-powered customer support bot
Production support bots can become outdated quickly as product details and policies evolve, which can lead to inaccurate responses and lower user satisfaction. Systems must continuously incorporate up-to-date knowledge while managing LLM-related costs.
The following high-level design uses retrieval-augmented generation (RAG) with cost-aware routing to deliver accurate, context-rich responses. The workflow is as follows:

1. A user submits a query through the web or mobile client.
2. The API gateway handles authentication, rate limiting, and session management, then forwards the request to the backend services.
3. The RAG pipeline retrieves relevant knowledge from a vector database and augments the prompt, which is then passed to the LLM to generate a response.
4. The response goes through content moderation before it is returned to the user.
5. The conversation is logged and user feedback is collected.
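The workflow above can be sketched end to end in a few lines. This is a minimal, self-contained illustration: the in-memory keyword "retriever," the stubbed LLM call, and the moderation check are all placeholders for the real vector database, model API, and moderation service, not any specific library's API.

```python
# Minimal sketch of the RAG request flow: retrieve -> augment -> generate -> moderate.
# Every component here is a stand-in for the real service it names.

KNOWLEDGE_BASE = [
    "Refunds are issued within 5 business days of an approved return.",
    "Orders can be tracked from the account page under 'My Orders'.",
]

def retrieve(query: str, top_k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap (stand-in for vector search)."""
    terms = set(query.lower().split())
    scored = sorted(KNOWLEDGE_BASE,
                    key=lambda d: len(terms & set(d.lower().split())),
                    reverse=True)
    return scored[:top_k]

def llm_generate(prompt: str) -> str:
    """Stub LLM call; a real system would call a model API here."""
    first_doc = prompt.split("Context:\n")[1].split("\n")[0]
    return "Based on our policy: " + first_doc

def moderate(text: str) -> bool:
    """Stub content-moderation check."""
    return "forbidden" not in text.lower()

def handle_query(query: str, history: list[str]) -> str:
    docs = retrieve(query)                       # 1. retrieve knowledge
    prompt = ("Context:\n" + "\n".join(docs) +   # 2. augment the prompt
              "\nHistory:\n" + "\n".join(history) +
              f"\nUser: {query}\nAssistant:")
    response = llm_generate(prompt)              # 3. generate a response
    # 4. moderate before returning; escalate if the check fails
    return response if moderate(response) else "Escalating to a human agent."

print(handle_query("How do refunds work?", []))
```

In production, `retrieve` would embed the query and run a similarity search against the vector database, and the moderation step would typically run on both the user input and the model output.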
This architecture ensures that the system references up-to-date product knowledge rather than relying solely on what the model learned during training.
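The cost-aware routing mentioned above is typically a tiered choice: simple, high-confidence intents go to a smaller, cheaper model, while complex or ambiguous queries go to a larger one. A minimal sketch follows; the model names, intent set, and complexity heuristic are assumptions for illustration, not part of the design itself.

```python
# Illustrative tiered model routing: cheap model for simple intents,
# premium model otherwise. Tier names and thresholds are assumptions.

CHEAP_MODEL = "small-llm"
PREMIUM_MODEL = "large-llm"

SIMPLE_INTENTS = {"order_status", "store_hours", "greeting"}

def route(intent: str, query: str) -> str:
    """Pick a model tier based on intent and a rough length heuristic."""
    if intent in SIMPLE_INTENTS and len(query.split()) < 30:
        return CHEAP_MODEL
    return PREMIUM_MODEL

print(route("order_status", "Where is my order?"))        # routes to the cheap tier
print(route("refund_dispute", "My refund was denied."))   # routes to the premium tier
```

Real routers often add a confidence score from the NLU service, so a low-confidence "simple" intent still escalates to the larger model.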
Educative byte: Parametric memory refers to the knowledge baked into an LLM’s weights during training. It becomes stale the moment source documents are updated, which is why retrieval-augmented approaches are essential for production support bots.
With the high-level flow established, the next step is to design the system’s APIs.
API design
To support the functional requirements, we define a set of APIs that enable communication between system components and handle key operations within the LLM-powered customer support bot.
sendMessage(): Handles incoming user queries, maintains session context, and returns a response by orchestrating the NLU, RAG, and LLM components. It ensures multi-turn dialogue by attaching conversation history to each request.
```
sendMessage(session_id: string, user_id: string, message: string)
```

| Parameters | Description |
| --- | --- |
| `session_id` | A unique identifier for the conversation session |
| `user_id` | A unique identifier for the user |
| `message` | The user's input text query |
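To make the multi-turn behavior concrete, here is a sketch of how `sendMessage()` can thread conversation history through each request. The in-memory session store and the echoed reply are placeholders; a real implementation would orchestrate the NLU, RAG, and LLM components at the marked step.

```python
# Sketch of sendMessage() maintaining multi-turn context via a session store.
# SESSIONS is an illustrative in-memory stand-in for a real session service.

SESSIONS: dict[str, list[dict]] = {}  # session_id -> message history

def send_message(session_id: str, user_id: str, message: str) -> str:
    history = SESSIONS.setdefault(session_id, [])
    history.append({"role": "user", "user_id": user_id, "text": message})
    # A real system would orchestrate NLU -> RAG -> LLM here, passing
    # `history` along; we echo the turn count to show context threading.
    turn = sum(1 for m in history if m["role"] == "user")
    reply = f"(turn {turn}) ack: {message}"
    history.append({"role": "assistant", "text": reply})
    return reply

send_message("s1", "u42", "Where is my order?")
print(send_message("s1", "u42", "It was order #123."))  # (turn 2) ack: It was order #123.
```

Because the history is keyed by `session_id`, concurrent conversations from the same user stay isolated, and the attached history gives the LLM the context it needs for follow-up questions.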
parseQuery(): Extracts user intent and key entities (for example, order ID and product name) from the input text. This API is used by the NLU service to structure unstructured queries for ...