Training Infrastructure of a Text-to-Text Generation System

Learn to design, train, and evaluate text-to-text LLMs, focusing on requirements, data, distributed training, and performance metrics.

Text-to-text LLMs are a subset of language models. Unlike their predecessors, which were primarily designed for text generation or translation tasks, conversational LLMs are specifically trained to engage in interactive dialogue. They can understand user input and generate human-like responses, making them ideal for applications like chatbots, virtual assistants, and interactive storytelling.

These are the brains behind those friendly AI assistants you interact with on websites or your smartphone. They are designed to understand your needs (even if you phrase them roundaboutly) and provide helpful, informative, and often entertaining responses.

Press + to interact
 A snapshot of what conversational text-to-text generation systems look like
A snapshot of what conversational text-to-text generation systems look like

Let’s see how we can design our own conversational AI. The first step is defining the requirements to guide the design process.

Requirements

Building the backend for a robust conversational AI system requires careful consideration of both functional and nonfunctional requirements.

Functional requirements

  • Natural language understanding: The system must decipher the meaning behind user input, including identifying intentIntent refers to the purpose or goal behind a user's query (e.g., asking for information or making a request)., entitiesEntities are specific pieces of information extracted from the input, such as names, locations, or dates., and sentimentSentiment is the emotional tone or attitude conveyed in the input, which can range from positive to negative or neutral. Recognizing sentiment enables the system to tailor responses appropriately.. Imagine asking your AI assistant, “What’s the weather like in London tomorrow?” The system needs to understand that you’re asking about the weather (intent), that “London” is the location (entity), and “tomorrow” is the time (entity).

Press + to interact
Natural language understanding of a query
Natural language understanding of a query

We can also look at an example of sentiment in a query. For instance, if the user says, “I’m so excited about the sunny weather tomorrow in London!” the system should extract:

    • Intent: The user is expressing enthusiasm about the weather.

    • Entities: London is the location, and tomorrow is the time.

    • Sentiment: The user’s sentiment is positive, as their excitement shows.

  • Dialogue management: The system must effectively manage conversations by retaining relevant information from previous interactions (context retentionContext retention refers to the ability of the system to store and recall relevant details from earlier in the conversation, such as user preferences, prior topics discussed, or incomplete tasks, to provide coherent and personalized responses.) and maintaining an awareness of the conversation’s progress (state managementState management is the process of tracking the current state of the dialogue, including the conversation's flow, user intents, and unresolved queries, to ensure logical progression and appropriate responses.). This includes keeping track of user preferences, remembering recent topics, and understanding when to revisit or conclude a topic based on the conversation’s flow.

  • Natural language generation: Once the system (LLM) understands the user’s input and the conversation context, it needs to respond accurately to the query.

  • Personalization: The system should also be capable of tailoring responses based on user preferences and historical interaction. 

Press + to interact
How personalized LLMs respond to the same question
How personalized LLMs respond to the same question

Modern conversational bots now include the ability to tailor their responses to each user. For example, we can tell Gemini to remember that our name is ABC, and it will remember that whenever we chat. We will see how LLMs can maintain memory in the next lesson.

Nonfunctional requirements

  • Low latency: The system should be optimized to minimize latency and provide a seamless conversational experience.

Note: There can be trade-offs between latency and accuracy. For instance, achieving faster responses might mean sacrificing some degree of accuracy, as complex computations or larger models may require more processing time. Balancing latency and accuracy is essential, especially in applications where real-time interaction is critical, yet the accuracy of information remains important.

  • Scalability: As the user base grows, the system needs to handle the increased demand without compromising performance. This means efficiently processing a large volume of requests concurrently.

  • Availability: The text generation model should be accessible and operational whenever users need it. This means minimizing downtime and ensuring consistent uptime.

  • Reliability: The model should give dependable and legitimate responses.

  • Security: Protecting user data and ensuring privacy is paramount. User data typically includes personally identifiable information (PII) and, importantly, the user’s inputs to the system. Strong security measures must be implemented to safeguard this sensitive information.

Additionally, preventing prompt jailbreakingJailbreaking is where users exploit vulnerabilities to bypass safeguards and misuse the system. is an emerging challenge. Such incidents in models like ChatGPT and Gemini highlight the need for ongoing research and robust defenses against exploitation. We are trying to handle it in our system by ensuring ample training and cleaning data, as we will see later in the lesson, but of course, this is an inherent imperfection with generative models that can generate new data that cannot be predicted with certainty.

Note: User inputs may also be used as training data for the model. Transparency about such practices is critical to maintaining user trust and complying with ethical and legal standards.

With our requirements decided, we can now discuss how to pick a GenAI model that can fulfill our system’s needs.

Model selection

Building a conversational AI requires careful selection of the base language model, balancing capabilities with efficiency and cost-effectiveness. ...

Get hands-on with 1400+ tech skills courses.