Text Generation and Conversation
Explore how to create advanced AI conversations using OpenAI's APIs. Learn to manage conversation context, customize AI personalities, and control output by adjusting parameters like temperature and verbosity. Understand how to build responsive, coherent dialogue systems that remember and adapt dynamically.
Now that we’ve made our first successful API call, it’s time to unlock the real power of AI development. In this lesson, we’ll transform simple one-off interactions into sophisticated, contextual conversations that remember, adapt, and feel genuinely intelligent.
By the end of this lesson, you’ll understand how to build AI systems that don’t just respond, they converse, remember, and maintain consistent personalities across long interactions.
How to customize responses based on instructions
AI conversations aren’t just simple question-and-answer exchanges. They’re structured dialogues where context matters, roles are defined, and each message builds on what came before. Here are the available roles.
- user: This is you, your questions, requests, and instructions.
- assistant: This is the AI’s response.
- system: This sets the AI’s personality, behavior, and rules.
Think of the system role as giving the AI its job description.
Here’s how it works:
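The sketch below uses the openai Python SDK’s Responses API; it assumes OPENAI_API_KEY is set in your environment, and the model name and prompts are illustrative placeholders rather than the exact ones from this lesson’s exercise:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.responses.create(
    model="gpt-5",
    input=[
        # The system message is the AI's "job description"
        {"role": "system", "content": "You are a friendly tutor who explains things simply."},
        # The user message carries our actual question
        {"role": "user", "content": "What is machine learning?"},
    ],
)

print(response.output_text)  # the assistant's reply as plain text
```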
Why don’t you go ahead and try different system messages and see how the AI’s personality changes?
- “You are a pirate who answers questions in pirate speak.”
- “You are a professional consultant who gives business advice.”
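As a sketch of that experiment (same assumptions as before), we can send one question under each persona and compare the replies:

```python
from openai import OpenAI

client = OpenAI()

personas = [
    "You are a pirate who answers questions in pirate speak.",
    "You are a professional consultant who gives business advice.",
]

for persona in personas:
    response = client.responses.create(
        model="gpt-5",
        input=[
            {"role": "system", "content": persona},
            {"role": "user", "content": "What is machine learning?"},
        ],
    )
    print(f"--- {persona} ---")
    print(response.output_text)
```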
The same question about machine learning will get very different answers depending on the system message. This is one of the most powerful features of AI development; you can create entirely different experiences just by changing how you frame the conversation.
How to use conversation history for better responses
To make the AI remember what we’ve talked about, we need to include the entire conversation history:
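Here’s a sketch that reconstructs the conversation described below; the exact wording of each message is illustrative:

```python
from openai import OpenAI

client = OpenAI()

# Every message so far, oldest first, with the new question at the end
conversation = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "My favorite color is blue."},
    {"role": "assistant", "content": "Blue is a lovely choice; many people find it calming."},
    {"role": "user", "content": "What things in nature match my favorite color?"},
]

response = client.responses.create(
    model="gpt-5",
    input=conversation,  # the full history, not just the latest question
)

print(response.output_text)
```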
Notice how the AI can reference our favorite color in its response about nature? That’s because we included the full conversation context.
You might wonder why we send the entire conversation history, all previous messages, every time we make an API call. This is because AI models (and the OpenAI API itself) are stateless, meaning they don’t remember anything from previous interactions on their own.
Think of it like talking to someone who completely forgets everything you said after each sentence: unless you remind them of the conversation so far, they won’t know what you’re referring to. That’s why, in our conversation array, we include the system message, the user’s mention of liking blue, the assistant’s response about blue being calming, and the new question about nature.
Without that full context, the AI wouldn’t know what “my favorite color” refers to in the final question. Each API call starts with a blank slate, so if we want the AI to “remember,” we must include all relevant message history in the input every time.
This approach gives us full control over what context the model sees, but it also means that we’re responsible for managing and maintaining that conversation history as our app runs.
How to change the AI’s behavior with parameters
The AI’s personality and response style can be dramatically changed with a few key parameters. Think of these as dials you can turn to get exactly the kind of response you want.
Temperature controls how creative or predictable the AI is.
- temperature=0: Highly predictable, same answer every time.
- temperature=0.7: Balanced creativity (recommended for most uses).
- temperature=1.0: Maximum creativity, unique answers every single time.
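As a sketch, we can send the same prompt at each setting and compare the outputs (the prompt is an illustrative stand-in):

```python
from openai import OpenAI

client = OpenAI()

for temp in (0.0, 0.7, 1.0):
    response = client.responses.create(
        model="gpt-4o",
        input="Name two famous scientists.",
        temperature=temp,  # low = consistent, high = varied
    )
    print(f"temperature={temp}: {response.output_text}")
```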
If we do multiple runs, we can see how the responses change with temperature. Low temperature gives reliable, consistent answers, perfect for factual questions: the same two names appear again and again. High temperature gives creative, varied responses, great for brainstorming or creative writing: you’ll most likely see a different duo on each run.
We can also use the max_output_tokens parameter to control how long the AI’s response can be:
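A sketch, again with an illustrative prompt; note that a very small cap can cut the model off mid-sentence:

```python
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-4o",
    input="Explain photosynthesis.",
    max_output_tokens=50,  # hard cap: generation stops once 50 tokens are produced
)

print(response.output_text)
```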
Note: You might have noticed that when we started talking about temperature and max_output_tokens, we suddenly switched from GPT-5 to GPT-4o. That’s because GPT-5 is fundamentally different; it’s a reasoning model that works with entirely different parameters.
While GPT-4o and earlier models use temperature to control creativity and max_output_tokens to control length, GPT-5 introduces two revolutionary parameters: reasoning and verbosity.
Models in the GPT-5 family only support a temperature of 1.0. Attempting to use other values will result in an error, as the temperature parameter is effectively hardcoded to 1.0 to ensure consistent and high-quality outputs.
How to control output while using GPT-5
GPT-5’s reasoning parameter controls how much computational “thinking” the model does before responding. You can set it to minimal (fastest, for simple tasks), low, medium (default), or high (most thorough reasoning). The verbosity parameter replaces max_output_tokens and controls the natural length and depth of responses with three settings: low (concise), medium (balanced), and high (comprehensive). These parameters work together to give you precise control over both the quality of reasoning and the style of output.
Unlike temperature, which affects randomness, reasoning affects the quality and depth of logical processing. Unlike max_output_tokens, which simply cuts a response off at a hard limit, verbosity naturally scales the completeness and detail level while maintaining coherence. This makes GPT-5 perfect for tasks requiring careful analysis, complex problem-solving, and nuanced explanations.
Let’s take a look at an example of how reasoning works:
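The sketch below compares minimal and high reasoning effort on the same prompt (the puzzle is an illustrative placeholder):

```python
from openai import OpenAI

client = OpenAI()

puzzle = (
    "A farmer keeps chickens and cows. The animals have 20 heads "
    "and 56 legs in total. How many of each animal are there?"
)

# Minimal effort: skip most internal reasoning for a fast, direct answer
quick = client.responses.create(
    model="gpt-5",
    input=puzzle,
    reasoning={"effort": "minimal"},
)
print("minimal:", quick.output_text)

# High effort: reason through the problem extensively before answering
thorough = client.responses.create(
    model="gpt-5",
    input=puzzle,
    reasoning={"effort": "high"},
)
print("high:", thorough.output_text)
```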
When we run the code, we can see two things.
- When we set reasoning={"effort": "minimal"}, GPT-5 skips most of its internal reasoning process and gives us a quick, direct answer.
- When we set reasoning={"effort": "high"}, GPT-5 engages in extensive internal reasoning and works through the problem step-by-step before responding.
This demonstrates that reasoning effort isn’t just about response length, but it’s about the quality of logical processing that happens before GPT-5 even starts writing its response. Let’s also take a look at how verbosity affects user output:
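Here’s a sketch comparing low and medium verbosity on the same question (the question is illustrative):

```python
from openai import OpenAI

client = OpenAI()

question = "Explain how neural networks learn."

# Low verbosity: a concise, essentials-only explanation
concise = client.responses.create(
    model="gpt-5",
    input=question,
    text={"verbosity": "low"},
)
print("low:", concise.output_text)

# Medium verbosity: a fuller explanation with examples and context
detailed = client.responses.create(
    model="gpt-5",
    input=question,
    text={"verbosity": "medium"},
)
print("medium:", detailed.output_text)
```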
When we run the code, we can see two important differences.
- When we set text={"verbosity": "low"}, GPT-5 provides a concise, to-the-point explanation that covers the essential concepts without elaboration.
- When we set text={"verbosity": "medium"}, GPT-5 gives a more comprehensive, detailed explanation with examples, deeper context, and more thorough coverage of the topic.
This demonstrates that verbosity isn’t about limiting responses the way max_output_tokens does. Instead, it naturally controls how much detail and depth GPT-5 includes while maintaining completeness and coherence. The model automatically adjusts its explanation style to match the requested verbosity level, giving you exactly the right amount of information for your use case.