Text Generation and Conversation

Learn how to customize responses with built-in features in the OpenAI SDK.

Now that we’ve made our first successful API call, it’s time to unlock the real power of AI development. In this lesson, we’ll transform simple one-off interactions into sophisticated, contextual conversations that remember, adapt, and feel genuinely intelligent.

By the end of this lesson, you’ll understand how to build AI systems that don’t just respond but converse, remember, and maintain consistent personalities across long interactions.

How to customize responses based on instructions

AI conversations aren’t just simple question-and-answer exchanges. They’re structured dialogues where context matters, roles are defined, and each message builds on what came before. Here are the available roles:

  • user: This is you, your questions, requests, and instructions.

  • assistant: This is the AI’s response.

  • system: This sets the AI’s personality, behavior, and rules.

Think of the system role as giving the AI its job description.

Here’s how it works:

Python 3.10.4
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-5",
    input=[
        {"role": "system", "content": "You are a helpful tutor who explains complex topics in simple terms with analogies."},
        {"role": "user", "content": "What is machine learning?"}
    ]
)
print(response.output_text)

Why don’t you go ahead and try different system messages and see how the AI’s personality changes?

  • “You are a pirate who answers questions in pirate speak.”

  • “You are a professional consultant who gives business advice.”

The same question about machine learning will get very different answers depending on the system message. This is one of the most powerful features of AI development; you can create entirely different experiences just by changing how you frame the conversation.
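
If you want to compare these side by side, one option is to loop over a few system messages and ask the same question each time. Here’s a quick sketch along those lines; it assumes the client from the earlier example is still in scope, and the persona strings are just illustrations:

Python 3.10.4
personas = [
    "You are a pirate who answers questions in pirate speak.",
    "You are a professional consultant who gives business advice.",
]

for persona in personas:
    response = client.responses.create(
        model="gpt-5",
        input=[
            {"role": "system", "content": persona},
            {"role": "user", "content": "What is machine learning?"}
        ]
    )
    print(f"--- {persona} ---")
    print(response.output_text)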

How to use conversation history for better responses

To make the AI remember what we’ve talked about, we need to include the entire conversation history:

Python 3.10.4
conversation = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "My favorite color is blue."},
    {"role": "assistant", "content": "That's nice! Blue is a beautiful color. It's often associated with calmness and serenity."},
    {"role": "user", "content": "What are some things in nature that match my favorite color?"}
]

response = client.responses.create(
    model="gpt-5",
    input=conversation
)
print(response.output_text)

Notice how the AI can reference our favorite color in its response about nature? That’s because we included the full conversation context.

You might wonder why we send the entire conversation history, all previous messages, every time we make an API call. This is because AI models (and the OpenAI API itself) are stateless, meaning they don’t remember anything from previous interactions on their own.

Think of it like talking to someone who completely forgets everything you said after each sentence: unless you remind them of the conversation so far, they won’t know what you’re referring to. That’s why, in our conversation array, we include the system message, the user’s mention of liking blue, the assistant’s response about blue being calming, and the new question about nature.

Without that full context, the AI wouldn’t know what “my favorite color” refers to in the final question. Each API call starts with a blank slate, so if we want the AI to “remember,” we must include all relevant message history in the input every time.

This approach gives us full control over what context the model sees, but it also means that we’re responsible for managing and maintaining that conversation history as our app runs.
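
In practice, that management often looks like a simple loop: after each call, append both the user’s message and the model’s reply to the conversation list before the next turn. Here’s a minimal sketch of that pattern, reusing the client from earlier; the quit command is just an illustrative convention:

Python 3.10.4
conversation = [{"role": "system", "content": "You are a helpful assistant."}]

while True:
    user_input = input("You: ")
    if user_input.lower() == "quit":
        break
    # Add the user's message, then send the full history
    conversation.append({"role": "user", "content": user_input})
    response = client.responses.create(
        model="gpt-5",
        input=conversation
    )
    # Store the reply so the next turn has the full context
    conversation.append({"role": "assistant", "content": response.output_text})
    print("AI:", response.output_text)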

How to change the AI’s behavior with parameters

The AI’s personality and response style can be dramatically changed with a few key parameters. Think of these as dials you can turn to get exactly the kind of response you want.

Temperature controls how creative or predictable the AI is.

  • temperature=0: Highly predictable, nearly the same answer every time.

  • temperature=0.7: Balanced creativity (recommended for most uses).

  • temperature=1.0: Maximum creativity, highly varied answers from run to run.

Python 3.10.4
# Conservative, factual response
print("Conservative, factual response:")
response_conservative = client.responses.create(
    model="gpt-4o",
    input="What is the name of a manga with a white-haired main protagonist? Only provide two names!",
    temperature=0.1
)
print("-" * 100)
print(response_conservative.output_text)
print("-" * 100)

# Creative, varied response
print("Creative, varied response:")
print("-" * 100)
response_creative = client.responses.create(
    model="gpt-4o",
    input="What is the name of a manga with a white-haired main protagonist? Only provide two names!",
    temperature=0.9
)
print(response_creative.output_text)
print("-" * 100)

If we run this multiple times, we can see how the responses change with temperature. Low temperature gives reliable, consistent answers, perfect for factual questions: the same two names tend to appear run after run. High temperature gives creative, varied responses, great for brainstorming or creative writing: you’ll most likely see a different pair of titles from one run to the next.
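
One way to see this variation for yourself is to run the same prompt several times at each temperature and compare the outputs. A small sketch, again assuming the client from earlier:

Python 3.10.4
prompt = "What is the name of a manga with a white-haired main protagonist? Only provide two names!"

for temp in (0.1, 0.9):
    print(f"=== temperature={temp} ===")
    for run in range(3):
        response = client.responses.create(
            model="gpt-4o",
            input=prompt,
            temperature=temp
        )
        print(f"Run {run + 1}: {response.output_text}")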

We can also use the max_output_tokens parameter to control how long the AI’s response can be:

Python 3.10.4
# Short response
print("Short Response:")
response = client.responses.create(
    model="gpt-4o",
    input="Explain artificial intelligence.",
    max_output_tokens=30  # About 1-2 sentences
)
print(response.output_text)
print("-" * 100)

# Longer response
print("Longer Response:")
response = client.responses.create(
    model="gpt-4o",
    input="Explain artificial intelligence.",
    max_output_tokens=200  # About 1 paragraph
)
print(response.output_text)
print("-" * 100)

Note: You might have noticed that when we started talking about temperature and max_output_tokens, we suddenly switched from GPT-5 to GPT-4o. That’s because GPT-5 is fundamentally different; it’s a reasoning model that works with entirely different parameters.

While GPT-4o and earlier models use temperature to control creativity and max_output_tokens to control length, GPT-5 introduces two new parameters: reasoning and verbosity.

Models in the GPT-5 family only support the default temperature of 1.0. Attempting to pass any other value results in an API error, as the parameter is effectively fixed to ensure consistent, high-quality outputs.
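
You can confirm this behavior with a quick test; here’s a minimal sketch (the exact error message may vary):

Python 3.10.4
try:
    client.responses.create(
        model="gpt-5",
        input="Hello!",
        temperature=0.5  # Not supported by GPT-5; triggers an API error
    )
except Exception as e:
    print("Error:", e)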

How to control output while using GPT-5

GPT-5’s reasoning parameter controls how much computational “thinking” the model does before responding. You can set it to minimal (fastest, for simple tasks), low, medium (default), or high (most thorough reasoning). The verbosity parameter replaces max_output_tokens and controls the natural length and depth of responses with three settings: low (concise), medium (balanced), and high (comprehensive). These parameters work together to give you precise control over both the quality of reasoning and the style of output.

Unlike temperature, which affects randomness, reasoning affects the quality and depth of logical processing. Unlike max_output_tokens, which simply cuts a response off when the limit is reached, verbosity naturally scales the completeness and detail level while maintaining coherence. This makes GPT-5 perfect for tasks requiring careful analysis, complex problem-solving, and nuanced explanations.

Let’s take a look at an example of how reasoning works:

Python 3.10.4
"""
GPT-5 Reasoning Effort Demo
===========================
This script demonstrates how the reasoning_effort parameter controls
the quality and depth of GPT-5's thinking process, and measures the
performance impact of different reasoning levels.
"""
import time
# Set up OpenAI client
# Make sure your OPENAI_API_KEY environment variable is set
def demo_reasoning_effort():
"""
Demonstrates the reasoning_effort parameter.
reasoning_effort controls how much computational "thinking" the model does:
- minimal: Fastest, for simple deterministic tasks
- low: Light reasoning
- medium: Default balanced reasoning
- high: Most thorough reasoning and analysis
"""
print("=" * 60)
print("GPT-5 REASONING EFFORT DEMONSTRATION")
print("=" * 60)
# Test prompt that benefits from reasoning
problem = "If a train leaves Station A at 2:00 PM traveling at 60 mph, and another train leaves Station B at 2:30 PM traveling at 80 mph toward Station A, and the stations are 200 miles apart, when will they meet?"
print("Problem:", problem)
print("\n" + "-" * 60)
# Minimal reasoning effort - fast but simple
print("MINIMAL REASONING EFFORT (fastest):")
print("-" * 40)
start_time = time.time()
response_minimal = client.responses.create(
model="gpt-5",
input=problem,
reasoning={"effort": "minimal"}
)
minimal_time = time.time() - start_time
print(response_minimal.output_text)
print(f"\n⏱️ Time taken: {minimal_time:.2f} seconds")
print("\n" + "-" * 60)
# High reasoning effort - thorough analysis
print("HIGH REASONING EFFORT (most thorough):")
print("-" * 40)
start_time = time.time()
response_high = client.responses.create(
model="gpt-5",
input=problem,
reasoning={"effort": "high"}
)
high_time = time.time() - start_time
print(response_high.output_text)
print(f"\n⏱️ Time taken: {high_time:.2f} seconds")
print(f"📊 Speed difference: {high_time/minimal_time:.1f}x slower than minimal")
print("\n" + "=" * 60)
if __name__ == "__main__":
"""
Run the reasoning effort demonstration.
This shows how different reasoning levels affect both the quality
of responses and the time taken to generate them.
"""
print("GPT-5 Reasoning Effort Demo")
print("This demo shows how reasoning_effort controls the depth of GPT-5's thinking.\n")
try:
demo_reasoning_effort()
except Exception as e:
print(f"Error: {e}")
print("Make sure your OPENAI_API_KEY environment variable is set correctly.")

When we run the code, we can see two things.

  • When we set reasoning={"effort": "minimal"}, GPT-5 skips most of its internal reasoning process and gives us a quick, direct answer.

  • When we set reasoning={"effort": "high"}, GPT-5 engages in extensive internal reasoning and works through the problem step by step before responding.

This demonstrates that reasoning effort isn’t about response length; it’s about the quality of logical processing that happens before GPT-5 even starts writing its response. Let’s also take a look at how verbosity affects the output:

Python 3.10.4
"""
GPT-5 Verbosity Demo
====================
This script demonstrates how the verbosity parameter controls
the length and depth of GPT-5's responses, and measures the
performance impact of different verbosity levels.
"""
import time
def demo_verbosity():
"""
Demonstrates the verbosity parameter.
verbosity controls the natural length and depth of responses:
- low: Concise, minimal responses
- medium: Default balanced responses
- high: Comprehensive, detailed responses
"""
print("=" * 60)
print("GPT-5 VERBOSITY DEMONSTRATION")
print("=" * 60)
# Topic that can be explained at different depths
topic = "Explain how machine learning algorithms learn from data"
print("Topic:", topic)
print("\n" + "-" * 60)
# Low verbosity - concise explanation
print("LOW VERBOSITY (concise):")
print("-" * 40)
start_time = time.time()
response_low = client.responses.create(
model="gpt-5",
input=topic,
text={"verbosity": "low"}
)
low_time = time.time() - start_time
print(response_low.output_text)
print(f"\n⏱️ Time taken: {low_time:.2f} seconds")
print("\n" + "-" * 60)
# Medium verbosity - comprehensive explanation
print("MEDIUM VERBOSITY (comprehensive):")
print("-" * 40)
start_time = time.time()
response_medium = client.responses.create(
model="gpt-5",
input=topic,
text={"verbosity": "medium"}
)
medium_time = time.time() - start_time
print(response_medium.output_text)
print(f"\n⏱️ Time taken: {medium_time:.2f} seconds")
print(f"📊 Length difference: {medium_time/low_time:.1f}x longer than low verbosity")
print("\n" + "=" * 60)
if __name__ == "__main__":
"""
Run the verbosity demonstration.
This shows how different verbosity levels affect both the length
of responses and the time taken to generate them.
"""
print("GPT-5 Verbosity Demo")
print("This demo shows how verbosity controls the length and depth of GPT-5's responses.\n")
try:
demo_verbosity()
except Exception as e:
print(f"Error: {e}")
print("Make sure your OPENAI_API_KEY environment variable is set correctly.")

When we run the code, we can see two important differences.

  • When we set text={"verbosity": "low"}, GPT-5 provides a concise, to-the-point explanation that covers the essential concepts without elaboration.

  • When we set text={"verbosity": "medium"}, GPT-5 gives a more detailed, balanced explanation with examples, deeper context, and more thorough coverage of the topic.

This demonstrates that verbosity isn’t about limiting responses the way max_output_tokens would. Instead, it naturally controls how much detail and depth GPT-5 includes while maintaining completeness and coherence. The model automatically adjusts its explanation style to match the requested verbosity level, giving you exactly the right amount of information for your use case.
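
Because reasoning and verbosity are independent, you can also combine them in a single call, for example, thinking hard about a problem but reporting the answer briefly. A minimal sketch, reusing the client from the demos above (the prompt is just an illustration):

Python 3.10.4
# High-effort reasoning with a concise final answer:
# think hard, but report briefly
response = client.responses.create(
    model="gpt-5",
    input="Should a startup build its own authentication system or use a third-party provider? Give a recommendation.",
    reasoning={"effort": "high"},
    text={"verbosity": "low"}
)
print(response.output_text)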