
Text Generation

Explore how to generate and control context-aware text with Hugging Face transformers. Learn to write prompts, adjust sampling parameters, and use models like GPT-2 and flan-T5 for tasks such as translation, summarization, and text generation in Python.

Text generation is one of the most exciting applications of modern NLP.

It allows machines to produce coherent, context-aware text: stories, summaries, translations, emails, and even code. Thanks to Transformer-based models such as GPT-2, GPT-Neo, and flan-T5, text generation has become extremely accessible in Python through the Hugging Face pipeline API.

In this lesson, you’ll explore how text generation works, how to control its behavior, and how to use different models for different tasks. By the end, you’ll be comfortable writing your own generation scripts and experimenting with creativity, coherence, and task-specific outputs.

What is text generation?

Text generation refers to producing new text based on a given input prompt.

The model analyzes the context and predicts the next most likely words, repeating this process until it completes the output. Modern generative models are trained on massive datasets, enabling them to produce human-like text that is fluent, creative, and contextually appropriate.
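This predict-and-repeat loop can be illustrated with a toy probability table standing in for a real model. The table and words below are made up purely for illustration; a real model computes these probabilities with a neural network over a subword vocabulary.

```python
# Toy "model": for each context word, a probability table over next words.
NEXT_WORD_PROBS = {
    "once": {"upon": 0.9, "more": 0.1},
    "upon": {"a": 0.95, "the": 0.05},
    "a": {"time": 0.8, "hill": 0.2},
}

def generate_greedy(start, max_new_tokens=5):
    """Repeatedly append the single most likely next word (greedy decoding)."""
    words = [start]
    for _ in range(max_new_tokens):
        probs = NEXT_WORD_PROBS.get(words[-1])
        if probs is None:  # plays the role of an end-of-sequence token
            break
        words.append(max(probs, key=probs.get))
    return " ".join(words)

print(generate_greedy("once"))  # once upon a time
```

Real models repeat exactly this loop, except each step scores tens of thousands of candidate tokens.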

Transformers revolutionized this field by processing text in parallel and understanding word relationships using self-attention. Because of this, even small models, such as GPT-2, perform impressively on creative writing tasks, while larger, instruction-tuned models like flan-T5 excel at structured tasks, including summarization, translation, and Q&A.

Note: Text generation models do not “think”; they predict text based on patterns learned from large datasets.

Good prompts = better outputs.

How Hugging Face makes text generation easy

Hugging Face’s pipeline interface abstracts away tokenization, model loading, and output post-processing.

With just a few lines, you can create a generator that feels like using ChatGPT inside your Python script. Behind the scenes, the pipeline handles tokenization, batching, sampling strategies, padding, and decoding. This lets you focus on experimenting with prompts rather than engineering complexity.
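Conceptually, the pipeline chains three stages: tokenize the prompt into ids, run a generation loop, and decode ids back into text. The sketch below mimics those stages with stand-in functions; none of this is real transformers code, and the tiny vocabulary and "model" are invented for illustration.

```python
# Toy illustration of the stages a text-generation pipeline runs for you.
VOCAB = {"once": 0, "upon": 1, "a": 2, "time": 3}
INV = {i: w for w, i in VOCAB.items()}

def tokenize(text):
    """Map words to integer ids (real pipelines use subword tokenizers)."""
    return [VOCAB[w] for w in text.lower().split()]

def model_step(ids):
    """Pretend model: pick the next id (a real model returns logits)."""
    return (ids[-1] + 1) % len(VOCAB)

def decode(ids):
    """Map ids back to text."""
    return " ".join(INV[i] for i in ids)

def toy_pipeline(prompt, max_new_tokens=3):
    ids = tokenize(prompt)            # 1. tokenization
    for _ in range(max_new_tokens):   # 2. generation loop
        ids.append(model_step(ids))
    return decode(ids)                # 3. decoding

print(toy_pipeline("once"))  # once upon a time
```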


Basic text generation using GPT-2

Below is a starter example for generating text using a GPT-2 model.

from transformers import pipeline

# Create a text generation pipeline
generator = pipeline("text-generation", model="gpt2")

prompt = "Once upon a time in a futuristic city,"
output = generator(
    prompt,
    max_new_tokens=50,  # use max_new_tokens instead of max_length
    num_return_sequences=1,
    pad_token_id=generator.tokenizer.eos_token_id,  # explicitly set pad token
)
print(output[0]['generated_text'])
Basic text generation using GPT-2

This code generates a continuation of your prompt. The max_new_tokens parameter ensures the model generates only the number of new tokens you want (tokens are word pieces, so 50 tokens is roughly 35–40 English words).

Fun fact: GPT-2 was once considered “too dangerous to release fully” when it first launched because of its surprisingly coherent text generation abilities!

Controlling the creativity and quality of generated text

Text generation can be more creative or more precise by adjusting sampling parameters. Temperature, top-k, and top-p allow you to tune the randomness and diversity of output.

output = generator(
    prompt,
    max_new_tokens=60,
    do_sample=True,  # sampling must be enabled for the parameters below to take effect
    temperature=0.8,
    top_k=50,
    top_p=0.9,
    pad_token_id=generator.tokenizer.eos_token_id,
)
print(output[0]['generated_text'])
Controlling the creativity & quality of generated text
  • temperature controls randomness:

    • Low temperature (0.2–0.5) = more deterministic and focused

    • Higher temperature (0.8–1.0) = more creative and varied

  • top-k and top-p restrict the selection pool to the most likely tokens, improving coherence. The former is a count-based technique, whereas the latter is a probability-based sampling technique. If top-k is 50, only the 50 most likely tokens are considered for selection. If top-p is 0.9, the model selects the smallest set of tokens whose cumulative probability reaches 90%.
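Under the hood, these parameters reshape the probability distribution before a token is drawn. The sketch below is a pure-Python toy, with made-up token logits rather than real model outputs, showing temperature scaling, top-k truncation, and top-p filtering in sequence:

```python
import math
import random

def sample_next(logits, temperature=1.0, top_k=None, top_p=None, seed=0):
    """Sample one token from {token: logit} after temperature, top-k, top-p."""
    # 1. Temperature: divide logits; low T sharpens, high T flattens.
    scaled = {tok: l / temperature for tok, l in logits.items()}
    # 2. Softmax, sorted most-likely first.
    z = sum(math.exp(l) for l in scaled.values())
    probs = sorted(((tok, math.exp(l) / z) for tok, l in scaled.items()),
                   key=lambda kv: kv[1], reverse=True)
    # 3. Top-k: keep only the k most likely tokens.
    if top_k is not None:
        probs = probs[:top_k]
    # 4. Top-p: keep the smallest prefix whose cumulative probability >= p.
    if top_p is not None:
        kept, total = [], 0.0
        for tok, p in probs:
            kept.append((tok, p))
            total += p
            if total >= top_p:
                break
        probs = kept
    # 5. Renormalize and draw a token.
    total = sum(p for _, p in probs)
    random.seed(seed)
    r, acc = random.random() * total, 0.0
    for tok, p in probs:
        acc += p
        if acc >= r:
            return tok
    return probs[-1][0]

logits = {"time": 3.0, "city": 2.0, "dog": 0.5, "zebra": -1.0}
print(sample_next(logits, temperature=0.2, top_k=2))  # almost always "time"
```

With a very low temperature the top token dominates; raising it spreads probability across "city" and beyond, which is exactly the creativity/coherence trade-off described above.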

Note: If your model ever throws padding errors, always set pad_token_id, especially with GPT-style models.


Text-to-text generation with flan-T5

flan-T5 works differently from GPT-2. Rather than simply continuing your prompt, this encoder-decoder model is fine-tuned for instruction-following tasks such as summarization, translation, and Q&A. You simply phrase your request as a natural-language instruction.

flan_generator = pipeline("text2text-generation", model="google/flan-t5-base")

prompt = "Translate the following English sentence to French: 'Artificial Intelligence is transforming education.'"
output = flan_generator(
    prompt,
    max_new_tokens=50,  # T5 models define their own pad token, so no pad_token_id override is needed
)
print(output[0]['generated_text'])
Text-to-text generation using flan-T5

This code uses a flan-T5 text-to-text generation pipeline to perform a specific task, in this case translating English to French. The model takes the prompt, generates up to 50 new tokens, and outputs the translated text.

Summarization example with flan-T5

flan-T5 excels at summarization because it is trained to follow natural language instructions. By simply prefixing your text with a task like “summarize:”, the model understands what to do and produces a concise, focused summary.

task_prompt = "summarize: Artificial intelligence (AI) is rapidly transforming many industries, from healthcare and education to finance and transportation. By analyzing vast amounts of data, AI systems can identify patterns, make predictions, and automate complex tasks. This technological revolution is driving innovation, improving efficiency, and creating new opportunities for businesses and society alike."
summary = flan_generator(
    task_prompt,
    max_new_tokens=20,
)
print(summary[0]['generated_text'])
Summarization using flan-T5

This code uses the flan-T5 text-to-text generation pipeline to generate a summary of the given text. It takes the task_prompt with a “summarize:” instruction, generates up to 20 tokens, and returns the condensed summary in generated_text.


Where text generation is used today

Text generation powers numerous real-world systems, including chatbots, storytelling apps, code generation tools, summarizers, translation engines, email assistants, customer support bots, and more. With prompt engineering, the same model can perform multiple roles by simply changing the instructions.
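The "one model, many roles" idea often comes down to simple prompt templates. The template strings and task names below are illustrative examples, not part of any library API:

```python
# Hypothetical prompt templates: the same model, different instructions.
TEMPLATES = {
    "translate": "Translate the following English sentence to French: '{text}'",
    "summarize": "summarize: {text}",
    "qa": "Answer the question: {text}",
}

def build_prompt(task, text):
    """Wrap raw text in the instruction template for the chosen task."""
    return TEMPLATES[task].format(text=text)

print(build_prompt("summarize", "AI is transforming many industries."))
```

Each built prompt would then be passed to the same flan-T5 pipeline shown earlier; only the instruction changes, not the model.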

Try it yourself

Now that you’ve explored how text generation works, it’s time to experiment!
We’ve added an interactive Jupyter notebook to this lesson, allowing you to run the exact code examples yourself, from basic text generation to controlled sampling and flan-T5 text-to-text tasks.

Add the token value you have already created in the Text and Token Classification lesson to the first cell of the Jupyter notebook, and then run all cells.


Why text generation sometimes stops mid-sentence

As you experiment with the Jupyter notebook, you might notice that the output of some prompts is incomplete, and the text generation sometimes stops abruptly in the middle of a sentence. This can be confusing, but it’s a common behavior when using transformer-based models.

There are a few reasons why this happens:

  1. Token limits: Models generate text token by token, and pipelines often impose a maximum number of new tokens (max_new_tokens) or a maximum total sequence length. If the output reaches this limit, generation stops, even if the sentence isn’t complete.

  2. Early stopping: Some models have built-in mechanisms to stop generation when they detect certain conditions, such as reaching an end-of-sequence token. Sometimes these tokens appear prematurely, causing truncated output.

  3. Sampling randomness: If you use temperature, top-k, or top-p sampling, the model might generate sequences that are shorter or end abruptly because the sampling procedure selects an end token earlier than expected.

How to address this problem

  • Increase max_new_tokens: Allow the model to generate more tokens so that sentences can complete naturally.

  • Adjust stopping criteria: Check early_stopping or eos_token_id parameters in the pipeline to ensure the model doesn’t stop too early.

  • Control sampling parameters: Lowering the temperature or adjusting top-k/top-p can reduce abrupt truncation while still maintaining coherent output.

  • Chunk long inputs: For very long articles or prompts, consider splitting the text into smaller sections and summarizing each separately.

By understanding these factors and adjusting the parameters, you can minimize incomplete outputs and achieve more reliable and coherent text generation results.
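The chunking advice above can be sketched as a small helper. This is a word-level illustration, not a transformers utility; real pipelines count subword tokens, and the function name and parameters here are invented (it assumes overlap is smaller than max_words):

```python
def chunk_text(text, max_words=100, overlap=10):
    """Split text into overlapping word-level chunks so each piece
    fits comfortably inside a model's input limit."""
    words = text.split()
    step = max_words - overlap  # assumes overlap < max_words
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the final chunk already covers the rest of the text
    return chunks

article = " ".join(f"w{i}" for i in range(250))
pieces = chunk_text(article, max_words=100, overlap=10)
print(len(pieces))  # 3
```

Each chunk can then be summarized separately and the partial summaries joined or summarized again. The small overlap keeps sentences that straddle a boundary from being lost.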

Summary

Text generation is the foundation of modern AI applications. Using models like GPT-2 and flan-T5 through Hugging Face’s pipeline makes it incredibly easy to build real text generators in Python. You learned how to write prompts, control generation quality, and apply different models to different tasks, all with just a few lines of code.