Search⌘ K
AI Features

What Is Generative AI?

Explore the fundamentals of generative AI, distinguishing it from traditional machine learning. Understand core model architectures like Transformers, diffusion models, and GANs, and learn how large language models gain advanced capabilities. This lesson guides you in mapping business problems to AI solutions, configuring inference parameters, and recognizing risks such as hallucinations and bias to ensure responsible AI deployment in production environments.

Generative AI changes how many software systems produce output. Unlike traditional ML systems that classify inputs or predict labels, generative models generate new content, such as text, images, audio, video, and code, based on patterns learned from large training datasets. For developers and solutions architects on AWS, this changes design decisions across application layers, from customer-facing interfaces to internal automation pipelines.

Traditional machine learning, often called discriminative AI, draws decision boundaries. A discriminative model trained on support tickets learns to assign a label such as “billing” or “technical” to each incoming request. Given the same support ticket, a generative model drafts a customer-facing response. The discriminative model predicts a category; the generative model produces content. This distinction shapes many architectural choices you’ll make in this course.

Before going further, a handful of terms will appear throughout every remaining lesson. A foundation modelA large AI model pre-trained on broad data that can be adapted to a wide range of downstream tasks without task-specific training from scratch. serves as the starting point. Pre-training is the initial phase where the model learns general patterns from data. Inference is the process of sending input to a trained model and receiving output. A prompt is the input you provide, and a completion is the model’s generated response.

Note: Amazon Bedrock provides managed access to foundation models from AWS and third-party providers. The next lesson covers Bedrock in more detail. You do not need to provision, host, or operate the model infrastructure yourself.

The diagram below illustrates how generative AI works:

How generative AI works
How generative AI works

This lesson covers five objectives: distinguishing generative AI from discriminative AI, identifying core generative architectures, explaining how large language models gain capabilities through scale, mapping business problems to use case categories, and articulating the risks that shape governance decisions.

Core generative architectures

Three model families power the majority of today’s generative AI systems. Each operates on a different principle and targets different content modalities, so understanding their mechanics, even at a high level, is essential for selecting the right model for a given workload.

Transformers

The TransformerA neural network architecture that uses a self-attention mechanism to weigh the relevance of every element in an input sequence against every other element, enabling it to capture long-range dependencies efficiently. architecture is the backbone of modern large language models. At a high level, self-attention allows the model to evaluate the importance of each word (or token) in a sequence relative to every other token. Think of it like reading a long legal contract where the meaning of a clause on page twelve depends on a definition introduced on page one. Self-attention lets the model make that connection directly. Models in the Claude family, available on Amazon Bedrock, are built on Transformer architectures.

Diffusion models and GANs

Diffusion models work through a two-phase process. During training, the forward phase gradually adds random noise to an image until it becomes pure static. The model then learns the reverse process, removing noise step by step to reconstruct, or generate, a coherent image from randomness. When guided by a text prompt, diffusion models produce images that match the description. This architecture powers image- and video-generation services.

Generative adversarial networks (GANs) use a competitive setup. The generator produces synthetic samples that resemble the training data, while the discriminator aims to distinguish real from generated samples. Training improves both networks: the generator learns to produce more realistic samples, and the discriminator learns to detect generated samples. While GANs were historically dominant for image synthesis, Transformers and diffusion models have largely overtaken them for new production workloads due to greater stability and output quality. Architecture choice depends on the target modality and use case.

The following table summarizes the trade-offs:

Comparison of Generative AI Architectures

Architecture

Primary Modality

How It Generates

Typical Use Cases

Transformers

Text and code

Next-token prediction using self-attention over sequences

LLM chat, code generation, summarization, translation

Diffusion Models

Images and video

Iterative denoising from random noise guided by a text prompt

Image generation, image editing, video synthesis

GANs

Images

Generator-discriminator adversarial training loop

Style transfer, data augmentation, super-resolution (largely superseded by diffusion models)

Now that you’ve seen the main model families, the next section looks more closely at the model families most commonly used in production AI applications.

Large language models and scale

LLMs are trained using self-supervised pre-trainingA training approach where the model learns from unlabeled data by predicting parts of the input (such as the next token in a sequence), eliminating the need for manually annotated datasets. on massive text corpora. The training objective is deceptively simple: predict the next token in a sequence. Applied at enormous scale, with billions of parameters trained on terabytes of text, this objective produces emergent capabilities. These are abilities not explicitly programmed but that arise from scale, including instruction following, multi-step reasoning, summarization, and code generation.

Context window and why it matters

The context windowThe maximum number of tokens a model can process in a single prompt-plus-completion cycle, determining how much information the model can consider at once. defines how much text the model can “see” at once. A larger context window allows the model to reason over longer documents, maintain coherent multi-turn conversations, and incorporate more retrieved context in retrieval-augmented generation (RAG) patterns. When selecting a foundation model on Amazon Bedrock, context window size is a primary evaluation criterion alongside cost and latency.

Inference parameters

Practitioners control output behavior through several key parameters:

  • Temperature governs the randomness of the model’s output, where lower values produce more deterministic responses and higher values increase creativity.

  • Top_p (nucleus sampling) sets a cumulative probability threshold so the model only considers the most likely tokens, filtering out low-probability noise.

  • Max tokens caps the length of the generated completion, directly affecting both response quality and inference cost.

These parameters are configurable through Amazon Bedrock’s API. Choosing the right combination is a practical skill. A customer-facing FAQ bot benefits from low temperature and constrained max tokens, while a creative writing assistant may use higher temperature and a generous token budget.

Practical tip: Start with a temperature of 0.2–0.3 for factual tasks and increase only when the use case explicitly requires creative variation. This single adjustment often has the largest impact on output quality.

The following markmap organizes the business problems that LLMs and other generative models address:

This taxonomy maps common business problems to generative AI capability categories for selecting the right model type and deployment strategy

Business value and multimodal AI

Mapping a business problem to the correct use case category is the first step in any generative AI architecture. A summarization requirement points toward an LLM. Generating product images from text descriptions requires a diffusion model. Getting this mapping wrong leads to wasted effort and suboptimal results.

Multimodal generative AI extends this further with models that accept and produce multiple modalities, including text, images, and video, within a single model. Consider a practical scenario: a multimodal model analyzes a product photograph and generates a marketing description, or accepts a chart image and answers questions about the data it contains. This eliminates the need to chain separate single-modality models together, simplifying the overall architecture and reducing latency.

Amazon Bedrock provides access to multimodal foundation models. When deciding how to customize model behavior, the order of preference matters significantly. Prompt engineering should be the first approach tried because it requires no additional infrastructure and delivers results in minutes. If the model needs access to domain-specific knowledge, RAG via Bedrock Knowledge Bases augments prompts with retrieved context without modifying the model itself. Fine-tuning should be reserved for cases where neither prompting nor RAG achieves the required performance, because it adds complexity, cost, and ongoing maintenance burden.

Attention: A common and expensive mistake is jumping directly to fine-tuning when prompt engineering or RAG would have solved the problem. Always validate simpler approaches first.

The following quiz tests your understanding of these architectural decisions:

Lesson Quiz

1.

A solutions architect needs a model that can accept a product image and generate a text description. Which capability is required?

A.

Discriminative classification

B.

Multimodal generative AI

C.

Generative adversarial network

D.

Hyperparameter optimization


1 / 2

Understanding what generative AI can do is only half the picture. The risks it introduces are equally important for production systems.

Limitations and risks

Every generative AI deployment must account for four categories of risk that directly affect architectural and governance decisions.

  • Hallucinations occur when models generate confident, fluent, but factually incorrect information. The model optimizes for plausible next-token predictions, not truth. Mitigation strategies include grounding responses with RAG, using low temperature settings, and requiring human-in-the-loop review for high-stakes outputs.

  • Prompt injection is an attack vector where adversarial users craft inputs that manipulate the model into ignoring system instructions or producing harmful output. Input validation layers and Amazon Bedrock Guardrails are essential defenses.

  • Bias in the training data means models can reproduce and amplify unfair or harmful patterns present in the corpora they were trained on, leading to outputs that disadvantage certain groups.

  • Data privacy concerns arise when sensitive information included in prompts is processed by the model, requiring strict governance policies around what data flows through inference requests.

These risks make human oversight non-negotiable for production deployments. Responsible AI practices, covered later in this course, build directly on the risk awareness established here. Understanding these limitations is as important as understanding the capabilities when making architectural decisions. Every design review should explicitly address how each risk category is mitigated.

Conclusion

Generative AI creates new content by learning data distributions, standing in clear contrast to discriminative models that classify or predict. Transformers, diffusion models, and GANs each serve different modalities, with Transformers dominating text and code workloads through self-supervised pre-training at scale. Inference parameters like temperature, top_p, and max tokens give practitioners direct control over output quality and cost. Business problems map to a structured taxonomy of use cases, and the right customization approach follows a clear priority: prompt engineering first, then RAG, then fine-tuning. Hallucinations, prompt injection, bias, and privacy risks demand guardrails and human oversight in every production system.

Now that you understand the core concepts, the next lesson explores the main generative AI services on AWS and introduces Amazon Bedrock, a managed service for accessing foundation models and building generative AI applications.