What Is Generative AI?
Explore the fundamentals of generative AI, distinguishing it from traditional machine learning. Understand core model architectures like Transformers, diffusion models, and GANs, and learn how large language models gain advanced capabilities. This lesson guides you in mapping business problems to AI solutions, configuring inference parameters, and recognizing risks such as hallucinations and bias to ensure responsible AI deployment in production environments.
Generative AI changes how many software systems produce output. Unlike traditional ML systems that classify inputs or predict labels, generative models generate new content, such as text, images, audio, video, and code, based on patterns learned from large training datasets. For developers and solutions architects on AWS, this changes design decisions across application layers, from customer-facing interfaces to internal automation pipelines.
Traditional machine learning, often called discriminative AI, draws decision boundaries. A discriminative model trained on support tickets learns to assign a label such as “billing” or “technical” to each incoming request. Given the same support ticket, a generative model drafts a customer-facing response. The discriminative model predicts a category; the generative model produces content. This distinction shapes many architectural choices you’ll make in this course.
Before going further, a handful of terms will appear throughout every remaining lesson. A
Note: Amazon Bedrock provides managed access to foundation models from AWS and third-party providers. The next lesson covers Bedrock in more detail. You do not need to provision, host, or operate the model infrastructure yourself.
The diagram below illustrates how generative AI works:
This lesson covers five objectives: distinguishing generative AI from discriminative AI, identifying core generative architectures, explaining how large language models gain capabilities through scale, mapping business problems to use case categories, and articulating the risks that shape governance decisions.
Core generative architectures
Three model families power the majority of today’s generative AI systems. Each operates on a different principle and targets different content modalities, so understanding their mechanics, even at a high level, is essential for selecting the right model for a given workload.
Transformers
The
Diffusion models and GANs
Diffusion models work through a two-phase process. During training, the forward phase gradually adds random noise to an image until it becomes pure static. The model then learns the reverse process, removing noise step by step to reconstruct, or generate, a coherent image from randomness. When guided by a text prompt, diffusion models produce images that match the description. This architecture powers image- and video-generation services.
Generative adversarial networks (GANs) use a competitive setup. The generator produces synthetic samples that resemble the training data, while the discriminator aims to distinguish real from generated samples. Training improves both networks: the generator learns to produce more realistic samples, and the discriminator learns to detect generated samples. While GANs were historically dominant for image synthesis, Transformers and diffusion models have largely overtaken them for new production workloads due to greater stability and output quality. Architecture choice depends on the target modality and use case.
The following table summarizes the trade-offs:
Comparison of Generative AI Architectures
Architecture | Primary Modality | How It Generates | Typical Use Cases |
Transformers | Text and code | Next-token prediction using self-attention over sequences | LLM chat, code generation, summarization, translation |
Diffusion Models | Images and video | Iterative denoising from random noise guided by a text prompt | Image generation, image editing, video synthesis |
GANs | Images | Generator-discriminator adversarial training loop | Style transfer, data augmentation, super-resolution (largely superseded by diffusion models) |
Now that you’ve seen the main model families, the next section looks more closely at the model families most commonly used in production AI applications.
Large language models and scale
LLMs are trained using
Context window and why it matters
The
Inference parameters
Practitioners control output behavior through several key parameters:
Temperature governs the randomness of the model’s output, where lower values produce more deterministic responses and higher values increase creativity.
Top_p (nucleus sampling) sets a cumulative probability threshold so the model only considers the most likely tokens, filtering out low-probability noise.
Max tokens caps the length of the generated completion, directly affecting both response quality and inference cost.
These parameters are configurable through Amazon Bedrock’s API. Choosing the right combination is a practical skill. A customer-facing FAQ bot benefits from low temperature and constrained max tokens, while a creative writing assistant may use higher temperature and a generous token budget.
Practical tip: Start with a temperature of 0.2–0.3 for factual tasks and increase only when the use case explicitly requires creative variation. This single adjustment often has the largest impact on output quality.
The following markmap organizes the business problems that LLMs and other generative models address:
Business value and multimodal AI
Mapping a business problem to the correct use case category is the first step in any generative AI architecture. A summarization requirement points toward an LLM. Generating product images from text descriptions requires a diffusion model. Getting this mapping wrong leads to wasted effort and suboptimal results.
Multimodal generative AI extends this further with models that accept and produce multiple modalities, including text, images, and video, within a single model. Consider a practical scenario: a multimodal model analyzes a product photograph and generates a marketing description, or accepts a chart image and answers questions about the data it contains. This eliminates the need to chain separate single-modality models together, simplifying the overall architecture and reducing latency.
Amazon Bedrock provides access to multimodal foundation models. When deciding how to customize model behavior, the order of preference matters significantly. Prompt engineering should be the first approach tried because it requires no additional infrastructure and delivers results in minutes. If the model needs access to domain-specific knowledge, RAG via Bedrock Knowledge Bases augments prompts with retrieved context without modifying the model itself. Fine-tuning should be reserved for cases where neither prompting nor RAG achieves the required performance, because it adds complexity, cost, and ongoing maintenance burden.
Attention: A common and expensive mistake is jumping directly to fine-tuning when prompt engineering or RAG would have solved the problem. Always validate simpler approaches first.
The following quiz tests your understanding of these architectural decisions:
Lesson Quiz
A solutions architect needs a model that can accept a product image and generate a text description. Which capability is required?
Discriminative classification
Multimodal generative AI
Generative adversarial network
Hyperparameter optimization
Understanding what generative AI can do is only half the picture. The risks it introduces are equally important for production systems.
Limitations and risks
Every generative AI deployment must account for four categories of risk that directly affect architectural and governance decisions.
Hallucinations occur when models generate confident, fluent, but factually incorrect information. The model optimizes for plausible next-token predictions, not truth. Mitigation strategies include grounding responses with RAG, using low temperature settings, and requiring human-in-the-loop review for high-stakes outputs.
Prompt injection is an attack vector where adversarial users craft inputs that manipulate the model into ignoring system instructions or producing harmful output. Input validation layers and Amazon Bedrock Guardrails are essential defenses.
Bias in the training data means models can reproduce and amplify unfair or harmful patterns present in the corpora they were trained on, leading to outputs that disadvantage certain groups.
Data privacy concerns arise when sensitive information included in prompts is processed by the model, requiring strict governance policies around what data flows through inference requests.
These risks make human oversight non-negotiable for production deployments. Responsible AI practices, covered later in this course, build directly on the risk awareness established here. Understanding these limitations is as important as understanding the capabilities when making architectural decisions. Every design review should explicitly address how each risk category is mitigated.
Conclusion
Generative AI creates new content by learning data distributions, standing in clear contrast to discriminative models that classify or predict. Transformers, diffusion models, and GANs each serve different modalities, with Transformers dominating text and code workloads through self-supervised pre-training at scale. Inference parameters like temperature, top_p, and max tokens give practitioners direct control over output quality and cost. Business problems map to a structured taxonomy of use cases, and the right customization approach follows a clear priority: prompt engineering first, then RAG, then fine-tuning. Hallucinations, prompt injection, bias, and privacy risks demand guardrails and human oversight in every production system.
Now that you understand the core concepts, the next lesson explores the main generative AI services on AWS and introduces Amazon Bedrock, a managed service for accessing foundation models and building generative AI applications.