AWS Storage Services
Understand how to select and utilize AWS storage services like Amazon S3, Amazon EFS, and DynamoDB to build resilient and scalable data infrastructures for generative AI systems. Learn how these services manage diverse data types, enable distributed training, and maintain conversation state in production-grade AI applications.
In generative AI (GenAI) development, our models are only as good as the data that feeds them and the context they can remember. It is therefore important to choose a storage layer whose access pattern, speed, and cost match the requirements of our application. As we build GenAI systems, we have to manage large training datasets, share configuration files across distributed compute clusters, and track the state of user conversations efficiently.
In this lesson, we will examine how we use AWS storage services to create a resilient, context-aware AI infrastructure.
Building a data foundation with Amazon S3
Amazon Simple Storage Service (S3) is an object storage service designed to store and retrieve any amount of data from anywhere. Data is stored as objects inside containers called buckets, and a single bucket can hold any type of data, including files, images, and videos.
Each bucket is assigned a globally unique name and tied to a specific AWS Region. This gives us a secure, structured namespace for managing everything from raw datasets to finalized model artifacts.
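As a minimal sketch of what such a structured namespace might look like, the helper below builds object keys for datasets and checkpoints under one project prefix. The bucket name, the project name, the prefix layout, and the `ProjectLayout` class are all hypothetical and chosen only for illustration. Nothing here calls AWS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProjectLayout:
    """Hypothetical key layout for one GenAI project inside a single bucket."""

    bucket: str   # globally unique bucket name (illustrative value below)
    project: str  # top-level prefix that groups this project's objects

    def dataset_key(self, split: str, filename: str) -> str:
        # Raw data is grouped by split, e.g. "train" or "eval".
        return f"{self.project}/datasets/raw/{split}/{filename}"

    def checkpoint_key(self, run_id: str, step: int) -> str:
        # Zero-padding the step makes keys sort lexicographically
        # in the same order as training progress.
        return f"{self.project}/checkpoints/{run_id}/step-{step:08d}.safetensors"

    def uri(self, key: str) -> str:
        # Full S3 URI for a key in this bucket.
        return f"s3://{self.bucket}/{key}"


layout = ProjectLayout(bucket="example-genai-artifacts", project="support-bot")
print(layout.uri(layout.dataset_key("train", "part-0001.jsonl")))
print(layout.uri(layout.checkpoint_key("run-42", 1500)))
```

Keeping datasets and checkpoints under separate prefixes makes it easier to list, secure, or expire one kind of object without touching the other.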
Amazon S3 offers several advanced features:
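Storage classes: We use S3 Standard for frequently accessed training data, while S3 Intelligent-Tiering automatically moves objects that have not been accessed recently to lower-cost access tiers. Its automatic tiers (Frequent Access, Infrequent Access, and Archive Instant Access) keep the same millisecond retrieval latency, so re-running an experiment on an older dataset is not slowed down. The optional Archive Access and Deep Archive Access tiers, like the S3 Glacier Flexible Retrieval and S3 Glacier Deep Archive storage classes, cost less but require a restore that can take minutes to hours.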
S3 Express One Zone: This is a high-performance storage class we use for our most latency-sensitive ML/AI training jobs. It provides single-digit millisecond data access, which is critical when we need to keep thousands of GPUs supplied with data during a large-scale fine-tuning run. Data is stored in a single Availability Zone that we choose, so we can place it next to our compute.
Object versioning: We enable this to track changes to our model checkpoints. If a new fine-tuning run produces a degraded model, versioning allows us to quickly roll back to a known-good set of weights.
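The storage class and versioning features above can be sketched as request parameters for the AWS SDK for Python (boto3). This is an illustrative sketch, not a definitive implementation: the helper function names, bucket name, key, and version IDs are hypothetical. The helpers only build the keyword arguments for the boto3 S3 client calls `put_bucket_versioning`, `upload_file`, and `copy_object`, so the block runs without AWS credentials. The rollback approach shown (copying an older version onto the same key) is one common option and is an assumption about how we would choose to roll back.

```python
from datetime import datetime, timezone
from typing import List, Optional


def versioning_request(bucket: str) -> dict:
    """Keyword arguments for the S3 client's put_bucket_versioning call."""
    return {"Bucket": bucket, "VersioningConfiguration": {"Status": "Enabled"}}


def upload_extra_args(storage_class: str = "INTELLIGENT_TIERING") -> dict:
    """ExtraArgs for upload_file that select the object's storage class."""
    return {"StorageClass": storage_class}


def previous_version_id(versions: List[dict], key: str) -> Optional[str]:
    """Return the VersionId just before the newest one for `key`.

    `versions` is shaped like the "Versions" list in a
    list_object_versions response (Key, VersionId, LastModified).
    """
    mine = sorted(
        (v for v in versions if v["Key"] == key),
        key=lambda v: v["LastModified"],
        reverse=True,  # newest first
    )
    return mine[1]["VersionId"] if len(mine) > 1 else None


def rollback_request(bucket: str, key: str, version_id: str) -> dict:
    """Keyword arguments for copy_object that copy an older version onto
    the same key, making that content the newest version."""
    return {
        "Bucket": bucket,
        "Key": key,
        "CopySource": {"Bucket": bucket, "Key": key, "VersionId": version_id},
    }


# Hypothetical version listing for one checkpoint object.
CKPT = "support-bot/checkpoints/model.safetensors"
listing = [
    {"Key": CKPT, "VersionId": "v3", "LastModified": datetime(2025, 1, 3, tzinfo=timezone.utc)},
    {"Key": CKPT, "VersionId": "v2", "LastModified": datetime(2025, 1, 2, tzinfo=timezone.utc)},
    {"Key": "other/file.bin", "VersionId": "x1", "LastModified": datetime(2025, 1, 4, tzinfo=timezone.utc)},
]
good = previous_version_id(listing, CKPT)
print(rollback_request("example-genai-artifacts", CKPT, good))
```

With a real client, these dictionaries would be passed along the lines of `s3.put_bucket_versioning(**versioning_request(bucket))` and `s3.copy_object(**rollback_request(bucket, key, version_id))`, where `s3 = boto3.client("s3")`.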
In addition to these, S3 offers vector buckets (Amazon S3 Vectors) for GenAI applications. A vector bucket acts as a native vector store, allowing us to store and query vector embeddings directly in S3. This reduces the ...