Zero-ETL Integration with Amazon DynamoDB and SageMaker Lakehouse

Takes 120 mins

Amazon SageMaker Lakehouse zero-ETL integration simplifies machine learning workflows by replicating data from various data stores into data lakes on Amazon S3 and making it readily available. This integration eliminates the need to build and maintain complex ETL pipelines, allowing data scientists to directly query and use data from multiple sources, such as DynamoDB, Salesforce, and Instagram Ads, for training and inference. By leveraging this integration, organizations can accelerate ML model development, reduce operational overhead, and get near-real-time access to the latest data.

The following is the high-level architecture diagram of the infrastructure that you’ll create in this Cloud Lab:

Performing Amazon DynamoDB zero-ETL integration with Amazon SageMaker Lakehouse

In this Cloud Lab, you’ll create a source table in Amazon DynamoDB and a target database in the AWS Glue Data Catalog. The replicated data will be stored in Amazon S3, which SageMaker Lakehouse uses as the underlying storage for data lakes. You’ll then create an IAM role and configure resource-based policies on the DynamoDB table and the Glue Data Catalog to grant the permissions that the zero-ETL integration between DynamoDB and SageMaker Lakehouse requires. After that, you’ll configure the zero-ETL integration itself. Finally, you’ll query the replicated data with Amazon Athena through the Glue Data Catalog.
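To make the resource-based policy step more concrete, here is a minimal sketch in Python of what the policy attached to the DynamoDB table might look like. It only builds the JSON document and does not call AWS. The function name `build_dynamodb_zero_etl_policy`, the `Sid` value, and the sample account ID and Region are hypothetical, and the action list and conditions are an assumption based on the commonly documented pattern of letting the AWS Glue service export the table; check them against the policy the lab itself provides before using them.

```python
import json


def build_dynamodb_zero_etl_policy(account_id: str, region: str) -> dict:
    """Build a resource-based policy that lets AWS Glue export the table.

    Assumption: the zero-ETL integration reads the table through
    point-in-time exports, so the Glue service principal needs the
    export and describe actions listed below.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowGlueZeroEtlExport",  # hypothetical statement ID
                "Effect": "Allow",
                "Principal": {"Service": "glue.amazonaws.com"},
                "Action": [
                    "dynamodb:ExportTableToPointInTime",
                    "dynamodb:DescribeTable",
                    "dynamodb:DescribeExport",
                ],
                "Resource": "*",
                # Restrict access to Glue integrations owned by this account.
                "Condition": {
                    "StringEquals": {"aws:SourceAccount": account_id},
                    "StringLike": {
                        "aws:SourceArn": (
                            f"arn:aws:glue:{region}:{account_id}:integration:*"
                        )
                    },
                },
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder account ID and Region; replace with your own values.
    policy = build_dynamodb_zero_etl_policy("111122223333", "us-east-1")
    print(json.dumps(policy, indent=2))
```

The printed JSON could be pasted into the table's resource-based policy editor in the DynamoDB console. The Glue Data Catalog needs its own, separate resource policy, which this sketch does not cover.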