Zero-ETL Integration with Amazon DynamoDB and SageMaker Lakehouse

In this Cloud Lab, you’ll learn how zero-ETL replication from Amazon DynamoDB to the Amazon SageMaker Lakehouse enables seamless, periodic data transfer and eliminates the need to build or maintain traditional ETL pipelines.

8 Tasks

intermediate

2hr

Certificate of Completion

Desktop Only
No Setup Required
Amazon Web Services

Learning Objectives

Working knowledge of zero-ETL integration between Amazon DynamoDB and SageMaker Lakehouse
Working knowledge of analyzing data in SageMaker Lakehouse with Amazon Athena

Technologies
Glue
Athena
DynamoDB
Cloud Lab Overview

Amazon SageMaker Lakehouse zero-ETL integration simplifies machine learning workflows by replicating data from various data stores into data lakes backed by Amazon S3 and making it readily available for analysis. This integration eliminates the need for complex ETL processes, allowing data scientists to directly query and use data from multiple sources, such as DynamoDB, Salesforce, and Instagram Ads, for training and inference. By leveraging this integration, organizations can accelerate ML model development, reduce operational overhead, and keep analytics data close to current through periodic replication.

The following is the high-level architecture diagram of the infrastructure that you’ll create in this Cloud Lab:

Performing Amazon DynamoDB zero-ETL integration with Amazon SageMaker Lakehouse

In this Cloud Lab, you’ll create a source table in Amazon DynamoDB and a target database in the AWS Glue Data Catalog. The replicated data will be stored in Amazon S3, which SageMaker Lakehouse uses as the underlying storage for data lakes. You’ll then create an IAM role and configure resource-based policies for the DynamoDB table and the Glue Data Catalog to grant the permissions that the zero-ETL integration between DynamoDB and SageMaker Lakehouse requires. After that, you’ll configure the zero-ETL integration. Finally, you’ll query the replicated data with Amazon Athena through the Glue Data Catalog.
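To make the resource-based policy step more concrete, the following is a minimal sketch of how the policy for the DynamoDB table might be built in Python. The statement contents (the `glue.amazonaws.com` principal, the three `dynamodb:` actions, and the condition keys) reflect our reading of the AWS documentation for DynamoDB zero-ETL sources and should be checked against the lab instructions. The function names, the account ID `111122223333`, and the region `us-east-1` are placeholders introduced only for illustration.

```python
import json


def build_table_resource_policy(account_id: str, region: str) -> dict:
    """Build a resource-based policy that lets AWS Glue export the table.

    Assumption: the actions and conditions below follow the AWS docs for
    zero-ETL integrations with a DynamoDB source; verify before use.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "glue.amazonaws.com"},
                "Action": [
                    "dynamodb:ExportTableToPointInTime",
                    "dynamodb:DescribeTable",
                    "dynamodb:DescribeExport",
                ],
                "Resource": "*",
                "Condition": {
                    # Restrict access to integrations owned by this account.
                    "StringEquals": {"aws:SourceAccount": account_id},
                    "StringLike": {
                        "aws:SourceArn": (
                            f"arn:aws:glue:{region}:{account_id}:integration:*"
                        )
                    },
                },
            }
        ],
    }


def attach_policy(table_arn: str, policy: dict) -> None:
    """Attach the policy to the table (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above runs without AWS access

    boto3.client("dynamodb").put_resource_policy(
        ResourceArn=table_arn, Policy=json.dumps(policy)
    )


# Placeholder account ID and region, for illustration only.
policy = build_table_resource_policy("111122223333", "us-east-1")
print(json.dumps(policy, indent=2))
```

Calling `attach_policy` with the table's ARN would apply the policy through the DynamoDB `put_resource_policy` API. In the lab itself, this step may be done in the console instead. As far as we understand, zero-ETL exports also rely on point-in-time recovery being enabled on the source table, so that setting is worth confirming as well.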

Cloud Lab Tasks
1. Introduction
  Getting Started
2. Create the Necessary Resources
  Create an IAM Role
  Create the DynamoDB Table
  Create S3 Bucket and Glue Database
3. Set Up SageMaker Lakehouse Zero-ETL Integration
  Create Zero-ETL Integration with DynamoDB Table
  Query the SageMaker Lakehouse Data
4. Conclusion
  Clean Up
  Wrap Up
Labs Rules Apply
Stay within resource usage requirements.
Do not engage in cryptocurrency mining.
Do not engage in or encourage activity that is illegal.