Summary and Quiz on Data Transformation and Feature Engineering
Explore how to choose the appropriate AWS service for your data transformation and feature engineering needs in machine learning workflows. Understand the strengths and trade-offs of AWS Glue, DataBrew, EMR with Spark, and SageMaker Data Wrangler. This lesson helps you grasp best-fit scenarios to build efficient, scalable ML data pipelines and apply feature engineering effectively using AWS tools.
Summary
The AWS Certified Machine Learning Engineer – Associate (MLA-C01) exam frequently presents scenarios in which multiple AWS services could technically perform a task, but only one is the best fit. This section summarizes a decision framework for AWS data-processing services that helps you separate correct answers from plausible distractors.
AWS Glue
AWS Glue is a serverless ETL service designed for programmatic, code-driven data pipelines. In a typical scenario, new JSON files land in an S3 bucket, and an AWS Glue crawler scans the data, infers the schema, and registers it in the AWS Glue Data Catalog. A Glue ETL job written ...
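To make the crawler's "scan, infer the schema" step more concrete, here is a small pure-Python sketch of schema inference over newline-delimited JSON. It is only a conceptual illustration and not Glue's actual classifier logic: the sample records, the `infer_schema` helper, and the simplified type names are all assumptions made for this example.

```python
import json

# Hypothetical sample: newline-delimited JSON records, as they might appear in an S3 object.
SAMPLE = """\
{"user_id": 1, "country": "DE", "spend": 12.5}
{"user_id": 2, "country": "US", "spend": 7, "referrer": "ad"}
"""

# Simplified mapping from Python types to column type names (illustrative only).
TYPE_NAMES = {bool: "boolean", int: "bigint", float: "double", str: "string"}


def infer_schema(lines):
    """Scan every record and return {column: type}.

    Conflicting types are resolved by widening: bigint + double becomes double,
    and any other conflict falls back to string.
    """
    schema = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        for key, value in record.items():
            seen = TYPE_NAMES.get(type(value), "string")
            prev = schema.get(key)
            if prev is None or prev == seen:
                schema[key] = seen
            elif {prev, seen} == {"bigint", "double"}:
                schema[key] = "double"
            else:
                schema[key] = "string"
    return schema


schema = infer_schema(SAMPLE.splitlines())
for column, col_type in schema.items():
    print(f"{column}: {col_type}")
```

Note that `referrer` appears in only one record yet still becomes a column. The sketch scans every record rather than trusting the first one, which is the general idea behind inferring a schema from semi-structured files.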