Data Ingestion and Transformation II

Explore methods to implement event-driven triggers for AWS Glue ETL jobs, optimize batch ingestion of small files using Spark and AWS Glue options, and choose suitable orchestration services like Step Functions and Amazon MWAA. Understand best practices for integrating external APIs and deploying data pipelines using AWS SAM to build efficient, production-ready data workflows.

Question 6

A data engineer is designing a pipeline where an AWS Glue ETL job must run every time a new partition of data lands in an S3 bucket. The data arrives at unpredictable intervals. The engineer wants an event-driven approach rather than polling.

Which solution provides an event-driven trigger for the Glue job?

A. Schedule the AWS Glue job on a fixed cron schedule that runs every 15 minutes to check for new data.

B. Configure S3 Event Notifications to directly invoke the AWS Glue ETL job on s3:ObjectCreated:* events.

C. Use AWS Step Functions with a polling loop that checks S3 for new objects every 5 minutes and starts the Glue job when new data is detected.

D. Enable S3 Event Notifications to Amazon EventBridge, then create an EventBridge rule that matches the S3 Object Created event and triggers the Glue job via a Lambda function or ...
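
The pattern described in option D can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the bucket name `example-landing-bucket`, the job name `example-partition-etl`, and the `--source_bucket` / `--source_key` job arguments are all hypothetical placeholders. It assumes the bucket has EventBridge notifications enabled and that the Lambda function's role is allowed to call `glue:StartJobRun`.

```python
# Hypothetical names, chosen only for illustration.
JOB_NAME = "example-partition-etl"
BUCKET_NAME = "example-landing-bucket"

# Event pattern for the EventBridge rule. S3 publishes events to EventBridge
# with source "aws.s3" and detail-type "Object Created".
EVENT_PATTERN = {
    "source": ["aws.s3"],
    "detail-type": ["Object Created"],
    "detail": {"bucket": {"name": [BUCKET_NAME]}},
}


def build_job_arguments(event):
    """Extract the bucket and object key from the EventBridge event and
    map them to (hypothetical) Glue job arguments."""
    detail = event["detail"]
    return {
        "--source_bucket": detail["bucket"]["name"],
        "--source_key": detail["object"]["key"],
    }


def handler(event, context, glue_client=None):
    """Lambda entry point: start one Glue job run per incoming S3 event.

    glue_client is injectable so the function can be exercised without AWS
    credentials; in Lambda it defaults to a real boto3 Glue client.
    """
    if glue_client is None:
        import boto3  # available by default in the AWS Lambda Python runtime

        glue_client = boto3.client("glue")

    response = glue_client.start_job_run(
        JobName=JOB_NAME,
        Arguments=build_job_arguments(event),
    )
    return {"JobRunId": response["JobRunId"]}
```

Because the rule fires only when S3 emits an event, nothing runs while the bucket is idle, which is the difference from the scheduled and polling approaches in options A and C.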