Data Operations and Support I
Explore how to manage complex AWS data pipelines involving Kinesis, Glue, and Redshift with proper orchestration and conditional logic. Understand techniques to improve Athena query performance for partitioned data. Learn troubleshooting methods for Glue ETL jobs under memory pressure. Discover audit logging solutions with CloudTrail and best practices for creating materialized views in Redshift that refresh automatically.
Question 40
A media company has a complex data pipeline that ingests data from Amazon Kinesis Data Streams, transforms it with AWS Glue, and loads it into Amazon Redshift. The pipeline has multiple dependent stages that must execute in a specific order, with conditional branching based on the success or failure of each stage. The company needs a fully managed orchestration solution that provides visual workflow tracking and supports error handling with retry logic.
Which orchestration service should the data engineer select to meet these requirements with the least operational overhead?
A. Use Amazon Managed Workflows for Apache Airflow (Amazon MWAA) to define a DAG that orchestrates each pipeline stage, configuring task retries and branching operators for conditional logic.
B. Use AWS Step Functions to define a state machine with Choice states for conditional branching, Retry and Catch fields for error handling, and native service integrations for Kinesis, Glue, and Redshift.
C. Use AWS Glue workflows with triggers to orchestrate the Glue crawlers and ETL jobs, configuring conditional triggers based on job completion status for branching logic.
D. Use Amazon EventBridge rules to chain the pipeline stages together, ...