Data Ingestion and Transformation IV

Explore how to configure AWS Glue to securely connect to private Amazon RDS instances, optimize ETL jobs to handle data skew and schema inference delays, and orchestrate multi-step serverless data pipelines using AWS Step Functions. Understand best practices for efficient batch loading into Redshift and continuous replication with DMS.

Question 16

A data engineer needs to connect an AWS Glue ETL job to an Amazon RDS for Oracle database to extract data. The RDS instance is in a private subnet with no internet access. The Glue job runs in the AWS Glue managed VPC. The engineer must establish connectivity between Glue and the RDS instance.

Which configuration should the data engineer implement?

A. Create a public endpoint for the RDS instance so that the Glue job can connect over the internet.

B. Configure the AWS Glue connection with the VPC, subnet, and security group that allow access to the RDS instance’s private subnet, ensuring the RDS security group allows inbound traffic from the Glue connection’s security group on the JDBC port.

C. Use AWS DMS instead of Glue to extract data from the RDS instance, since DMS natively supports VPC connectivity.

D. Add a NAT gateway to the private subnet so that the Glue job can reach the RDS instance through the internet.
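
For reference, the VPC-based setup that option B describes can be sketched as two request payloads: one for a Glue JDBC connection and one for the inbound rule on the RDS security group. This is a minimal sketch, not a definitive implementation. The helper names, resource IDs, host, and secret name are hypothetical placeholders, and port 1521 is assumed because it is the default Oracle listener port. The dictionaries follow the shapes accepted by the boto3 calls `glue.create_connection(ConnectionInput=...)` and `ec2.authorize_security_group_ingress(GroupId=..., IpPermissions=...)`.

```python
# Assumption: the RDS for Oracle instance listens on the default port 1521.
ORACLE_PORT = 1521


def build_glue_connection_input(name, host, service_name, subnet_id,
                                glue_sg_id, availability_zone, secret_id):
    """Build the ConnectionInput payload for a Glue JDBC connection.

    All argument values are supplied by the caller; none are real resources.
    """
    return {
        "Name": name,
        "ConnectionType": "JDBC",
        "ConnectionProperties": {
            # Oracle JDBC URL pointing at the private RDS endpoint.
            "JDBC_CONNECTION_URL": (
                f"jdbc:oracle:thin://@{host}:{ORACLE_PORT}/{service_name}"
            ),
            # Credentials are referenced from Secrets Manager rather than inlined.
            "SECRET_ID": secret_id,
        },
        "PhysicalConnectionRequirements": {
            # Subnet and security group that can route to the RDS instance.
            "SubnetId": subnet_id,
            "SecurityGroupIdList": [glue_sg_id],
            "AvailabilityZone": availability_zone,
        },
    }


def build_rds_ingress_rule(glue_sg_id):
    """Build IpPermissions that admit the Glue security group on the JDBC port."""
    return [{
        "IpProtocol": "tcp",
        "FromPort": ORACLE_PORT,
        "ToPort": ORACLE_PORT,
        "UserIdGroupPairs": [{"GroupId": glue_sg_id}],
    }]


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    conn = build_glue_connection_input(
        name="oracle-private-conn",
        host="example-db.internal",
        service_name="ORCL",
        subnet_id="subnet-0example",
        glue_sg_id="sg-0glueexample",
        availability_zone="us-east-1a",
        secret_id="example/oracle/credentials",
    )
    rule = build_rds_ingress_rule("sg-0glueexample")
    print(conn["ConnectionProperties"]["JDBC_CONNECTION_URL"])
    print(rule[0]["FromPort"], rule[0]["UserIdGroupPairs"][0]["GroupId"])
    # With boto3 installed and credentials configured, the payloads would be sent as:
    #   boto3.client("glue").create_connection(ConnectionInput=conn)
    #   boto3.client("ec2").authorize_security_group_ingress(
    #       GroupId="<RDS security group ID>", IpPermissions=rule)
```

Referencing the Glue security group in the inbound rule, rather than a CIDR range, keeps the rule valid when the network interfaces that Glue creates in the subnet receive different private IP addresses.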

Question 17

A data engineer is troubleshooting an AWS Glue ETL job that fails intermittently with an OutOfMemoryError. The job processes a large dataset with significant data skew: one partition has 10× more records than the others. The data engineer ...