Orchestrating AWS Glue Workflows


In this Cloud Lab, you’ll build an end-to-end automated data pipeline using AWS Glue Workflows, set up an EventBridge trigger, run a Glue ETL job, and ingest data from S3.

7 Tasks

beginner

1hr 30m

Certificate of Completion

Desktop Only
No Setup Required
Amazon Web Services

Learning Objectives

Hands-on experience creating and managing AWS Glue workflows
Working knowledge of S3 event notifications and EventBridge rules
Ability to design ETL jobs for transforming datasets
Understanding of workflow triggers and conditional ETL job execution

Technologies
Glue
EventBridge
S3
Cloud Lab Overview

AWS Glue is a managed ETL (extract, transform, and load) and data integration service for building and running data pipelines. It lets you prepare and transform data for analytics without managing the underlying infrastructure. Glue integrates with common AWS services such as S3, Athena, and Redshift and can scale to large workloads. This makes it a practical option for running automated ETL at scale. If you’re learning ETL orchestration and data workflows on AWS, Glue is a good service to know.

In this Cloud Lab, you’ll create an S3 bucket with input and output folders and enable EventBridge notifications. You’ll create an ETL job that transforms CSV files from an S3 bucket by converting dates, standardizing product names, calculating total amounts, and filtering out cancelled or returned orders. Next, you’ll create an EventBridge rule to trigger your Glue workflow whenever a new file is uploaded and add the ETL job to the workflow. Finally, you’ll monitor the workflow execution and validate the transformed output in the S3 output folder.
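The four transformations described above can be sketched in plain Python to make the row-level logic concrete. This is only an illustration, not the script that Glue's Visual ETL editor generates: the column names (order_date, product_name, quantity, unit_price, status), the MM/DD/YYYY input date format, and the title-case naming rule are all assumptions, since the lab's actual CSV schema is not shown here.

```python
from datetime import datetime

# Hypothetical status values that mark an order for removal.
EXCLUDED_STATUSES = {"cancelled", "returned"}


def transform_order(row):
    """Return a cleaned copy of one order row, or None if the row is filtered out."""
    # Filter out cancelled or returned orders (case-insensitive comparison).
    if row["status"].strip().lower() in EXCLUDED_STATUSES:
        return None

    cleaned = dict(row)

    # Convert the date from an assumed MM/DD/YYYY string to ISO 8601 (YYYY-MM-DD).
    parsed = datetime.strptime(row["order_date"].strip(), "%m/%d/%Y")
    cleaned["order_date"] = parsed.date().isoformat()

    # Standardize the product name: collapse whitespace and apply title case.
    cleaned["product_name"] = " ".join(row["product_name"].split()).title()

    # Calculate the total amount from the assumed quantity and unit_price columns.
    cleaned["total_amount"] = round(int(row["quantity"]) * float(row["unit_price"]), 2)

    return cleaned


def transform_orders(rows):
    """Apply transform_order to every row and drop the filtered ones."""
    return [c for c in map(transform_order, rows) if c is not None]
```

Passing a list of dictionaries (for example, rows read with csv.DictReader) to transform_orders returns only the retained orders, each with a normalized date, a standardized product name, and a new total_amount field.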

By the end of this Cloud Lab, you’ll have hands-on experience orchestrating AWS Glue workflows, automating ETL jobs, and configuring S3 and EventBridge integrations. You’ll be able to implement event-driven data processing pipelines, monitor job runs, and manage workflow executions. These skills carry over directly to building scalable, automated data pipelines in real-world scenarios.
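To give a sense of what the S3 and EventBridge integration involves, the following sketch builds an EventBridge event pattern that matches S3 "Object Created" events for one bucket and key prefix. The function name, the bucket name my-glue-lab-bucket, and the input/ prefix are hypothetical placeholders; the lab itself configures the rule through the console and may use different values.

```python
import json


def build_s3_upload_pattern(bucket_name, key_prefix):
    """Build an EventBridge event pattern for new objects under a prefix in one bucket."""
    return {
        # Events that S3 sends to EventBridge carry this source and detail type.
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {
            "bucket": {"name": [bucket_name]},
            # Prefix matching restricts the rule to objects in the input folder.
            "object": {"key": [{"prefix": key_prefix}]},
        },
    }


# Hypothetical bucket and folder names, used for illustration only.
pattern_json = json.dumps(
    build_s3_upload_pattern("my-glue-lab-bucket", "input/"), indent=2
)
print(pattern_json)
```

Restricting the pattern to the input prefix matters in a design like this one: if the rule matched every object in the bucket, files written to the output folder by the ETL job would also match the rule and could start the workflow again.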

The following is the high-level architecture diagram of the infrastructure you’ll create in this Cloud Lab:

Event-driven AWS Glue ETL Workflow

Cloud Lab Tasks
1. Introduction
  Getting Started
2. Set Up a Data Store and Notification Trigger
  Create an S3 Bucket and Configure EventBridge Notifications
3. Set Up an AWS Glue Workflow
  Create a Glue ETL Job Using Visual ETL
  Create a Glue Workflow
4. Orchestrating the Workflow
  Create an EventBridge Rule and Trigger the AWS Glue Workflow
5. Conclusion
  Clean Up
  Wrap Up
Labs Rules Apply
Stay within resource usage requirements.
Do not engage in cryptocurrency mining.
Do not engage in or encourage activity that is illegal.