
ETL Example—Scheduling

Understand how to automate ETL pipeline execution using the cron scheduler on Unix systems. This lesson guides you through creating cronjobs to run ETL scripts every Monday at 9 AM, ensuring timely data refreshes without manual intervention.


The company wants us to schedule the ETL pipeline we built so that it runs once a week, every Monday at 9:00 a.m. That way, the data scientist always has up-to-date data as new lottery numbers come in, and nobody has to ask us to rerun the ETL process manually. To schedule the ETL pipeline, we’ll use cron.

Cron

Cron is a command-line utility for scheduling jobs on Unix-like operating systems. Assuming we’re working on one, we can schedule commands or shell scripts to run automatically at fixed times or intervals. Tasks scheduled using cron are also known as cronjobs.

Cron is well suited to repetitive tasks like running the pipeline we just built. To create a cronjob, we add an entry with the right syntax to the crontab, a file that stores all the scheduled jobs. ...
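As a minimal sketch of what such an entry could look like for the Monday 9:00 a.m. schedule described above: a crontab entry consists of five schedule fields (minute, hour, day of month, month, day of week) followed by the command to run. The script path below is hypothetical and stands in for wherever the ETL script actually lives.

```shell
# Hypothetical location of the ETL script; replace with the real path.
ETL_SCRIPT="/home/user/etl/run_etl.sh"

# Schedule fields: minute hour day-of-month month day-of-week
# "0 9 * * 1" = minute 0, hour 9, any day of month, any month, Monday.
CRON_ENTRY="0 9 * * 1 $ETL_SCRIPT"

# Print the entry so it can be reviewed before it is installed.
echo "$CRON_ENTRY"
```

To install the entry, we could run `crontab -e` and paste the printed line into the editor that opens, then confirm it with `crontab -l`, which lists the current user's scheduled jobs.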