ETL Pipeline Example: Airflow Extraction Task

Learn to add the extract function to an Airflow DAG.

After completing the previous task, we now have a function called extract in a file called helper.py that pulls the latest batch of data from the production database. We'll add this function as a task in an Airflow DAG to complete the extract step.

This is how we build our pipelines with Airflow: by adding more and more tasks to a particular DAG, as sketched below.
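As a minimal sketch of what that looks like, the snippet below wraps the extract function from helper.py in a PythonOperator task inside a DAG. The DAG ID, start date, and schedule are placeholder values chosen for illustration, and the import paths assume Airflow 2.x.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Assumes helper.py from the previous task is on the DAG's Python path
# and exposes the extract function described above.
from helper import extract

with DAG(
    dag_id="etl_pipeline",            # hypothetical DAG name
    start_date=datetime(2024, 1, 1),  # placeholder start date
    schedule="@daily",                # placeholder schedule (Airflow 2.4+ syntax)
    catchup=False,
) as dag:
    # The extract step: run the extract function as a single Airflow task.
    extract_task = PythonOperator(
        task_id="extract",
        python_callable=extract,
    )
```

Once a DAG file like this is placed in Airflow's dags/ folder, the scheduler picks it up, and the transform and load steps can later be appended to the same DAG as additional tasks.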

Airflow operators

Before showing how to do that, let's discuss Airflow operators. In Airflow, operators are the building blocks that define the tasks in a pipeline. Each operator performs a particular type of work, such as executing a SQL query, transferring files, or running a Python script. Some common operators are:

  • BashOperator: Used to execute Bash commands or a script

  • PythonOperator: Used to execute Python commands or a script

  • HttpSensor: Pokes an HTTP service until a response is received

  • PostgresOperator: Used to interact with a PostgreSQL database

Each task in an Airflow DAG is executed by some type of operator, and an ETL pipeline is built by combining different operators that run in sequence or in parallel. For example, an ETL pipeline built from Airflow operators might chain a sensor that waits for the source system, a Python task that extracts the data, and a Bash task that transforms it.
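Below is a rough sketch of such a pipeline, assuming Airflow 2.x with the HTTP provider package installed. It chains three different operators in sequence: an HttpSensor that waits for the source system, the PythonOperator extract task from before, and a BashOperator standing in for a transform step. The connection ID, endpoint, and script path are hypothetical.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator
from airflow.providers.http.sensors.http import HttpSensor

from helper import extract  # extract function from the previous task

with DAG(
    dag_id="etl_pipeline_example",    # hypothetical DAG name
    start_date=datetime(2024, 1, 1),  # placeholder start date
    schedule="@daily",                # placeholder schedule (Airflow 2.4+ syntax)
    catchup=False,
) as dag:
    # Wait until the source system's HTTP endpoint responds.
    wait_for_source = HttpSensor(
        task_id="wait_for_source",
        http_conn_id="source_api",    # hypothetical Airflow connection
        endpoint="health",            # hypothetical health-check endpoint
    )

    # Extract the latest batch of data from the production database.
    extract_data = PythonOperator(
        task_id="extract",
        python_callable=extract,
    )

    # Placeholder transform step implemented as a Bash command.
    transform_data = BashOperator(
        task_id="transform",
        bash_command="python /opt/airflow/scripts/transform.py",  # hypothetical script
    )

    # Run the three tasks in sequence.
    wait_for_source >> extract_data >> transform_data
```

The `>>` operator defines the dependency order, so each task starts only after the previous one succeeds; tasks that share no dependency can instead run in parallel.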
