ETL Example—Load
Learn how to load data into a PostgreSQL database from the shell.
The last step in the ETL pipeline is to load the cleaned and processed data stored in clean_data.csv
into a PostgreSQL database.
Connecting to the database
First, we create a Bash script called load_data.sh that switches from the “root” user to the “postgres” user, initializes and starts a PostgreSQL database cluster, and runs an SQL script that loads the data. The SQL script is stored in a file called load_data.sql.
#!/bin/bash

# Setting the project location as an environment variable
export WORKDIR='/usercode/ETL_Lottery'

# Setting Postgres bin and data directories as environment variables
export POSTGRES_BIN='/usr/lib/postgresql/16/bin'
export POSTGRES_DATA='/var/lib/postgresql/data'

# Create and start a new PostgreSQL database cluster.
# Connect to Postgres and run the file: load_data.sql
su - postgres -c "${POSTGRES_BIN}/initdb -D ${POSTGRES_DATA}
${POSTGRES_BIN}/pg_ctl -D ${POSTGRES_DATA} start
psql postgres -f ${WORKDIR}/load_data.sql"
Deploying a PostgreSQL database: load_data.sh
Lines 4–8: We set the project location and the Postgres bin and data directories as environment variables.
...
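The load_data.sql file referenced by the script isn't shown in this excerpt. As a rough sketch of what such a file typically contains, the example below creates a destination table and bulk-loads clean_data.csv with psql's \copy meta-command. It assumes clean_data.csv sits in the WORKDIR project folder, and the table name lottery_results and its columns are placeholders for illustration, not the project's actual schema.

-- Illustrative sketch only: the real load_data.sql depends on the columns in clean_data.csv
-- Create a destination table (the name and columns below are placeholders)
CREATE TABLE IF NOT EXISTS lottery_results (
    draw_date  DATE,
    numbers    TEXT,
    jackpot    NUMERIC
);

-- Bulk-load the cleaned CSV; \copy runs client-side, so psql (not the server) reads the file
\copy lottery_results FROM '/usercode/ETL_Lottery/clean_data.csv' WITH (FORMAT csv, HEADER true)

With both files in place, the whole load step can be started by running bash load_data.sh as the root user, so that the su - postgres call succeeds without a password prompt.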