Search⌘ K

ETL Transformation Example: Handling Missing Values and Data

Explore practical ETL data transformation techniques focused on handling missing and null values. Learn how to delete rows with nulls, replace missing numeric values, and clean empty rows using shell scripts to improve data quality in ETL pipelines.

Task 3: Delete rows with null values

As we might’ve noticed, some rows had null values. According to the data scientist, all rows with null values must be dropped.

Shell
# *** *** #
# *** Issue #3 - Get rid of all rows with null values *** #
# *** *** #
echo -e "\nTask #3 - Searching for Null Values"
extract_nulls(){
# Create an empty file called null_values.txt
> null_values.txt
# extract all lines with "null" and append to the file
cat raw_data.csv | grep null >> null_values.txt
}
extract_nulls
delete_nulls(){
if [ -s null_values.txt ]
then
echo -e "\nFound Null Values:"
cat null_values.txt
# Grab the dates of lines
null_dates=$(cut --delimiter "|" --fields 1 null_values.txt)
echo -e "\nRemoving Null Values..."
# Remove all rows with null dates from raw_data.csv
grep -v "$null_dates" raw_data.csv > temp.csv; mv temp.csv raw_data.csv
echo -e "Done."
else
echo "There Are No Null Values"
fi
}
delete_nulls

We’ve created a file called transform_data_3_null.sh to perform these operations. Let’s walk through the code line by line. 

  • The script contains two functions, extract_nulls() and delete_nulls().

  • Line 8: We create an empty text file called null_values.txt.

  • Line 11: We append rows with null values from ...