
ETL Transformation Example: Addressing Data Quality Issues

Explore how to transform extracted data to meet business needs by converting raw data into a tabular format, eliminating duplicate rows, and resolving data quality issues such as missing values. Learn the practical steps for preparing data to be loaded into a PostgreSQL database, which makes your ETL pipelines more reliable.

Transform

Now that we have extracted the raw data, let’s transform it according to the needs and context of the business. To learn what those needs are, we first have to talk to the person requesting the data. For this demonstration, that person is the company’s data scientist.

The data scientist requests that the data be in tabular form without missing or null values. Also, there shouldn't be any duplicate dates in the file, and the data needs to have eight columns separated by commas. The columns are:

  1. “Date”

  2. “First_Lottery_number”

  3. “Second_Lottery_number”

  4. “Third_Lottery_number”

  5. “Fourth_Lottery_number”

  6. “Fifth_Lottery_number” ...
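The requirements above can be sketched with Python’s standard library. This is a minimal illustration, not the lesson’s own implementation. It makes several assumptions. Each raw record is a list of values whose first field is the date. A row counts as incomplete if it does not have exactly eight fields or if any field is `None` or blank. Rows with missing values are dropped rather than filled in. For duplicate dates, the first complete row is kept. The function names `clean_rows` and `to_csv` are hypothetical. The header is passed in by the caller, because the full column list is not reproduced here.

```python
import csv
import io

# Assumption: the data scientist asked for exactly eight columns.
EXPECTED_COLUMNS = 8


def is_missing(value):
    """Treat None and empty or whitespace-only strings as missing values."""
    return value is None or str(value).strip() == ""


def clean_rows(raw_rows, expected_columns=EXPECTED_COLUMNS):
    """Hypothetical helper: drop malformed or incomplete rows and duplicate dates.

    Assumes the first field of each row is the date. For each date, the first
    complete row is kept and any later rows with the same date are discarded.
    """
    seen_dates = set()
    cleaned = []
    for row in raw_rows:
        if len(row) != expected_columns:
            continue  # wrong number of fields: cannot fit the table
        if any(is_missing(value) for value in row):
            continue  # missing or null value: drop the row
        fields = [str(value).strip() for value in row]
        date = fields[0]
        if date in seen_dates:
            continue  # duplicate date: keep only the first occurrence
        seen_dates.add(date)
        cleaned.append(fields)
    return cleaned


def to_csv(header, rows):
    """Hypothetical helper: serialize the header and rows as comma-separated text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()
```

Dropping incomplete rows is the simplest way to satisfy a "no missing values" requirement. If losing rows is not acceptable to the data scientist, imputing values is the alternative, but that choice belongs to the data consumer.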