ETL Pipeline Exercise: Transform
Learn how to implement the transform step of the social media pipeline using Apache Airflow.
To continue our pipeline implementation, we’ll now focus on transforming the extracted data. According to the business requirements and the schema of the data warehouse, there are a few issues we need to fix with our extracted data. They are:
To change the month format of all date columns from numerical to text (for example, from 08 to Aug)
To remove tabs and new lines from the columns comment_text and post_text
To bin the number of followers into three categories, low, medium, and high (the number of followers lower than 1000 will be in the low category, the number of followers ...
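The three fixes listed above can be sketched as small pure-Python functions. This is only a sketch under stated assumptions, not the lesson's actual implementation. The date layout (YYYY-MM-DD), the column names post_date, followers, and followers_category, and the high_from cutoff are all hypothetical. The text above only states that counts lower than 1000 fall in the low category, so the medium/high boundary is left as a required argument rather than guessed. Replacing tabs and new lines with a single space, instead of deleting them outright, is also an assumption made so that neighboring words don't fuse.

```python
import re

# Fixed English abbreviations, so the result does not depend on the locale.
MONTH_ABBR = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
              "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def month_to_text(date_str):
    """Replace the numeric month of a 'YYYY-MM-DD' string with its abbreviation."""
    # Assumption: dates look like '2021-08-15'; the source does not give the format.
    year, month, day = date_str.split("-")
    return f"{year}-{MONTH_ABBR[int(month) - 1]}-{day}"


def strip_tabs_newlines(text):
    """Replace each run of tabs, carriage returns, and new lines with one space,
    then trim leading and trailing whitespace."""
    return re.sub(r"[\t\r\n]+", " ", text).strip()


def bin_followers(count, high_from):
    """Return 'low', 'medium', or 'high' for a follower count."""
    # From the text: counts lower than 1000 belong to the low category.
    if count < 1000:
        return "low"
    # Assumption: high_from is a hypothetical cutoff supplied by the caller;
    # the medium/high boundary is not given in the text above.
    return "high" if count >= high_from else "medium"


def transform_row(row, date_columns, high_from):
    """Apply all three fixes to one extracted record (a dict) and return a copy."""
    out = dict(row)
    for col in date_columns:
        out[col] = month_to_text(out[col])
    for col in ("comment_text", "post_text"):
        out[col] = strip_tabs_newlines(out[col])
    # 'followers' and 'followers_category' are hypothetical column names.
    out["followers_category"] = bin_followers(out["followers"], high_from)
    return out
```

Calling transform_row(record, date_columns=["post_date"], high_from=...) on each extracted record applies all three fixes at once. The value passed as high_from has to come from the business requirements.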