Filling In Data
Explore how to identify missing CPI and Unemployment values in retail sales data and fill them using previous values through imputation. Understand simple and advanced imputation methods to handle incomplete datasets effectively in machine learning projects.
Chapter Goals:
- Find the rows that contain missing values for 'CPI' and 'Unemployment'
- Fill in the missing values using previous row values
A. Finding the missing values
We previously noted that the 'CPI' and 'Unemployment' features each contain 585 missing values. To find the row indexes of these missing values, we first convert the corresponding feature columns in the na_values boolean DataFrame to integers, so that False becomes 0 and True becomes 1.
We then use the nonzero function to find the locations of the 1’s, which correspond to the True values.
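The two steps above can be sketched as follows. This is a minimal illustration, not the lesson's own code: the small DataFrame is made-up stand-in data (the real dataset is not shown here), and na_values is assumed to have been built with isna(). The sketch calls NumPy's nonzero on the underlying array, which returns positional indexes; these match the row labels only because the DataFrame uses the default integer index.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the retail data: a tiny frame with gaps
df = pd.DataFrame({
    "CPI": [211.1, np.nan, 211.3, np.nan, np.nan],
    "Unemployment": [8.1, np.nan, 8.0, np.nan, np.nan],
})

# Boolean DataFrame: True where a value is missing
# (assumption: this is how na_values was created earlier)
na_values = df.isna()

# Step 1: convert True/False to 1/0
na_cpi_int = na_values["CPI"].astype(int)
na_une_int = na_values["Unemployment"].astype(int)

# Step 2: nonzero returns a tuple of arrays, one per dimension;
# element [0] holds the positions of the 1's
na_indexes_cpi = na_cpi_int.to_numpy().nonzero()[0]
na_indexes_une = na_une_int.to_numpy().nonzero()[0]

print(na_indexes_cpi)  # [1 3 4]

# Check whether both features are missing in exactly the same rows
print(np.array_equal(na_indexes_cpi, na_indexes_une))  # True
```

The integer conversion mirrors the steps described in the text. NumPy's nonzero also treats True as nonzero, so applying it to the boolean column directly would give the same positions.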
The row indexes are stored in the na_indexes_cpi and na_indexes_une NumPy arrays, which, as you can see, contain exactly the same row indexes (sorted in ascending order). Now let’s take a closer look at the exact rows that contain ...