Data Scrubbing Operation: Drop Missing Values
Explore how to handle missing data in machine learning by understanding categories like MCAR, MAR, and nonignorable missingness. Learn practical Python techniques such as dropna and fillna to inspect, fill, or remove missing values based on dataset characteristics. This empowers you to clean data effectively for better model accuracy.
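As a preview of the techniques named above, here is a minimal sketch of inspecting, removing, and filling missing values with pandas. The DataFrame, its column names, and the fill choices (column mean, an "unknown" placeholder) are hypothetical and chosen only for illustration. Which strategy suits a real dataset depends on why the values are missing.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset; the column names and values are illustrative only.
df = pd.DataFrame({
    "age": [25, np.nan, 47, 31],
    "income": [52000, 61000, np.nan, 45000],
    "city": ["Oslo", "Lima", None, "Pune"],
})

# Inspect: count the missing values in each column.
missing_counts = df.isnull().sum()

# Remove: drop every row that contains at least one missing value.
dropped = df.dropna()

# Fill: replace missing numeric values with the column mean
# and missing text values with a placeholder (both are assumptions).
filled = df.fillna({
    "age": df["age"].mean(),
    "income": df["income"].mean(),
    "city": "unknown",
})

print(missing_counts)
print(dropped)
print(filled)
```

Dropping rows is the simplest option but discards the observed values in those rows as well, so it tends to be safest when only a small share of rows is affected.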
Quick overview: Another common but more complicated problem is deciding what to do with missing data. Missing data can be split into three categories:
- Missing completely at random (MCAR)
- Missing at random (MAR)
- Nonignorable
With MCAR, a value's absence is unrelated to any variable in the dataset. With MAR, the reason the value is missing is linked to another variable in the dataset and is not due directly to the value itself.
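One rough way to look for the pattern described above, where missingness is linked to another variable, is to compare the share of missing values across groups of that variable. The sketch below uses a hypothetical survey table with made-up `age_group` and `income` columns. A large gap between groups hints at a link, but it is not proof, and it cannot rule out nonignorable missingness.

```python
import numpy as np
import pandas as pd

# Hypothetical survey data: income is missing more often in the "18-25" group.
df = pd.DataFrame({
    "age_group": ["18-25", "18-25", "18-25", "18-25",
                  "40-60", "40-60", "40-60", "40-60"],
    "income": [np.nan, np.nan, np.nan, 30000,
               70000, 65000, np.nan, 80000],
})

# Flag rows where income is missing, then average the flags per age group
# to get the fraction of missing income values in each group.
missing_rate = df["income"].isnull().groupby(df["age_group"]).mean()

print(missing_rate)
```

If the rates were roughly equal across every other variable in the dataset, that would be more consistent with values missing completely at random.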
Lastly, nonignorable missing data is data that is absent directly because of its own value or the significance of the information. For example, tax-evading citizens or respondents with a criminal record may decline to supply ...