A Practical Guide to Machine Learning with Python

Data Scrubbing Operation: Drop Missing Values

We will cover ways of removing missing data values.

Quick overview: Another common but more complicated problem is deciding what to do with missing data. Missing data can be split into three categories:

Missing completely at random (MCAR)
Missing at random (MAR)
Nonignorable

Missing completely at random (MCAR) occurs when there’s no relationship between a missing value and other values in the dataset. Oftentimes, the value is not readily available and is therefore left out of the dataset.

Missing at random (MAR) means the missing value is not related to its own value but is instead related to the values of other variables. In census surveys, for example, a respondent might skip an extended-response question because the relevant information was entered in a previous question, or might fail to complete the survey because of low language proficiency, as stated by the respondent elsewhere in the survey.

In other words, the reason why the value is missing is linked to another variable in the dataset and not due directly to the value itself.
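The difference between MCAR and MAR can be made concrete with a small simulation. The following is a minimal sketch using only the Python standard library; the survey fields (`age`, `proficiency`, `response`), the helper names, and the missingness rates are all hypothetical and chosen purely for illustration. Under MCAR, every row has the same chance of losing its value. Under MAR, that chance depends on another observed column.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical survey records; all field names and values are illustrative.
records = [
    {
        "age": random.randint(18, 80),
        "proficiency": random.choice(["low", "high"]),
        "response": "some text",
    }
    for _ in range(1000)
]


def make_mcar(rows, rate=0.2):
    """Blank out 'response' with the same probability for every row."""
    out = []
    for row in rows:
        row = dict(row)  # copy so the original records stay intact
        if random.random() < rate:
            row["response"] = None
        out.append(row)
    return out


def make_mar(rows, low_rate=0.5, high_rate=0.05):
    """Blank out 'response' with a probability that depends on another
    observed column ('proficiency'), not on the response itself."""
    out = []
    for row in rows:
        row = dict(row)
        rate = low_rate if row["proficiency"] == "low" else high_rate
        if random.random() < rate:
            row["response"] = None
        out.append(row)
    return out


def missing_rate(rows, group):
    """Share of rows in the given proficiency group with no response."""
    subset = [r for r in rows if r["proficiency"] == group]
    return sum(r["response"] is None for r in subset) / len(subset)


mcar = make_mcar(records)
mar = make_mar(records)

# Compare the missing rate across the two proficiency groups.
for name, data in [("MCAR", mcar), ("MAR", mar)]:
    print(name, round(missing_rate(data, "low"), 2), round(missing_rate(data, "high"), 2))
```

In the MCAR data the two groups should show similar missing rates, while in the MAR data the low-proficiency group should be missing far more responses. Comparing missing rates across the groups of another variable in this way is one informal check for whether values are plausibly missing completely at random.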

Lastly, nonignorable missing data is data that is absent directly because of its own value or the significance of the information. For example, tax-evading citizens or respondents with a criminal record may decline to supply ...
