Missing Data

Introduction to missing data problems and ways to handle them.

Real-world data available for an experiment is not always well structured or properly maintained. Often, some values are missing, or an unexpected value appears. Missing data can come from several sources:

  • Human error.
  • Data lost while copying or combining datasets (for example, incorrectly applied joins).
  • Machine errors: not all sensors generate data, and a few may be down.
  • A feature value for which no decision was available at the time it was recorded.
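
To make the join case concrete, here is a minimal sketch in Python with pandas. The `sensors` and `readings` tables and their column names are hypothetical, invented only for illustration. It shows how a left join can introduce a missing value when one table has no matching row.

```python
import pandas as pd

# Hypothetical tables, made up for this example.
sensors = pd.DataFrame(
    {"sensor_id": [1, 2, 3], "location": ["roof", "lab", "yard"]}
)
readings = pd.DataFrame(
    {"sensor_id": [1, 3], "temperature": [21.5, 19.0]}
)

# A left join keeps every row of `sensors`. Sensor 2 has no matching
# reading, so its temperature is filled with NaN (pandas' missing marker).
merged = sensors.merge(readings, on="sensor_id", how="left")

# Count the missing values in each column.
missing_per_column = merged.isna().sum()

print(merged)
print(missing_per_column)
```

Counting missing values per column right after a join is a cheap way to notice that the join did not match every row.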

We need to handle missing data properly. Most algorithms have no notion of missing data, so they treat a missing entry as just another special value. For some algorithms that is fine, but for others it can push the results away from the true ones. As a good practice, we should always handle missing data as part of data preprocessing.
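
The sketch below illustrates the point in Python with NumPy and pandas. The `ages` column is hypothetical, and dropping rows or filling with the median are only two of several possible strategies, chosen here as assumptions for illustration. It shows an unhandled missing value distorting a simple computation, followed by two basic ways to handle it during preprocessing.

```python
import numpy as np
import pandas as pd

# Hypothetical feature column with two missing entries.
ages = pd.Series([25.0, np.nan, 31.0, np.nan, 40.0], name="age")

# NumPy propagates NaN, so a plain mean over the raw values is NaN.
raw_mean = np.mean(ages.to_numpy())

# Step 1: detect how many values are missing.
n_missing = int(ages.isna().sum())

# Step 2a: drop the missing entries ...
dropped = ages.dropna()

# Step 2b: ... or fill them with the median of the observed values.
# Series.median() skips NaN by default.
imputed = ages.fillna(ages.median())

print("mean of raw values:", raw_mean)
print("missing count:", n_missing)
print("after dropping:", dropped.tolist())
print("after imputing:", imputed.tolist())
```

Dropping is simple but discards rows, while filling keeps the row count at the cost of introducing estimated values. Which trade-off is acceptable depends on the dataset and the algorithm.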
