LR Implementation Steps: 4 to 7

Explore practical steps for handling missing data by removing or filling values, setting your dependent and independent variables, and applying the linear regression algorithm. This lesson shows you how to manage incomplete data and interpret model coefficients so you can build a reliable prediction model using Python.

4) Remove or modify variables with missing values

Our exploratory data analysis shows that missing values pose a problem for this dataset, especially since linear regression algorithms generally cannot be fit on data that contains missing values. Therefore, we need to either estimate (fill in) or remove these values from the data frame.
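The two options, removing or estimating, can be sketched on a small toy data frame. This is only an illustration, not the lesson's actual code: the values are made up, the columns Price, Rooms, and Car are assumed for the example, and filling with the column mean is just one possible way to estimate a value.

```python
import numpy as np
import pandas as pd

# Toy data frame standing in for the real dataset.
# All values are made up; Price, Rooms, and Car are assumed column names.
df = pd.DataFrame({
    "Price": [850000, 1200000, 640000, 990000],
    "Rooms": [2, 3, 2, 4],
    "BuildingArea": [np.nan, 150.0, np.nan, 210.0],
    "Car": [1.0, np.nan, 1.0, 2.0],
})

# Count missing values per column to see where the gaps are.
missing_counts = df.isnull().sum()

# Option 1: remove every row that has at least one missing value.
rows_dropped = df.dropna(axis=0, how="any")

# Option 2: estimate missing values, here by filling with the column mean.
filled = df.copy()
filled["Car"] = filled["Car"].fillna(filled["Car"].mean())

print(missing_counts)
print(f"Rows before: {len(df)}, after dropping incomplete rows: {len(rows_dropped)}")
```

Comparing the row counts before and after `dropna` is a quick way to judge how much data a row-by-row removal would cost.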

However, removing missing values on a row-by-row basis would greatly reduce the size of the data frame. The variable BuildingArea, for instance, is missing in 21,115 rows, which makes up two-thirds of the data frame! To preserve those rows, you can instead remove this variable entirely, especially ...