...

Exercise: Exploring and Cleaning the Data

Learn to explore and clean the data for predictive modeling.

Thus far, we have identified a data quality issue related to the metadata: we had been told that every sample in our dataset corresponded to a unique account ID, but found that this was not the case. We were able to correct this issue using logical indexing in pandas. This was a fundamental data quality issue, concerning simply which samples were present, based on the metadata. Aside from this, we are not really interested in the metadata column of account IDs: it will not help us develop a predictive model for credit default.
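As a reminder of what logical indexing looks like in pandas, here is a minimal sketch on a toy DataFrame. The column names `ID` and `LIMIT_BAL`, the values, and the choice to drop every row that shares a duplicated ID are all assumptions made for illustration. They are not necessarily the steps applied to the case study data.

```python
import pandas as pd

# Hypothetical toy data: column names and values are illustrative assumptions
df = pd.DataFrame({
    "ID": ["a1", "b2", "b2", "c3", "d4", "d4"],
    "LIMIT_BAL": [20000, 0, 50000, 90000, 0, 30000],
})

# Count how many times each account ID occurs
id_counts = df["ID"].value_counts()

# Boolean mask over the counts: True where an ID occurs more than once
dupe_mask = id_counts > 1
dupe_ids = id_counts.index[dupe_mask]

# Boolean mask over the rows: True where the row's ID is one of the duplicates
rows_with_dupe_id = df["ID"].isin(dupe_ids)

# Logical indexing: keep only the rows whose ID is unique
df_clean = df.loc[~rows_with_dupe_id].copy()

# The ID column is metadata, so it is set aside before modeling
features = df_clean.drop(columns=["ID"])
```

Here `rows_with_dupe_id` is a Boolean Series aligned with the rows of `df`, so passing its negation to `.loc` selects only the rows that pass the check. Whether duplicated rows should be dropped entirely or repaired depends on what the duplicates turn out to contain.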

Examining features and

...