Exercise: Exploring the Credit Limit and Demographic Features

Learn to identify and correct data quality issues and visualize continuous data using histograms.

We'll cover the following...

Data quality assurance and exploration
Visualizing the features using histograms
Try it yourself

Data quality assurance and exploration

So far, we have remedied two data quality issues just by asking basic questions or by looking at the info() summary. Let’s now take a look at the first few columns of data. Before we get to the historical bill payments, we have the credit limits of the accounts, LIMIT_BAL, and the demographic features SEX, EDUCATION, MARRIAGE, and AGE. Our business partner has reached out to let us know that gender should not be used to predict creditworthiness, as this is unethical by their standards. We’ll keep this in mind for future reference. Now we’ll explore the rest of these columns, making any corrections that are necessary.

In order to further explore the data, we will use histograms. Histograms are a good way to visualize data that is on a continuous scale, such as currency amounts and ages. A histogram groups similar values into bins and shows the number of data points in these bins as a bar graph.
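To make the idea of binning concrete, here is a minimal sketch using NumPy's histogram function. The ages are made up purely for illustration and are not taken from the case study data; the choice of four bins over the range 20 to 60 is also an assumption.

```python
import numpy as np

# Hypothetical ages, invented for illustration (not from the case study data).
ages = np.array([22, 25, 27, 31, 34, 35, 38, 41, 45, 52])

# Split the range 20-60 into four equal-width bins and count the values in each.
counts, bin_edges = np.histogram(ages, bins=4, range=(20, 60))

# Each bar of a histogram corresponds to one (bin, count) pair.
for left, right, count in zip(bin_edges[:-1], bin_edges[1:], counts):
    print(f"{left:.0f}-{right:.0f}: {count}")
```

A histogram plot simply draws these counts as bars, one per bin, so the shape of the distribution is visible at a glance.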

To plot histograms, we will start to get familiar with the graphical capabilities of pandas. pandas relies on another library called Matplotlib to create graphics, so we’ll also set some options using Matplotlib. Along the way, we’ll also learn how to get quick statistical summaries of data in pandas.
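As a rough sketch of how these pieces fit together, the snippet below sets a Matplotlib option, plots histograms with pandas, and prints a statistical summary. The small DataFrame is hypothetical stand-in data that reuses the LIMIT_BAL and AGE column names; the DPI value and the number of bins are arbitrary choices, not settings prescribed by the course.

```python
import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas as pd

# Example Matplotlib option: figure resolution (the value is an arbitrary choice).
mpl.rcParams["figure.dpi"] = 100

# Hypothetical stand-in data; the real case study DataFrame is loaded elsewhere.
df = pd.DataFrame({
    "LIMIT_BAL": [20000, 50000, 120000, 200000, 360000, 500000],
    "AGE": [24, 29, 35, 41, 47, 58],
})

# pandas delegates to Matplotlib and draws one histogram per selected column.
axes = df[["LIMIT_BAL", "AGE"]].hist(bins=5)
plt.tight_layout()

# describe() returns count, mean, std, min, quartiles, and max for each column.
summary = df[["LIMIT_BAL", "AGE"]].describe()
print(summary)
```

In a notebook, the histograms typically render inline; in a script, calling plt.show() or plt.savefig() would be needed to see or keep the figure.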

Visualizing the features using histograms

In this exercise, we’ll start our exploration of data with the credit limit and age features. We will visualize them and get ...

