Building a Model
Explore the process of building a credit scoring model while focusing on fairness. Learn to perform exploratory data analysis to identify sensitive attributes and features. Understand how to create a logistic regression baseline model and evaluate its fairness to address bias and improve equity.
Exploratory data analysis
Before we build the model, we need to perform exploratory data analysis. We are going to load the dataset and go through it feature by feature to understand the meaning and characteristics of each one. During this stage, we will try to identify sensitive attributes and important features that can be used for modeling. The most important questions we need to answer are:
How many observations do we have?
Is the dataset balanced?
What are the number and types of features?
In addition, we perform simple data processing for further modeling.
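The three questions above, and the simple processing step, can be sketched with pandas roughly as follows. This is a minimal illustration, not the lesson's actual notebook: the rows are made-up toy values, the column names are borrowed from the quiz in this lesson, and the target column name `class` with labels `good`/`bad` is an assumption.

```python
import pandas as pd

# Hypothetical toy rows. Column names come from this lesson's quiz;
# the values and the "class" target column are made up for illustration.
df = pd.DataFrame({
    "age": [23, 45, 31, 52, 28, 39],
    "credit_amount": [1200, 5400, 2300, 8000, 1500, 3100],
    "duration": [12, 36, 18, 48, 12, 24],
    "housing": ["rent", "own", "own", "own", "rent", "for free"],
    "purpose": ["radio/tv", "new car", "furniture", "new car", "education", "radio/tv"],
    "class": ["good", "good", "bad", "good", "bad", "good"],
})

# How many observations do we have?
n_rows, n_cols = df.shape

# Is the dataset balanced? Share of each target label.
class_share = df["class"].value_counts(normalize=True)

# What are the number and types of features?
features = df.drop(columns="class")
numeric_cols = features.select_dtypes(include="number").columns.tolist()
categorical_cols = [c for c in features.columns if c not in numeric_cols]

# Simple processing for modeling: binary target and one-hot encoded categoricals.
y = (df["class"] == "good").astype(int)
X = pd.get_dummies(features, columns=categorical_cols)

print(f"{n_rows} observations, {features.shape[1]} features")
print(class_share)
print("numeric:", numeric_cols)
print("categorical:", categorical_cols)
```

On the real data, the same calls would be applied to the loaded DataFrame instead of the toy one.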
Let’s attempt a quiz about the results of exploratory analysis.
Exploratory Data Analysis Quiz
Which attributes can be considered sensitive? (Select all that apply.)
personal_status
job
housing
savings_status
checking_status
credit_amount
duration
purpose
age
Building the model
Now, we are ready to build a simple model and evaluate it. We will be working with it through the end of this lesson. We are going to use logistic regression for two reasons:
It is a very simple model for a quick baseline.
It is not a complete black box: its coefficients can be inspected directly.
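A logistic regression baseline of this kind might look like the sketch below. It is an assumption-laden illustration, not the lesson's own code: the data is synthetic (randomly generated stand-ins for `age`, `credit_amount`, and `duration`), the label-generating formula is invented, and the scaling step and train/test split are choices made here for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 400

# Synthetic stand-ins for three numeric features (not real data).
feature_names = ["age", "credit_amount", "duration"]
age = rng.integers(19, 75, size=n)
credit_amount = rng.uniform(250, 15000, size=n)
duration = rng.integers(4, 60, size=n)
X = np.column_stack([age, credit_amount, duration]).astype(float)

# Invented rule for generating labels (1 = good credit), purely illustrative.
logit = 1.0 - 0.0001 * credit_amount - 0.02 * duration + 0.01 * age
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

# Hold out a test set, keeping the class ratio similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Standardize features, then fit the logistic regression baseline.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Accuracy on the held-out set.
accuracy = model.score(X_test, y_test)

# The coefficients are what keep the model from being a complete black box:
# each one shows the direction and strength of a (standardized) feature.
coefs = dict(zip(feature_names, model.named_steps["logisticregression"].coef_[0]))

print(f"test accuracy: {accuracy:.3f}")
for name, value in coefs.items():
    print(f"{name}: {value:+.3f}")
```

Because the features are standardized before fitting, the coefficient magnitudes are roughly comparable across features, which makes a quick reading of the baseline easier.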
We have now built a simple model for credit scoring. Its accuracy could be improved, but we are more interested in another question: is it fair?