Credit Scoring Problem
Explore the credit scoring problem to understand how AI models impact lending decisions, fairness, and potential biases. Learn key evaluation metrics and the importance of addressing discrimination in credit risk models through a practical example using the German Credit dataset.
The credit scoring problem
Before we dive into AI fairness, let's introduce a motivating example.
Credit and loans are essential aspects of modern society. Credit decisions can influence people's lives profoundly, for example when buying a house. The task is also vital for financial institutions: loans generate substantial income, but unpaid loans are a considerable cost. The impact can be even worse, as we saw during the 2007–2008 financial crisis, which was driven in part by unpaid mortgages.
In an ideal world, a bank could perfectly predict whether a borrower will repay all liabilities. In such a situation, there would be no unpaid installments and no cost of debt collection, while people unable to repay would simply not get the loan. That's why banks invest a lot of money and effort into creating the set of rules by which loan applications are accepted. These models may or may not use machine learning.
As machine learning practitioners, we know there is no perfect model, and we must deal with wrong predictions. We can easily formulate the credit decision problem as binary classification: credit approval is the positive class, and credit denial is the negative class. This gives the following confusion matrix:
True positive (TP)
The loan has been approved and paid back.
True negative (TN)
The loan has been rejected, and the borrower would have defaulted.
False negative (FN)
The loan has been rejected, but it would have been paid back.
False positive (FP)
The loan has been approved and defaulted.
Model evaluation
To evaluate the model, we can compute metrics like accuracy, precision, recall, F1-score, and more. However, such evaluation is focused on lenders—they would like to maximize profit. When analyzing fairness, we will also be interested in the borrower’s situation. This does not mean that regular metrics are no longer necessary. Instead, we introduce an additional dimension of analysis.
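As a concrete illustration of these lender-side metrics, here is a minimal sketch using toy data (not the real dataset) and the standard binary-classification convention from above, where approval/repayment is the positive class: `y_true` marks whether a borrower actually repays, and `y_pred` marks whether the model approves the loan.

```python
# Toy labels for eight hypothetical applicants (illustrative only).
y_true = [1, 1, 0, 0, 1, 0, 1, 1]  # 1 = repays the loan, 0 = defaults
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]  # 1 = model approves, 0 = model rejects

# Count the four confusion-matrix cells.
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

# Derive the standard lender-focused metrics from the cells.
accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(tp, tn, fp, fn)  # 3 2 1 2
print(accuracy)        # 0.625
```

In practice, a library such as scikit-learn computes these metrics for us; the point here is that all of them derive from the same four cells, so none of them alone tells us anything about how errors are distributed across groups of borrowers.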
Assume that a person is told they won't get a loan at bank A. They go to bank B without success, and the story continues at bank C. One possible explanation is that the person genuinely won't be able to pay the loan back. But there is another: the person might be systematically discriminated against by the creditworthiness model. This would be especially likely if all the banks used a model from the same vendor. Such a situation might seem unlikely at first, but when it happens, some people can effectively be excluded from society.
We can build a similar confusion matrix from the customer's perspective.
True positive (TP)
I didn’t get the loan, but I would have paid it back if I had gotten it.
True negative (TN)
I got the loan and I’m not able to pay it back.
False negative (FN)
I got the loan and paid it back.
False positive (FP)
I didn’t get the loan, but I wouldn’t be able to pay it back anyway.
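To make the borrower's perspective concrete, the table above can be encoded as a small lookup from a hypothetical (decision, would_repay) pair to the cell name used in the text; the pair encoding is an illustrative assumption, not part of the original.

```python
# Borrower-perspective confusion-matrix cells, keyed by the bank's
# decision and whether the borrower would actually have repaid.
# (Cell names follow the table in the text, from the customer's view.)
borrower_cell = {
    ("reject", True): "TP",    # didn't get the loan, but would have paid it back
    ("approve", False): "TN",  # got the loan and can't pay it back
    ("approve", True): "FN",   # got the loan and paid it back
    ("reject", False): "FP",   # didn't get the loan, but wouldn't have paid it back anyway
}

print(borrower_cell[("reject", True)])  # TP
```

Note how the same situation lands in different cells depending on whose perspective we take: a rejection of a creditworthy applicant is merely one kind of error for the lender, but it is the central case for the borrower.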
German Credit dataset
In this section, we are going to build a simple credit risk model using the German Credit dataset. It is fairly old (1994) and contains only 1,000 samples, but it will be sufficient for our purposes, and it is a common data source in AI fairness research. The original data is in a somewhat obscure format, so we will use a slightly preprocessed version from the
The dataset contains one target variable indicating credit risk, labeled either good or bad. There are 20 more features that will help us build the model, though we will not use all of them. For now, we only need to know that they describe the loan parameters (duration, amount, purpose) and the customer (personal status, job, employment). We will explore them in more detail during a simple exploratory data analysis in the next lesson.
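Before modeling, the good/bad target label needs to be encoded as a binary variable. The sketch below illustrates this step on a tiny hand-made frame (the column names and values are illustrative, not loaded from the actual dataset):

```python
import pandas as pd

# A tiny illustrative frame mimicking the German Credit structure:
# a few loan/customer features plus a good/bad risk label.
df = pd.DataFrame(
    {
        "duration": [6, 48, 12],  # loan duration in months
        "credit_amount": [1169, 5951, 2096],
        "purpose": ["radio/tv", "radio/tv", "education"],
        "risk": ["good", "bad", "good"],  # target variable
    }
)

# Encode the target as binary: 1 = good risk, 0 = bad risk.
df["risk_binary"] = (df["risk"] == "good").astype(int)

print(df["risk_binary"].tolist())  # [1, 0, 1]
```

With the target in numeric form, any standard binary classifier can be trained on the feature columns; which features we actually keep is a decision we will revisit after the exploratory analysis.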