
Credit Scoring Problem

Explore the credit scoring problem to understand how AI models impact lending decisions, fairness, and potential biases. Learn key evaluation metrics and the importance of addressing discrimination in credit risk models through a practical example using the German Credit dataset.

The credit scoring problem

Before we dive deep into AI fairness, it is time to introduce a motivating example.

Credits and loans are essential aspects of modern society. Credit decisions can influence people’s lives, such as when buying a house. This task is also vital for financial institutions. Loans generate a lot of income, but unpaid loans are a considerable cost. The impact can be even worse: we saw this during the 2007 financial crisis related to unpaid mortgages.

In an ideal world, a bank could perfectly predict whether a borrower will repay all liabilities. In such a situation there would be no unpaid installments and no cost of debt collection; on the other hand, people unable to repay would not get the loan at all. That's why banks invest a lot of money and effort in creating the set of rules by which loan applications are accepted. These models may or may not use machine learning. As George Box put it, "All models are wrong, but some are useful" (Box, G. E. P., Draper, N. R. Empirical Model-Building and Response Surfaces. New York, NY: Wiley, 1987).

As machine learning practitioners, we know there is no perfect model, and we must deal with wrong predictions. We can easily formulate the credit decision problem as binary classification: we denote credit denial as the negative class (y = 0) and credit approval as the positive one (y = 1). Of course, we could reverse the labels. However, when analyzing fairness this might change the interpretation of the results, so we must be careful about which class we consider the good one. This setup gives us four possible outcomes. Let's attempt the following credit scoring matching problem to see whether we can correctly pair the options:

Match The Answer

Statements:
A. True positive (TP)
B. True negative (TN)
C. False negative (FN)
D. False positive (FP)

Match with:
- The loan has been approved and paid back.
- The loan has been rejected and would have defaulted.
- The loan has been approved and defaulted.
- The loan has been rejected but would have been paid back.
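The four outcomes above can be counted directly with scikit-learn's `confusion_matrix`. This is a minimal sketch on hypothetical toy labels (the `y_true`/`y_pred` values below are made up for illustration, not taken from any real dataset):

```python
from sklearn.metrics import confusion_matrix

# Hypothetical toy data: 1 = approval / loan repaid, 0 = denial / default.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]   # would the loan actually have been repaid?
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]   # did the model approve the loan?

# With labels=[0, 1] the matrix rows are actual classes, columns predictions:
# [[TN, FP],
#  [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
print(f"TP={tp} TN={tn} FP={fp} FN={fn}")  # TP=3 TN=2 FP=1 FN=2
```

Passing `labels=[0, 1]` explicitly pins the row/column order, which matters here because swapping which class counts as positive changes which cell is "wrongful rejection" versus "costly default".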


Model evaluation

To evaluate the model, we can compute metrics like accuracy, precision, recall, F1-score, and more. However, such an evaluation focuses on the lender, who wants to maximize profit. When analyzing fairness, we are also interested in the borrower's situation. This does not mean that the usual metrics are no longer necessary; rather, we introduce an additional dimension of analysis.
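For concreteness, here is how these lender-side metrics could be computed with scikit-learn on the same kind of hypothetical toy labels as before (the values are illustrative, not from a real model):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hypothetical toy data: 1 = approval / loan repaid, 0 = denial / default.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]

acc = accuracy_score(y_true, y_pred)    # share of correct decisions: 0.625
prec = precision_score(y_true, y_pred)  # of approved loans, fraction repaid: 0.75
rec = recall_score(y_true, y_pred)      # of repayable loans, fraction approved: 0.6
f1 = f1_score(y_true, y_pred)           # harmonic mean of precision and recall
```

Note how the lender's interests map onto the metrics: low precision means money lost to defaults, while low recall means profitable customers turned away.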

Assume that a person is told they won't get a loan at bank A. They go to bank B, again without success, and the story repeats at bank C. A possible explanation is that the person would not be able to pay the loan back. But there is another one: the person might be systematically discriminated against by the creditworthiness model. This would be especially likely if all the banks used a model from the same vendor. Such a situation might seem unlikely at first, but when it happens, people can be excluded from society.

We can build a similar confusion matrix from the customer's perspective.

Match The Answer

Statements:
A. True positive (TP)
B. True negative (TN)
C. False negative (FN)
D. False positive (FP)

Match with:
- I didn't get the loan, but I would have paid it back if I had gotten it.
- I got the loan and I'm not able to pay it back.
- I got the loan and paid it back.
- I didn't get the loan, but I wouldn't have been able to pay it back anyway.
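One way to make the borrower-side perspective quantitative is to compare, across customer groups, how often people who would have repaid are nevertheless rejected. The sketch below uses entirely hypothetical data and a hypothetical helper name (`wrongful_rejection_rate`); it is one possible illustration of a group-wise error-rate comparison, not a standard API:

```python
import numpy as np

# Hypothetical decisions for two customer groups (1 = approved / would repay).
groups = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])
y_true = np.array([1, 1, 0, 1, 1, 1, 0, 1])
y_pred = np.array([1, 1, 0, 0, 1, 0, 0, 0])

def wrongful_rejection_rate(y_true, y_pred, mask):
    """Share of a group's creditworthy customers who were rejected (FN rate)."""
    would_repay = mask & (y_true == 1)
    return ((y_pred == 0) & would_repay).sum() / would_repay.sum()

for g in ["A", "B"]:
    rate = wrongful_rejection_rate(y_true, y_pred, groups == g)
    print(f"group {g}: wrongful rejection rate = {rate:.2f}")
```

In this made-up example, group B's creditworthy applicants are rejected twice as often as group A's, even though a single aggregate accuracy number would hide the gap. Later chapters formalize such comparisons as fairness metrics.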


German Credit dataset

In this section, we are going to build a simple credit risk model using the German Credit dataset. It is pretty old (1994) and contains only 1000 samples, but it will be sufficient for our purposes. It is also a common data source for AI fairness research. The original data is in a rather obscure format, so we will use a slightly preprocessed version from the OpenML repository (https://www.openml.org/search?type=data&status=active&id=31).

The dataset contains one target variable indicating credit risk, described as either good or bad. There are 20 more features that will help us build the model (though we will not use all of them). We will explore them in more detail in the next section. For now, we need to know that they describe the loan parameters (duration, amount, purpose) and the customer (personal status, job, employment). We are going to perform a simple exploratory data analysis in the next lesson.
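The preprocessed OpenML version can be loaded directly with scikit-learn's `fetch_openml`. This minimal sketch assumes network access on first run (the data is cached afterwards) and uses the dataset id from the URL above:

```python
from sklearn.datasets import fetch_openml

# Download the preprocessed German Credit data (OpenML dataset id 31, "credit-g").
credit = fetch_openml(data_id=31, as_frame=True)
X, y = credit.data, credit.target

print(X.shape)           # (1000, 20): 1000 samples, 20 features
print(y.value_counts())  # target values are "good" / "bad" credit risk
```

Passing `as_frame=True` returns pandas objects, which keeps the categorical feature names and values readable for the exploratory analysis in the next lesson.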