Introduction to the Dataset for the Course

Explore the loan approval dataset designed for binary classification to predict customer eligibility for home loans. Learn about the dataset features and how this real-world data will be used throughout the course to develop machine learning models and apply hyperparameter optimization techniques.

Problem statement

Dream Housing Finance is a company that offers a wide range of home loans and operates across the country’s urban, semi-urban, and rural areas. A customer first submits a home loan application; the company then cross-checks the information in the application and verifies the customer’s eligibility for the loan.

The company wants to be able to automatically determine, in real-time, if a customer is eligible for the loan they’ve applied for based on the information they provide in their online loan applications.

To automate this process, the company has provided a dataset that can be used to identify the customer segments eligible for a loan, so that it can target these customers specifically.

The loan approval dataset

In this course, we’ll use the loan dataset, a binary classification dataset that contains the loan details and loan status of different customers. The aim is to develop an ML model that predicts whether a customer’s loan request can be approved.

Binary classification is a type of supervised learning in ML where the goal is to classify input data into one of two possible categories. The categories are typically represented as:

  • 0 and 1

  • True and false

  • Positive and negative

Note: The two categories can be labeled in other ways as well. In the loan approval dataset, for example, they are Y and N.

The ML algorithm is trained on a labeled dataset, where each data point is linked to the correct category label. The objective of the ML algorithm is to learn a decision boundary that separates the two classes. Once the ML model is trained, it can be used to predict the category of new, unseen data points.
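
The idea of learning a decision boundary from labeled data can be sketched with a deliberately tiny example. The snippet below is only an illustration, not the model used in this course: it assumes a single numeric feature, made-up training points, and a simple threshold rule. The names `fit_threshold` and `predict` are hypothetical helpers introduced here.

```python
# Toy labeled dataset: (feature value, label), with labels in {0, 1}.
# The numbers are made up for illustration; they are not from the loan dataset.
train = [(1.0, 0), (2.0, 0), (3.0, 0), (6.0, 1), (7.0, 1), (8.0, 1)]


def fit_threshold(data):
    """Pick the threshold that classifies the most training points correctly.

    The rule is: predict 1 if x > threshold, else 0.
    Candidate thresholds are midpoints between neighboring feature values.
    """
    xs = sorted(x for x, _ in data)
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]

    def accuracy(threshold):
        return sum((x > threshold) == bool(y) for x, y in data) / len(data)

    return max(candidates, key=accuracy)


def predict(threshold, x):
    """Classify a new point using the learned decision boundary."""
    return 1 if x > threshold else 0


threshold = fit_threshold(train)
print(threshold)                # the learned decision boundary
print(predict(threshold, 2.5))  # prediction for a new, unseen point
print(predict(threshold, 9.0))  # prediction for another unseen point
```

Here the "decision boundary" is a single number that separates the two classes. Real models learn boundaries over many features at once, but the train-then-predict workflow is the same.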

Binary classification has many applications, such as spam detection, fraud detection, and medical diagnosis.

Here are a few sample rows of the loan approval dataset:

Loan Approval Dataset

| Loan_ID | Gender | Married | Dependents | Education | Self_Employed | ApplicantIncome | CoapplicantIncome | LoanAmount | Loan_Amount_Term | Credit_History | Property_Area | Loan_Status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LP001002 | Male | No | 0 | Graduate | No | 5849 | 0 | 267 | 360 | 1 | Urban | Y |
| LP001003 | Male | Yes | 1 | Graduate | No | 4583 | 1508 | 128 | 360 | 1 | Rural | N |
| LP001013 | Female | No | 0 | Graduate | No | 3510 | 0 | 76 | 360 | 0 | Urban | N |

Features of the dataset

The dataset has the following columns:

  • Loan_ID: Unique loan ID

  • Gender: Male/Female

  • Married: Whether the applicant is married (Yes/No)

  • Dependents: Number of dependents

  • Education: Applicant’s education (Graduate/Undergraduate)

  • Self_Employed: Whether the applicant is self-employed (Yes/No)

  • ApplicantIncome: Applicant income

  • CoapplicantIncome: Coapplicant income

  • LoanAmount: Loan amount in thousands

  • Loan_Amount_Term: Term of the loan in months

  • Credit_History: Whether the applicant’s credit history meets guidelines (1/0)

  • Property_Area: Urban/Semi-urban/Rural

  • Loan_Status: Loan approved (Y/N)
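
To make the column list concrete, here is a minimal sketch that reads the three sample rows from this lesson with Python’s standard `csv` module and converts the numeric columns to numbers. The choice of which columns to treat as numeric, and the helper name `load_rows`, are assumptions made for illustration. The column names follow the spelling in the list above. `Dependents` is left as text, and any blank cells in the full file would need extra handling that this sketch doesn’t include.

```python
import csv
import io

# The three sample rows shown in this lesson, written as CSV text.
SAMPLE_CSV = """\
Loan_ID,Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area,Loan_Status
LP001002,Male,No,0,Graduate,No,5849,0,267,360,1,Urban,Y
LP001003,Male,Yes,1,Graduate,No,4583,1508,128,360,1,Rural,N
LP001013,Female,No,0,Graduate,No,3510,0,76,360,0,Urban,N
"""

# Assumption: these columns are numeric; all other columns stay as text.
NUMERIC_COLUMNS = [
    "ApplicantIncome",
    "CoapplicantIncome",
    "LoanAmount",
    "Loan_Amount_Term",
    "Credit_History",
]


def load_rows(text):
    """Parse CSV text into a list of dicts, converting numeric columns to float."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        for column in NUMERIC_COLUMNS:
            record[column] = float(record[column])
        rows.append(record)
    return rows


rows = load_rows(SAMPLE_CSV)
print(len(rows), len(rows[0]))        # number of rows and number of columns
print(rows[1]["CoapplicantIncome"])   # a numeric value from the second row
print(rows[0]["Loan_Status"])         # the target value of the first row
```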

The Loan_Status column is the target and has two classes:

  • Y: The loan is approved (“Yes”).

  • N: The loan is not approved (“No”).

We’ll develop an ML model using this dataset that classifies whether a customer’s loan request can be approved. We’ll also apply different hyperparameter optimization techniques to improve the model’s performance.

Note: This course uses only the “labeledTrainData” dataset, which contains 614 customer loan records and the 13 columns described above.