Decision Trees

Explore how decision trees are used in classification tasks, especially in finance for predicting loan defaults. Understand the greedy approach to building trees, selecting the best features to split data, decision stumps, and stopping criteria to manage tree size and accuracy.


Decision trees are commonly used in financial domains because they can both solve and explain prediction problems. They are also used as building blocks in ensemble methods such as bagging and boosting.

To understand decision trees, let's look at the problem of predicting car loan defaulters. When we want to get a loan, the bank asks some questions and derives answers from our past credit history. The questions may relate to:

  • Monthly income
  • Personal information
  • Previous loans
  • Current properties

Let’s say a person has a monthly income of $10K. Their age is 37. They have not taken any previous loans and do not own any property. A decision tree with the past data (from other customers) can be created like this:

This is a very basic example. In the real world, many other, more complex data points are considered.

So, according to the data, this person (Income < $12K, Age > 35) will get the loan.

Finding the optimal decision tree is computationally hard: the number of possible trees grows exponentially with the number of features. Instead, we can use a greedy approach to build a good tree from the data.

Greedy approach to build the tree

We will use a greedy approach to build the tree. First, we start with all the data and features. Then we select a feature and split the data based on that feature's value. In the example above, we selected monthly income and split the data on it. If we get a subset where all instances belong to the same class, there is no point in growing the tree further. If a subset contains instances of both classes, we keep building the decision tree with the remaining features.

The algorithm can be given like this:

  1. Start with all the data. The tree is initially empty.
  2. Find the best feature to split the data on.
  3. If a split contains only one class, make it a leaf and predict that class.
  4. If a split contains more than one class, keep splitting on the best remaining feature (go to step 2).
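The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: it assumes categorical features stored in dicts, and it uses classification error (explained in the next section) to pick the split.

```python
from collections import Counter

def best_feature(rows, labels, features):
    """Pick the feature whose split yields the fewest misclassified rows."""
    def error_after_split(f):
        # Group rows by the feature's value; each group predicts its
        # majority class, so its mistakes are the minority-class rows.
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[f], []).append(label)
        mistakes = 0
        for group in groups.values():
            majority_count = Counter(group).most_common(1)[0][1]
            mistakes += len(group) - majority_count
        return mistakes
    return min(features, key=error_after_split)

def build_tree(rows, labels, features):
    """Greedy tree building: stop when one class remains or the
    features are exhausted; otherwise split on the best feature."""
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    f = best_feature(rows, labels, features)
    remaining = [g for g in features if g != f]
    node = {"feature": f, "children": {}}
    for value in set(row[f] for row in rows):
        subset = [(r, l) for r, l in zip(rows, labels) if r[f] == value]
        sub_rows = [r for r, _ in subset]
        sub_labels = [l for _, l in subset]
        node["children"][value] = build_tree(sub_rows, sub_labels, remaining)
    return node
```

On a toy loan dataset, the function recursively splits until each leaf holds a single class or no features remain.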

In this algorithm, we face two major questions:

  • What is the best feature to split on at any point?
  • How do we stop the splitting? Instead of waiting until only one class remains, can we stop the recursion early?

Finding the best feature to split

We have selected monthly income in the example above. But what is the best feature at any point to split the data? We determine this using a decision stump.

A decision stump is a single-level decision tree that divides the data based on a condition on one feature's value. For example, suppose our dataset has 100 records: 70 are classified as "Yes" and 30 as "No". We create a decision stump on monthly income: if the income is greater than $12K, we put the record in one bucket; otherwise, we put it in the other. Each bucket then predicts its majority class: if the count of "Yes" is higher, the bucket predicts "Yes"; if the count of "No" is higher, it predicts "No".

After splitting, we calculate the classification errors. First, we calculate the error at the root level.

Classification error at root = number of incorrect examples / total number of examples = 30/100 = 0.30

Now, after splitting, can we reduce the classification error?

Classification error on child nodes = number of incorrect examples / total number of examples = (20 + 5)/100 = 25/100 = 0.25

So, after splitting, the error is reduced. Now, we will see if we can reduce the error with another splitting.

Consider splitting by age.

Here, the classification error for the age split = (10 + 3)/100 = 0.13

This is less than the error for the monthly income split. So, at this point, age is a better feature than monthly income.

So, at each split, we select the feature which gives the minimum classification error.
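The comparison above can be reproduced in a few lines. The per-bucket mistake counts (20 + 5 for income, 10 + 3 for age) are taken from the example's figures:

```python
def classification_error(mistakes, total):
    """Fraction of examples the majority-class prediction gets wrong."""
    return mistakes / total

total = 100
root_error = classification_error(30, total)        # 30 "No" vs. majority "Yes"
income_error = classification_error(20 + 5, total)  # mistakes in the two income buckets
age_error = classification_error(10 + 3, total)     # mistakes in the two age buckets

# Pick the candidate split with the minimum classification error.
best = min([("income", income_error), ("age", age_error)], key=lambda t: t[1])
print(root_error, income_error, age_error, best)
# 0.3 0.25 0.13 ('age', 0.13)
```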

Stopping tree building recursion

We can stop growing the tree at any node where only a single class remains. However, waiting for pure nodes can lead to a very large tree. We can also stop using another condition: once all features have been used along a path, we stop splitting further. This avoids building overly large trees and limits the depth by the number of features.
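In practice, stopping rules like these are exposed as hyperparameters. A minimal sketch of a combined stopping check (the names and default depth here are illustrative, not from the lesson):

```python
def should_stop(labels, unused_features, depth, max_depth=3):
    """Stop growing the tree when any stopping criterion fires:
    a pure node, no features left on this path, or a depth limit."""
    return (len(set(labels)) == 1    # single class remains (pure node)
            or not unused_features   # all features used along this path
            or depth >= max_depth)   # cap the tree size explicitly
```

A depth cap is the most direct way to trade accuracy on the training data for a smaller, more interpretable tree.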

Question: Can we use decision trees for multi-class prediction problems?

Answer: Yes. We can split the data and make the decision based on the majority class in each node. In the example above, we used only a one-level decision tree, but it can go deeper.

Question: How do we split the data with numerical-value features?
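One common approach (the lesson leaves the details to the reader) is to sort the records by the feature once, then sweep a candidate threshold between each pair of consecutive distinct values, maintaining running class counts so each candidate is evaluated in constant time. A sketch for this, assuming a simple classification-error criterion:

```python
from collections import Counter

def best_numeric_threshold(values, labels):
    """Sort once (O(n log n)), then sweep thresholds between consecutive
    distinct values; running class counts make each check O(1)."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    total = Counter(labels)
    left = Counter()
    best_threshold, best_mistakes = None, n + 1
    for i in range(1, n):
        left[pairs[i - 1][1]] += 1       # move one record to the left side
        if pairs[i - 1][0] == pairs[i][0]:
            continue                     # no threshold fits between equal values
        right = total - left             # class counts on the right side
        mistakes = (i - max(left.values())) + \
                   ((n - i) - max(right.values()))
        if mistakes < best_mistakes:
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_mistakes = mistakes
    return best_threshold, best_mistakes
```

Because the sweep itself is linear, the sort dominates the overall cost.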

Quiz: Complexity of node splitting

What is the complexity of the numerical-value split node selection using the above method? (Given that the total number of records is n.)

A. O(log n)
B. O(n)
C. O(n log n)
D. O(n²)
Question: What do the boundaries of a decision tree look like?

Consider the problem where we have two features, X and Y, and two classes. The initial distribution looks like this:

Now, we select the feature X and split the data based on a threshold. The decision boundary will look like this:

Again, we take another feature Y and split the remaining data. The boundaries look like this:

We can repeat this again and again, but the boundaries always remain parallel to the axes.
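These axis-aligned splits can be written directly as nested threshold tests. The thresholds and class names below are illustrative, not taken from the lesson's figures:

```python
def predict(x, y, x_threshold=0.5, y_threshold=0.5):
    """Each test compares a single feature to a threshold, so every
    region the tree carves out is a rectangle parallel to the axes."""
    if x <= x_threshold:
        return "A"                              # left region: class A
    return "A" if y > y_threshold else "B"      # right side split again on y
```

No matter how many splits we add, each one only draws a new horizontal or vertical boundary inside an existing rectangle.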

Question: What is the maximum depth parameter of decision tree training?