Decision Trees

Explore how decision trees are used in classification tasks, especially in finance for predicting loan defaults. Understand the greedy approach to building trees, selecting the best features to split data, decision stumps, and stopping criteria to manage tree size and accuracy.


Decision trees are commonly used in financial domains because they can both solve and explain prediction problems. They are also used as building blocks in ensemble methods such as bagging and boosting.

To understand decision trees, let's look at the problem of predicting car loan defaulters. When we want to get a loan, the bank asks some questions and derives answers from our past credit history. The questions may relate to:

  • Monthly income
  • Personal information
  • Previous loans
  • Current properties

Let’s say a person has a monthly income of $10K. Their age is 37. They have not taken any previous loans and do not own any property. A decision tree with the past data (from other customers) can be created like this:

This is a very basic example. In the real world, many other, more complex data points are considered.

So, according to the data, this person (Income < $12K, Age > 35) will get the loan.

Finding the optimal decision tree is computationally hard: the number of possible trees grows exponentially with the number of features. Instead, we can use a greedy approach to build a good tree from the data.

Greedy approach to build the tree

We will use a greedy approach to build the tree. First, we start with all the data and features. Then we select a feature and split the data based on that feature's value. In the example above, we selected monthly income and split the data on it. If we get a subset where all instances belong to the same class, there is no point in growing the tree further. If a subset contains instances of both classes, we keep building the decision tree with the remaining features.

The algorithm can be given like this:

  1. Start with all the data. The tree is initially empty.
  2. Find the best feature to split the data on.
  3. If a split contains only one class, make it a leaf and predict that class.
  4. If a split contains more than one class, keep splitting on the best remaining feature (go to step 2).
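The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: it assumes categorical features stored in dicts, and it uses classification error (explained in the next section) to pick the split.

```python
from collections import Counter

def best_feature(rows, labels, features):
    """Pick the feature whose split yields the fewest misclassified rows."""
    def error_after_split(f):
        # Group rows by the feature's value; each group predicts its
        # majority class, so its mistakes are the minority-class rows.
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[f], []).append(label)
        mistakes = 0
        for group in groups.values():
            majority_count = Counter(group).most_common(1)[0][1]
            mistakes += len(group) - majority_count
        return mistakes
    return min(features, key=error_after_split)

def build_tree(rows, labels, features):
    """Greedy tree building: stop when one class remains or the
    features are exhausted; otherwise split on the best feature."""
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    f = best_feature(rows, labels, features)
    remaining = [g for g in features if g != f]
    node = {"feature": f, "children": {}}
    for value in set(row[f] for row in rows):
        subset = [(r, l) for r, l in zip(rows, labels) if r[f] == value]
        sub_rows = [r for r, _ in subset]
        sub_labels = [l for _, l in subset]
        node["children"][value] = build_tree(sub_rows, sub_labels, remaining)
    return node
```

On a toy loan dataset, the function recursively splits until each leaf holds a single class or no features remain.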

In this algorithm, we face two major questions:

  • What is the best feature to split on at any point?
  • How do we stop the splitting? Instead of waiting until only one class remains, can we stop the recursion early?

Finding the best feature to split

We have selected monthly income in the example above. But what is the best feature at any point to split the data? We determine this using a decision stump.

A decision stump is a single-level decision tree that divides the data based on a condition on one feature's value. For example, suppose our dataset has 100 records: 70 are classified as "Yes" and 30 as "No". We create a decision stump on monthly income: if the income is greater than $12K, we put the record in one bucket; otherwise, we put it in the other. Each bucket then predicts its majority class: if the count of "Yes" is higher, the bucket predicts "Yes"; if the count of "No" is higher, it predicts "No".

After splitting, we calculate the classification errors. First, we calculate the error at the root level.

Classification error at root = number of incorrect examples / total number of examples = 30/100 = 0.30

Now, after splitting, can we reduce the classification error?

Classification error on child nodes = number of incorrect examples / total number of examples = (20 + 5)/100 = 25/100 = 0.25

So, after splitting, the error is reduced. Now, we will see if we can reduce the error with another splitting.

Consider splitting by age.

Here, the classification error for the age split = (10 + 3)/100 = 0.13

This is less than the error for the monthly income split. So, at this point, age is a better feature than monthly income.

So, at each split, we select the feature which gives the minimum classification error.
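The comparison above can be reproduced in a few lines. The per-bucket mistake counts (20 + 5 for income, 10 + 3 for age) are taken from the example's figures:

```python
def classification_error(mistakes, total):
    """Fraction of examples the majority-class prediction gets wrong."""
    return mistakes / total

total = 100
root_error = classification_error(30, total)        # 30 "No" vs. majority "Yes"
income_error = classification_error(20 + 5, total)  # mistakes in the two income buckets
age_error = classification_error(10 + 3, total)     # mistakes in the two age buckets

# Pick the candidate split with the minimum classification error.
best = min([("income", income_error), ("age", age_error)], key=lambda t: t[1])
print(root_error, income_error, age_error, best)
# 0.3 0.25 0.13 ('age', 0.13)
```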

Stopping tree building recursion

We can stop growing the tree at any node where only a single class remains. However, waiting for pure nodes can lead to a very large tree. We can also stop using another condition: once all features have been used along a path, we stop splitting further. This avoids building overly large trees and limits the depth by the number of features.
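In practice, stopping rules like these are exposed as hyperparameters. A minimal sketch of a combined stopping check (the names and default depth here are illustrative, not from the lesson):

```python
def should_stop(labels, unused_features, depth, max_depth=3):
    """Stop growing the tree when any stopping criterion fires:
    a pure node, no features left on this path, or a depth limit."""
    return (len(set(labels)) == 1    # single class remains (pure node)
            or not unused_features   # all features used along this path
            or depth >= max_depth)   # cap the tree size explicitly
```

A depth cap is the most direct way to trade accuracy on the training data for a smaller, more interpretable tree.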

Question: Can we use decision trees for multi-class prediction problems?

Answer: Yes. We can split the data and make the decision based on the majority class in each node. In the example above, we used only a one-level decision tree, but it can go deeper.

Question: How do we split the data with numerical-value features?
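One common approach (the lesson leaves the details to the reader) is to sort the records by the feature once, then sweep a candidate threshold between each pair of consecutive distinct values, maintaining running class counts so each candidate is evaluated in constant time. A sketch for this, assuming a simple classification-error criterion:

```python
from collections import Counter

def best_numeric_threshold(values, labels):
    """Sort once (O(n log n)), then sweep thresholds between consecutive
    distinct values; running class counts make each check O(1)."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    total = Counter(labels)
    left = Counter()
    best_threshold, best_mistakes = None, n + 1
    for i in range(1, n):
        left[pairs[i - 1][1]] += 1       # move one record to the left side
        if pairs[i - 1][0] == pairs[i][0]:
            continue                     # no threshold fits between equal values
        right = total - left             # class counts on the right side
        mistakes = (i - max(left.values())) + \
                   ((n - i) - max(right.values()))
        if mistakes < best_mistakes:
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_mistakes = mistakes
    return best_threshold, best_mistakes
```

Because the sweep itself is linear, the sort dominates the overall cost.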

Quiz: Complexity of node splitting

What is the complexity of the numerical-value split node selection using the above method? (Given that the total number of records is n.)

A. O(log n)
B. O(n)
C. O(n log n)
D. O(n²)
Question: What do the boundaries of a decision tree look like?

Consider the problem where we have two features, X and Y, and two classes. The initial distribution looks like this:

Now, we select the feature X and split the data based on a threshold. The decision boundary will look like this:

Again, we take another feature Y and split the remaining data. The boundaries look like this:

We can repeat this again and again, but the boundaries always remain parallel to the axes.
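These axis-aligned splits can be written directly as nested threshold tests. The thresholds and class names below are illustrative, not taken from the lesson's figures:

```python
def predict(x, y, x_threshold=0.5, y_threshold=0.5):
    """Each test compares a single feature to a threshold, so every
    region the tree carves out is a rectangle parallel to the axes."""
    if x <= x_threshold:
        return "A"                              # left region: class A
    return "A" if y > y_threshold else "B"      # right side split again on y
```

No matter how many splits we add, each one only draws a new horizontal or vertical boundary inside an existing rectangle.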

Question: What is the maximum depth parameter of decision tree training?