Training Decision Trees: Greedy Algorithm and Stopping Criteria

Explore why decision tree training uses a greedy algorithm and how we customize the training by changing parameters.

A greedy algorithm

There is no guarantee that a decision tree trained by the process described previously will be the best possible decision tree for finding leaf nodes with the lowest impurity. This is because the algorithm used to train decision trees is what is called a greedy algorithm. In this context, this means that at each opportunity to split a node, the algorithm is looking for the best possible split at that point in time, without any regard for the fact that the opportunities for later splits are being affected.

For example, consider the following hypothetical scenario: the best initial split for the training data of the case study involves PAY_1, as we’ve seen before (see graph below). But what if we instead split on BILL_AMT_1, and then make subsequent splits on PAY_1 in the next level? Even though the initial split on BILL_AMT_1 is not the best one available at first, it is possible that the end result will be better if the tree is grown this way. The algorithm has no way of finding solutions like this if they exist, because it only considers the best possible split at each node and not possible future splits.

Get hands-on with 1200+ tech skills courses.