Decision trees are tree data structures that are generated using learning algorithms for classification and regression. One of the most common problems when learning a decision tree is finding the optimal size of the resulting tree, the size that gives the best accuracy for the model. A tree with too many branches and layers can overfit the training data. Pruning a decision tree helps prevent this overfitting so that our model generalizes well to unseen data. Pruning means removing a subtree that is redundant, i.e. not a useful split, and replacing it with a leaf node. Decision tree pruning can be divided into two types: pre-pruning and post-pruning.
Pre-pruning, also known as the Early Stopping Rule, is the method where subtree construction is halted at a particular node after evaluating some measure. These measures can be the Gini impurity or the information gain. In pre-pruning, we evaluate a pruning condition based on these measures at each node. Examples of pruning conditions include informationGain(Attr) < minGain or treeDepth == MaxDepth. If the condition is satisfied, we prune the subtree, meaning we replace the decision node with a leaf node. Otherwise, we continue building the tree using our decision tree algorithm.
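As a concrete illustration, the pre-pruning check described above can be sketched in Python. The `gini`, `information_gain`, and `should_prune` names, along with the `min_gain` and `max_depth` defaults, are illustrative assumptions, not part of any particular library:

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, left, right):
    """Impurity reduction achieved by splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted

def should_prune(gain, depth, min_gain=0.01, max_depth=5):
    # Pre-pruning rule: stop growing when the split's gain is below the
    # minimum gain, or the tree has already reached the depth limit.
    return gain < min_gain or depth >= max_depth
```

A tree-building loop would call `should_prune` before expanding each node and emit a leaf instead of a decision node whenever it returns `True`.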
Pre-pruning has the advantage of being faster and more efficient, as it avoids generating overly complex subtrees that overfit the training data. However, the stopping condition may halt growth prematurely, discarding splits that would have led to useful subtrees deeper in the tree.
As the name suggests, post-pruning means pruning after the tree is built. You grow the tree entirely using your decision tree algorithm and then prune the subtrees in a bottom-up fashion. Starting from the bottom decision nodes, you use measures such as Gini impurity or information gain to decide whether to keep each decision node or replace it with a leaf node. For example, say we want to prune out subtrees that result in the least information gain. When deciding on the leaf node, we want to know what leaf our decision tree algorithm would have created if it hadn't created this decision node.
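A minimal sketch of that replacement step, assuming a toy `Node` class (an illustrative structure, not a library API) where the leaf a decision node collapses to simply predicts the majority training class at that node:

```python
from collections import Counter

class Node:
    """Minimal tree node for illustration: internal nodes keep children,
    leaves keep only a predicted class."""
    def __init__(self, labels, left=None, right=None):
        self.labels = labels            # training labels that reached this node
        self.left, self.right = left, right
        # The leaf this node would collapse to predicts the majority class.
        self.majority = Counter(labels).most_common(1)[0][0]

    def is_leaf(self):
        return self.left is None and self.right is None

def to_leaf(node):
    # Post-pruning step: replace the decision node with a leaf that
    # predicts the majority class seen during training.
    return Node(node.labels)
```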
There are many pruning algorithms out there; below are three examples of pruning algorithms.
We can prune our decision tree by using information gain in both post-pruning and pre-pruning. In pre-pruning, we check whether information gain at a particular node is greater than minimum gain. In post-pruning, we prune the subtrees with the least information gain until we reach a desired number of leaves.
Reduced Error Pruning (REP) belongs to the post-pruning category. In REP, pruning is performed with the help of a validation set: all nodes are evaluated for pruning in a bottom-up fashion, and a node is pruned if the resulting pruned tree performs no worse than the original tree on the validation set. The subtree at that node is replaced with a leaf node assigned the most common class.
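The bottom-up procedure can be sketched on a toy binary tree. The `Node` class, the threshold-based `predict`, and the validation data shapes below are all illustrative assumptions made for this sketch:

```python
from collections import Counter

class Node:
    """Toy decision node: leaves have no children and predict the
    majority training class; internal nodes split on feature <= threshold."""
    def __init__(self, labels, feature=None, threshold=None, left=None, right=None):
        self.labels = labels
        self.feature, self.threshold = feature, threshold
        self.left, self.right = left, right
        self.majority = Counter(labels).most_common(1)[0][0]

    def predict(self, x):
        if self.left is None:  # leaf
            return self.majority
        branch = self.left if x[self.feature] <= self.threshold else self.right
        return branch.predict(x)

def accuracy(tree, X, y):
    return sum(tree.predict(x) == t for x, t in zip(X, y)) / len(y)

def reduced_error_prune(node, root, X_val, y_val):
    """Bottom-up REP: collapse a node to a leaf whenever doing so does
    not hurt accuracy on the validation set."""
    if node.left is None:
        return
    reduced_error_prune(node.left, root, X_val, y_val)
    reduced_error_prune(node.right, root, X_val, y_val)
    before = accuracy(root, X_val, y_val)
    left, right = node.left, node.right
    node.left = node.right = None              # tentatively collapse to a leaf
    if accuracy(root, X_val, y_val) < before:  # prune only if no worse
        node.left, node.right = left, right    # revert the collapse
```

Note that each candidate prune is evaluated against the whole tree's validation accuracy, which is why `root` is threaded through the recursion.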
Cost-complexity pruning also falls under the post-pruning category. It works by calculating a Tree Score based on the Residual Sum of Squares (RSS) of the subtree and a Tree Complexity Penalty that is a function of the number of leaves in the subtree. The penalty compensates for the fact that a subtree with more leaves will always fit the training data at least as well, so a lower RSS alone would always favor the larger subtree. Numerically, the Tree Score is defined as follows:
Tree Score = RSS + alpha × T

Here, alpha is a hyperparameter we find using cross-validation, and T is the number of leaves in the subtree.
We calculate the Tree Score for all subtrees in the decision tree, and then pick the subtree with the lowest Tree Score. However, we can observe from the equation that the value of alpha determines the choice of the subtree, so we find alpha using cross-validation: we repeat the above process for different values of alpha, which gives us a sequence of trees, and keep the value of alpha whose tree performs best on the held-out data on average. Finally, our pruned decision tree is the tree corresponding to that final value of alpha.
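The role of alpha can be illustrated with a toy calculation. The subtree names and their RSS and leaf counts below are made-up numbers for the sketch, not measurements:

```python
def tree_score(rss, n_leaves, alpha):
    """Cost-complexity score: RSS plus a penalty proportional to leaf count."""
    return rss + alpha * n_leaves

# Hypothetical subtrees of one decision tree: (name, training RSS, leaf count).
subtrees = [("full", 10.0, 8), ("medium", 14.0, 4), ("stump", 30.0, 1)]

def best_subtree(alpha):
    # Pick the subtree with the lowest Tree Score for this alpha.
    return min(subtrees, key=lambda t: tree_score(t[1], t[2], alpha))[0]
```

With alpha = 0 the full tree always wins (lowest RSS); as alpha grows, the penalty increasingly favors smaller subtrees. For reference, scikit-learn exposes this scheme through the `ccp_alpha` parameter of `DecisionTreeClassifier` and `DecisionTreeRegressor`.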