Training Decision Trees: Greedy Algorithm and Stopping Criteria

Explore why decision tree training uses a greedy algorithm and how we customize the training by changing parameters.

We'll cover the following...

A greedy algorithm
Different stopping criteria and other options
Default values for the parameters

A greedy algorithm

There is no guarantee that a decision tree trained by the process described previously will be the best possible decision tree for finding leaf nodes with the lowest impurity. This is because the algorithm used to train decision trees is what is called a greedy algorithm. In this context, this means that at each opportunity to split a node, the algorithm is looking for the best possible split at that point in time, without any regard for the fact that the opportunities for later splits are being affected.

For example, consider the following hypothetical scenario: the best initial split for the training data of the case study involves PAY_1, as we’ve seen before (see graph below). But what if we instead split on BILL_AMT_1, and then make subsequent splits on PAY_1 in the next level? Even though the initial split on BILL_AMT_1 is not the best one available at first, it is possible that the end result will be better if the tree is grown this way. The algorithm has no way of finding ...

Introduction

Data Exploration and Cleaning

(Challenge) Exploring Remaining Financial Features in Dataset

Introduction to scikit-learn and Model Evaluation

Fake News Detection Using Scikit-learn

(Challenge) Logistic Regression and Precision-Recall Curve

Details of Logistic Regression and Feature Extraction

(Challenge) Logistic Regression Model and Coefficients

The Bias-Variance Trade-Off

(Challenge) Cross-Validation and Feature Engineering

Decision Trees and Random Forests

(Challenge) Cross-Validation Grid Search with Random Forest

Gradient Boosting, XGBoost, and SHAP Values

(Challenge) XGBoost and SHAP Explanation for Case Study Data

Predict Frog Toxicity with Python and XGBoost

Test Set Analysis, Financial Insights, and Delivery to the Client

(Challenge) Deriving Financial Insights

Appendix

Training Decision Trees: Greedy Algorithm and Stopping Criteria

A greedy algorithm