Tuning Random Forests
Explore the process of tuning random forest models with key hyperparameters such as mtry and trees using R's tidymodels package. Understand how the random forest's design addresses the bias-variance tradeoff, and learn practical techniques for improving model performance through cross-validation and targeted tuning.
Random forest and the bias-variance tradeoff
The random forest algorithm was designed to address aspects of the bias-variance tradeoff without requiring direct hyperparameter tuning. This differentiates random forests from algorithms like CART decision trees and boosted decision trees (e.g., XGBoost), which typically demand careful tuning. The following illustration maps the random forest algorithm's design to the bias-variance tradeoff.
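Before turning to the illustration, here's a minimal sketch of that contrast expressed as tidymodels model specifications. The specs below are our own illustrative example (not part of the lesson): a CART tree is typically declared with tune() placeholders, while a random forest spec is often serviceable with engine defaults.

```r
library(tidymodels)

# A CART tree usually exposes hyperparameters that must be tuned
# to balance bias and variance.
cart_spec <- decision_tree(
  cost_complexity = tune(),  # pruning penalty
  min_n           = tune()   # minimum observations per node
) |>
  set_engine("rpart") |>
  set_mode("classification")

# A random forest spec can often be used with engine defaults, because
# bagging and feature randomization address much of the tradeoff by design.
rf_spec <- rand_forest() |>
  set_engine("ranger") |>
  set_mode("classification")
```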
Here are a few things to consider:
First, the random forest’s bagging and feature randomization provide each ensemble tree with only a limited sample of the training data. As a result, there’s no concern about individual ensemble trees overfitting (i.e., the lower right in the illustration).
Second, because there’s no concern for overfitting, the random forest algorithm sets the CART minbucket hyperparameter to 1. This setting allows each ensemble tree to grow as deep and complex as its training data allows. Deep, complex trees address underfitting (i.e., the upper left in the illustration).
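To see how these design choices map onto tidymodels, here's a hedged sketch of a cross-validated tuning workflow for mtry and trees. The iris data, fold count, and grid ranges are placeholders of ours, not values from the lesson; min_n is parsnip's name for the minimum node size (the analogue of CART's minbucket), and setting it to 1 mirrors the fully grown trees described above.

```r
library(tidymodels)

set.seed(1234)
folds <- vfold_cv(iris, v = 5)  # 5-fold cross-validation

rf_spec <- rand_forest(
  mtry  = tune(),  # predictors sampled at each split (feature randomization)
  trees = tune(),  # number of trees in the ensemble
  min_n = 1        # minimum node size of 1: let each tree grow fully
) |>
  set_engine("ranger") |>
  set_mode("classification")

rf_wflow <- workflow() |>
  add_model(rf_spec) |>
  add_formula(Species ~ .)

# A small regular grid; iris has 4 predictors, so mtry tops out at 4.
rf_grid <- grid_regular(
  mtry(range = c(1, 4)),
  trees(range = c(500, 1500)),
  levels = 3
)

rf_results <- tune_grid(rf_wflow, resamples = folds, grid = rf_grid)
select_best(rf_results, metric = "accuracy")
```

One design note: because mtry's upper bound depends on how many predictors the data has, dials requires an explicit range (or finalization against the training data) before the grid can be built, which is why the range is spelled out here.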