
XGBoost Hyperparameters: Tuning the Learning Rate

Explore how tuning the learning rate hyperparameter in XGBoost affects model performance and convergence. Understand the trade-off between smaller learning rates and training rounds, and learn how to use validation AUC scores and iteration tracking to choose an optimal learning rate for better model accuracy.

Impact of learning rate on model performance

The learning rate is referred to as eta in the XGBoost documentation, and is also called step size shrinkage. This hyperparameter controls how much each new estimator contributes to the ensemble prediction. If you increase the learning rate, you may reach the optimal model, defined as the one with the highest performance on the validation set, faster. However, setting it too high risks boosting steps that are too large: in that case, the gradient boosting procedure may fail to converge on the optimal model, for reasons similar to those discussed in the exercise Using Gradient Descent to Minimize a Cost Function regarding large learning rates in gradient descent. Let’s explore how the learning rate affects model performance on our synthetic data.

The learning rate is a number between zero and one (inclusive of endpoints, although a learning rate of zero is not useful). We make an array of 25 evenly spaced numbers between 0.01 and 1 for the learning rates we’ll test:

import numpy as np

learning_rates = np.linspace(start=0.01, stop=1, num=25)
...