XGBoost Hyperparameters: Tuning the Learning Rate

Learn how the learning rate can be adjusted to improve the performance of a gradient boosting model trained with XGBoost.

Impact of learning rate on model performance

The learning rate is also referred to as eta in the XGBoost documentation, as well as step size shrinkage. This hyperparameter controls how large a contribution each new estimator makes to the ensemble prediction. Increasing the learning rate may let you reach the optimal model, defined as the one with the highest performance on the validation set, in fewer boosting rounds. However, there is the danger that setting it too high will result in boosting steps that are too large. In that case, the gradient boosting procedure may fail to converge on the optimal model, for reasons similar to those discussed in the exercise Using Gradient Descent to Minimize a Cost Function regarding large learning rates in gradient descent. Let’s explore how the learning rate affects model performance on our synthetic data.
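The shrinkage idea can be sketched numerically: each new estimator fits the current residual, but only a fraction of its output, scaled by the learning rate, is added to the running prediction. The following toy loop (plain Python, not XGBoost itself, with made-up numbers) illustrates why a small rate needs more rounds:

```python
# Toy illustration of step size shrinkage: each "estimator" predicts the
# current residual, but only learning_rate times that prediction is added.
def boost(target, learning_rate, n_rounds):
    prediction = 0.0
    for _ in range(n_rounds):
        residual = target - prediction      # what the next estimator fits
        prediction += learning_rate * residual
    return prediction

# A small learning rate closes only part of the gap each round...
low = boost(target=1.0, learning_rate=0.1, n_rounds=10)
# ...while a larger rate approaches the target in fewer rounds.
high = boost(target=1.0, learning_rate=0.5, n_rounds=10)
print(low, high)
```

Here the prediction after n rounds is 1 − (1 − learning_rate)^n, so the 0.1 rate reaches about 0.65 after ten rounds while the 0.5 rate is already near 1.0. Real boosting steps are noisier, which is why an overly large rate can overshoot rather than converge.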

The learning rate is a number between zero and one (inclusive of endpoints, although a learning rate of zero is not useful). We make an array of 25 evenly spaced numbers between 0.01 and 1 for the learning rates we’ll test:

learning_rates = np.linspace(start=0.01, stop=1, num=25)
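As a quick sanity check on this array: 25 evenly spaced points between 0.01 and 1 are separated by (1 − 0.01)/24 = 0.04125, and both endpoints are included:

```python
import numpy as np

learning_rates = np.linspace(start=0.01, stop=1, num=25)
print(learning_rates[0])    # the start value, 0.01
print(learning_rates[-1])   # the stop value, 1.0, is included
print(learning_rates[1] - learning_rates[0])  # spacing of 0.04125
```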

Now we set up a for loop to train a model for each learning rate and save the validation scores in an array. We’ll also track the number of boosting rounds that it takes to reach the best iteration. The next several code blocks should be run together as one cell in a Jupyter Notebook. We start by measuring how long this will take, creating empty lists to store results, and opening the for loop:

%%time 
val_aucs = [] 
best_iters = [] 
for learning_rate in learning_rates:

At each loop iteration, the learning_rate variable will hold successive elements of the learning_rates array. Once inside the loop, the first step is to update the hyperparameters of the model object with the new learning rate. This is accomplished using the set_params method, which we supply with a double asterisk ** and a dictionary mapping hyperparameter names to values. The ** function call syntax in Python allows us to supply an arbitrary number of keyword arguments, also called kwargs, as a dictionary. In this case, we are only changing one keyword argument, so the dictionary only has one item:

xgb_model_1.set_params(**{'learning_rate':learning_rate})
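To see what the ** syntax does in isolation, here is a minimal sketch using a stand-in function (set_params_demo is a hypothetical name, not part of XGBoost): each key in the dictionary becomes a keyword argument name, and each value becomes that argument's value.

```python
# Minimal sketch of ** dictionary unpacking into keyword arguments.
# set_params works the same way: each dict key arrives as a kwarg name.
def set_params_demo(**kwargs):
    # kwargs is received as an ordinary dict of name -> value pairs
    return kwargs

params = {'learning_rate': 0.05}
print(set_params_demo(**params))  # equivalent to set_params_demo(learning_rate=0.05)
```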

Now that we’ve set the new learning rate on the model object, we train the model using early stopping as before:

xgb_model_1.fit(X_train, y_train, eval_set=eval_set, eval_metric='auc', verbose=False,\
                early_stopping_rounds=30)

After fitting, we obtain the predicted probabilities for the validation set and then use them to calculate the validation ROC AUC. This is added to our list of results using the append method:

val_set_pred_proba_2 = xgb_model_1.predict_proba(X_val)[:,1]
val_aucs.append(roc_auc_score(y_val, val_set_pred_proba_2))
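The ROC AUC we record here has a useful interpretation: it is the probability that a randomly chosen positive example receives a higher predicted probability than a randomly chosen negative one. A small pure-Python sketch of this pairwise view (not the scikit-learn implementation, and using made-up scores) makes that concrete:

```python
# Pairwise interpretation of ROC AUC: the fraction of (positive, negative)
# pairs in which the positive example gets the higher score (ties count 0.5).
def pairwise_auc(y_true, y_score):
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

# Three of the four positive/negative pairs are ranked correctly here.
print(pairwise_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # -> 0.75
```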

Finally, we also capture the number of rounds required for each learning rate:

best_iters.append(int(xgb_model_1.get_booster().attributes()['best_iteration']))

The previous five code snippets should all be run together in one cell. The output should be similar to this:

CPU times: user 1min 23s, sys: 526 ms, total: 1min 24s 
Wall time: 22.2 s

Now that we have our results from this hyperparameter search, we can visualize validation set performance and the number of iterations. Because these two metrics are on different scales, we’ll want to create a dual y-axis plot. pandas makes this easy, so first we’ll put all the data into a data frame:

learning_rate_df = pd.DataFrame({'Learning rate':learning_rates, 'Validation AUC':val_aucs,\
                                 'Best iteration':best_iters})

Now we can visualize performance and the number of iterations for different learning rates like this, noting that:

  • We set the index (set_index) so that the learning rate is plotted on the x axis, and the other columns on the y axis.

  • The secondary_y keyword argument indicates which column to plot on the right-hand y axis.

  • The style argument allows us to specify different line styles for each column plotted. '-o' is a solid line with circular markers, while '--o' is a dashed line with circular markers:

mpl.rcParams['figure.dpi'] = 400 
learning_rate_df.set_index('Learning rate').plot(secondary_y='Best iteration', style=['-o', '--o'])
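To see why set_index puts the learning rate on the x axis, note that DataFrame.plot uses the index for the x axis and draws each remaining column as its own line. A small sketch with made-up values (no plotting required) shows the effect:

```python
import pandas as pd

# set_index promotes a column to the index; DataFrame.plot then uses the
# index for the x axis and plots the remaining columns as separate lines.
df = pd.DataFrame({'Learning rate': [0.01, 0.5, 1.0],
                   'Validation AUC': [0.71, 0.78, 0.75]})
indexed = df.set_index('Learning rate')
print(indexed.columns.tolist())  # only the y-axis columns remain
print(indexed.index.name)       # the x-axis label comes from the index name
```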

The resulting plot should look like this:
