Exercise: Randomized Grid Search to Tune XGBoost Hyperparameters

Learn how to perform a randomized grid search to explore a large hyperparameter space in XGBoost.

XGBoost for randomized grid search

In this exercise, we’ll use a randomized grid search to explore the space of six hyperparameters. A randomized grid search is a good option when there are many hyperparameters, each with many candidate values, to search over. If, for example, we wanted to test five values for each of the six hyperparameters, an exhaustive grid search would require $5^6 = 15,625$ model fits. Even if each fit took only a second, we’d still need more than four hours to try every combination. A randomized grid search can achieve satisfactory results by fitting only a random sample of these combinations. Here, we’ll show how to do this using scikit-learn and XGBoost.
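The arithmetic behind this estimate can be checked with a short sketch. The one-second fit time comes from the example above; the budget of 100 sampled combinations (`n_iter`) is a hypothetical value chosen only for illustration.

```python
n_values = 5        # candidate values per hyperparameter (from the example above)
n_hyperparams = 6   # number of hyperparameters being tuned

# An exhaustive grid needs one model fit per combination of values.
n_combinations = n_values ** n_hyperparams
print(n_combinations)  # 15625

# At an assumed one second per fit, convert the total time to hours.
hours = n_combinations * 1.0 / 3600
print(round(hours, 2))  # 4.34

# A randomized search fits only a sample of the combinations.
n_iter = 100  # hypothetical search budget, not from the exercise
print(f"{n_iter / n_combinations:.2%} of the full grid")  # 0.64% of the full grid
```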

The first step in a randomized grid search is to specify the range of values you’d like to sample from for each hyperparameter. This can be done by supplying either a list of values or a distribution object to sample from. For discrete hyperparameters such as max_depth, where there are only a few possible values, it makes sense to specify them as a list. Continuous hyperparameters such as subsample, on the other hand, can vary anywhere on the interval (0, 1], so we don’t need to specify a list of values. Instead, we can ask the grid search to sample values uniformly at random over an interval. We will use a uniform distribution to sample several of the hyperparameters we consider:

Import the uniform distribution class from scipy and specify ranges for all hyperparameters to be searched, using a dictionary. uniform can take two arguments, loc and scale, specifying the lower bound of the interval to sample from and the width of the interval, respectively:
```
from scipy.stats import uniform
param_grid = {
    'max_depth': [2, 3, 4, 5, 6, 7],
    'gamma': uniform(loc=0.0, scale=3),
    'min_child_weight': list(range(1, 151)),
    'colsample_bytree': uniform(loc=0.1, scale=0.9),
    'subsample': uniform(loc=0.5,
```
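To see how lists and distribution objects behave when sampled, here is a minimal sketch. It assumes scikit-learn's `ParameterSampler` as a convenient way to preview random draws. The two-entry `demo_space` dictionary is a hypothetical reduced search space, not the full dictionary from the exercise, and the random seeds are arbitrary.

```python
from scipy.stats import uniform
from sklearn.model_selection import ParameterSampler

# uniform(loc, scale) samples from the interval [loc, loc + scale].
dist = uniform(loc=0.1, scale=0.9)
draws = dist.rvs(size=1000, random_state=0)
print(draws.min() >= 0.1, draws.max() <= 1.0)  # True True

# Hypothetical reduced search space: a list for a discrete hyperparameter
# and a distribution object for a continuous one.
demo_space = {
    'max_depth': [2, 3, 4, 5, 6, 7],
    'colsample_bytree': uniform(loc=0.1, scale=0.9),
}

# Draw five random combinations from the space and inspect them.
samples = list(ParameterSampler(demo_space, n_iter=5, random_state=1))
for params in samples:
    print(params)
```

Each draw picks one list entry for max_depth and one value from the uniform distribution for colsample_bytree. scikit-learn's `RandomizedSearchCV` accepts a dictionary in this same format through its `param_distributions` argument.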

...
