Exercise: Randomized Grid Search to Tune XGBoost Hyperparameters

Learn how to perform a randomized grid search to explore a large hyperparameter space in XGBoost.


In this exercise, we'll use a randomized grid search to explore the space of six hyperparameters. A randomized grid search is a good option when you have many values of many hyperparameters you'd like to search over. If, for example, we wanted to test five values for each of the six hyperparameters here, we'd need $5^6 = 15,625$ model fits to search exhaustively. Even if each model fit took only a second, we'd still need several hours to try every combination. A randomized grid search can achieve satisfactory results by searching only a random sample of these combinations. Here, we'll show how to do this using scikit-learn and XGBoost.
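As a quick check of this arithmetic, the following sketch (using made-up candidate values) counts the combinations in an exhaustive grid:

import itertools

# Five hypothetical candidate values for each of six hyperparameters
candidate_values = [[1, 2, 3, 4, 5]] * 6

# The exhaustive grid contains every possible combination
print(len(list(itertools.product(*candidate_values))))  # 15625 = 5**6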

The first step in a randomized grid search is to specify the range of values you'd like to sample from for each hyperparameter. This can be done either by supplying a list of values or by supplying a distribution object to sample from. For discrete hyperparameters such as max_depth, where there are only a few possible values, it makes sense to specify them as a list. On the other hand, for continuous hyperparameters, such as subsample, that can vary anywhere on the interval (0, 1], we don't need to specify a list of values. Instead, we can ask the grid search to sample values uniformly over this interval. We will use a uniform distribution to sample several of the hyperparameters we consider:

1. Import the uniform distribution class from scipy.stats and specify ranges for all hyperparameters to be searched, using a dictionary. uniform takes two arguments, loc and scale, specifying the lower bound of the interval to sample from and the width of the interval, respectively:

from scipy.stats import uniform
param_grid = {'max_depth': [2, 3, 4, 5, 6, 7],
              'gamma': uniform(loc=0.0, scale=3),
              'min_child_weight': list(range(1, 151)),
              'colsample_bytree': uniform(loc=0.1, scale=0.9),
              'subsample': uniform(loc=0.5, scale=0.5),
              'learning_rate': uniform(loc=0.01, scale=0.5)}


Here, we've selected parameter ranges based on experimentation and experience. For example, with subsample, the XGBoost documentation recommends choosing values of at least 0.5, so we've indicated uniform(loc=0.5, scale=0.5), which means sampling from the interval [0.5, 1].
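If the loc and scale convention is unfamiliar, a quick way to build intuition is to draw a few samples directly from one of these distribution objects; the random_state value here is arbitrary:

from scipy.stats import uniform

# Draw three samples from the interval [0.5, 1]:
# loc is the lower bound and loc + scale is the upper bound
uniform(loc=0.5, scale=0.5).rvs(size=3, random_state=42)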

2. Now that we've indicated which distributions to sample from, we need to do the sampling. Scikit-learn offers the ParameterSampler class, which randomly samples from the supplied param_grid and returns as many combinations as requested (n_iter). We also set a RandomState for repeatable results across different runs of the notebook:

import numpy as np
from sklearn.model_selection import ParameterSampler
rng = np.random.RandomState(0)
n_iter = 1000
param_list = list(ParameterSampler(param_grid, n_iter=n_iter, random_state=rng))


This returns the results as a list of dictionaries of specific parameter values, corresponding to locations in the six-dimensional hyperparameter space.

Note that in this exercise, we are iterating through 1,000 hyperparameter combinations, which will likely take over 5 minutes. You may wish to decrease this number for faster results.
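You can confirm that the sampler returned the requested number of combinations:

# Each element of param_list is one dictionary of hyperparameter values
len(param_list)  # should equal n_iter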

3. Examine the first item of param_list:

param_list[0]


This should return a combination of six parameter values, from the distributions indicated:

{'colsample_bytree': 0.5939321535345923,
 'gamma': 2.1455680991172583,
 'learning_rate': 0.31138168803582195,
 'max_depth': 5,
 'min_child_weight': 104,
 'subsample': 0.7118273996694524}

4. Observe how you can set multiple XGBoost hyperparameters simultaneously with a dictionary, using the ** syntax. First, create a new XGBoost classifier object for this exercise:

import xgboost as xgb

xgb_model_2 = xgb.XGBClassifier(n_estimators=1000, verbosity=1,
                                use_label_encoder=False,
                                objective='binary:logistic')
xgb_model_2.set_params(**param_list[0])


The output should show the indicated hyperparameters being set:

XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=0.5939321535345923,
              gamma=2.1455680991172583, gpu_id=-1, importance_type='gain',
              interaction_constraints='', learning_rate=0.31138168803582195,
              max_delta_step=0, min_child_weight=104, missing=nan,
              monotone_constraints='()', n_estimators=1000, n_jobs=4,
              num_parallel_tree=1, random_state=0, reg_alpha=0, reg_lambda=1,
              scale_pos_weight=1, subsample=0.7118273996694524,
              tree_method='exact', use_label_encoder=False,
              validate_parameters=1, verbosity=1)


We will use this procedure in a loop to evaluate all the sampled hyperparameter combinations.
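If you want to double-check that a sampled value was applied, get_params returns the model's current hyperparameter dictionary; for example, using the max_depth value from the sample above:

# Confirm one of the sampled values was applied to the model
xgb_model_2.get_params()['max_depth']  # 5, matching param_list[0]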

5. The next several steps will be contained in one cell, inside a for loop. First, use the %%time magic to measure how long the cell takes, create an empty list to save the validation AUCs, and start a counter:

%%time
val_aucs = []
counter = 1

6. Open the for loop, set the hyperparameters, and fit the XGBoost model, similar to the preceding example of tuning the learning rate:

for params in param_list:
    # Set hyperparameters and fit model
    xgb_model_2.set_params(**params)
    xgb_model_2.fit(X_train, y_train, eval_set=eval_set,
                    eval_metric='auc', verbose=False,
                    early_stopping_rounds=30)


7. Within the for loop, get the predicted probabilities and the validation set AUC:

    # Get predicted probabilities and save validation ROC AUC
    val_set_pred_proba = xgb_model_2.predict_proba(X_val)[:, 1]
    val_aucs.append(roc_auc_score(y_val, val_set_pred_proba))

8. Because this procedure will take a few minutes, it's nice to print the progress to the Jupyter notebook output. We use the Python modulo operator, %, to print a message every 50 iterations, in other words, when the remainder of counter divided by 50 equals zero. Finally, we increment the counter:

    # Print progress
    if counter % 50 == 0:
        print('Done with {counter} of {n_iter}'.format(counter=counter,
                                                       n_iter=n_iter))
    counter += 1
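
For reference, here is the complete cell assembled from steps 5-8. This sketch assumes that X_train, y_train, X_val, y_val, eval_set, and roc_auc_score have already been defined earlier in the notebook:

%%time
val_aucs = []
counter = 1
for params in param_list:
    # Set hyperparameters and fit model
    xgb_model_2.set_params(**params)
    xgb_model_2.fit(X_train, y_train, eval_set=eval_set,
                    eval_metric='auc', verbose=False,
                    early_stopping_rounds=30)

    # Get predicted probabilities and save validation ROC AUC
    val_set_pred_proba = xgb_model_2.predict_proba(X_val)[:, 1]
    val_aucs.append(roc_auc_score(y_val, val_set_pred_proba))

    # Print progress
    if counter % 50 == 0:
        print('Done with {counter} of {n_iter}'.format(counter=counter,
                                                       n_iter=n_iter))
    counter += 1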

9. Assembling steps 5-8 in one cell and running the for loop should give output like this:

Done with 50 of 1000
Done with 100 of 1000
...
Done with 950 of 1000
Done with 1000 of 1000
CPU times: user 24min 20s, sys: 18.9 s, total: 24min 39s
Wall time: 6min 27s

10. Now that we have all the results from our hyperparameter exploration, we need to examine them. Because the hyperparameter combinations are organized as a list of dictionaries, we can easily put them all in a data frame. Do this and look at the first few rows:

xgb_param_search_df = pd.DataFrame(param_list)
xgb_param_search_df.head()
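
A natural next step, sketched here with a column name of our own choosing, is to attach the saved validation AUCs to this data frame and sort to see the best-performing combinations:

# Add the saved validation scores and sort by them, best first
xgb_param_search_df['Validation ROC AUC'] = val_aucs
xgb_param_search_df.sort_values('Validation ROC AUC', ascending=False).head()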