Exercise: Randomized Grid Search to Tune XGBoost Hyperparameters

Learn how to perform a randomized grid search to explore a large hyperparameter space in XGBoost.

XGBoost for randomized grid search

In this exercise, we’ll use a randomized grid search to explore the space of six hyperparameters. A randomized grid search is a good option when you’d like to search over many values of many hyperparameters. If, for example, there were five values of each of the six hyperparameters that we’d like to test, an exhaustive grid search would need 5^6 = 15,625 model fits. Even if each model fit took only a second, that would be 15,625 seconds, or more than four hours, to try every combination. A randomized grid search can achieve satisfactory results by evaluating only a random sample of these combinations. Here, we’ll show how to do this using scikit-learn and XGBoost.
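The arithmetic behind this estimate can be checked in a few lines of Python. The one-second-per-fit cost is the same illustrative assumption as in the text, and the budget of 1,000 random samples matches the number of combinations used later in this exercise; the fraction is relative to the hypothetical five-values-per-hyperparameter grid.

```python
# Size of an exhaustive grid: 5 candidate values for each of 6 hyperparameters
n_values_per_param = 5
n_params = 6
n_combinations = n_values_per_param ** n_params

# Assumed cost of one model fit, in seconds (illustrative only)
seconds_per_fit = 1.0
exhaustive_hours = n_combinations * seconds_per_fit / 3600

# A randomized search fits only a fixed budget of sampled combinations
n_iter = 1000
fraction_searched = n_iter / n_combinations

print(f"{n_combinations:,} combinations")
print(f"~{exhaustive_hours:.1f} hours for an exhaustive search")
print(f"{n_iter:,} random samples cover {fraction_searched:.1%} of the grid")
```

This prints 15,625 combinations, about 4.3 hours, and 6.4% of the grid.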

The first step in a randomized grid search is to specify, for each hyperparameter, the range of values you’d like to sample from. This can be done either by supplying a list of values or by supplying a distribution object to sample from. For discrete hyperparameters such as max_depth, where there are only a few possible values, it makes sense to specify them as a list. On the other hand, for continuous hyperparameters such as subsample, which can vary anywhere on the interval (0, 1], we don’t need to specify a list of values. Rather, we can have the grid search sample values uniformly over this interval. We will use a uniform distribution to sample several of the hyperparameters we consider:

  1. Import the uniform distribution class from scipy and use a dictionary to specify the ranges of all hyperparameters to be searched. uniform takes two arguments, loc and scale, which specify the lower bound of the interval to sample from and the width of the interval, respectively, so values are drawn from [loc, loc + scale]:

    from scipy.stats import uniform
    # Discrete hyperparameters as lists; continuous ones as uniform distributions
    param_grid = {'max_depth': [2, 3, 4, 5, 6, 7],
                  'gamma': uniform(loc=0.0, scale=3),
                  'min_child_weight': list(range(1, 151)),
                  'colsample_bytree': uniform(loc=0.1, scale=0.9),
                  'subsample': uniform(loc=0.5, scale=0.5),
                  'learning_rate': uniform(loc=0.01, scale=0.5)}

    Here, we’ve selected parameter ranges based on experimentation and experience. For example, for subsample the XGBoost documentation recommends values of at least 0.5, so we’ve specified uniform(loc=0.5, scale=0.5), which samples from the interval [0.5, 1].

  2. Now that we’ve indicated which distributions to sample from, we need to do the sampling. Scikit-learn offers the ParameterSampler class, which randomly samples from the supplied param_grid and returns as many samples as requested with n_iter. We also pass a NumPy RandomState as random_state so the results are repeatable across different runs of the notebook:

    import numpy as np
    from sklearn.model_selection import ParameterSampler
    n_iter = 1000  # number of hyperparameter combinations to sample
    rng = np.random.RandomState(0)
    param_list = list(ParameterSampler(param_grid, n_iter=n_iter, random_state=rng))

    The result is a list of dictionaries of specific hyperparameter values, each corresponding to a point in the six-dimensional hyperparameter space.

    Note that in this exercise, we are iterating through 1,000 hyperparameter combinations, which will likely take over 5 minutes. You may wish to decrease n_iter for faster results.

  3. Examine the first item of param_list:

    param_list[0]

    This should return a combination of six parameter values, from the distributions indicated:

    {'colsample_bytree': 0.5939321535345923, 'gamma': 2.1455680991172583, 'learning_rate': 0.31138168803582195, 'max_depth': 5, 'min_child_weight': 104, 'subsample': 0.7118273996694524}

  4. Now create a new XGBoost classifier object for this exercise. Observe how you can set multiple XGBoost hyperparameters simultaneously by unpacking a dictionary into the set_params method with the ** syntax:

    import xgboost as xgb
    xgb_model_2 = xgb.XGBClassifier(n_estimators=1000, verbosity=1, use_label_encoder=False)
    xgb_model_2.set_params(**param_list[0])

    The output should show the indicated hyperparameters being set:

    XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1,
    colsample_bytree=0.5939321535345923, gamma=2.1455680991172583, gpu_id=-1, importance_type='gain',
    interaction_constraints='', learning_rate=0.31138168803582195, max_delta_step=0, min_child_weight=104,
    missing=nan, monotone_constraints='()', n_estimators=1000, n_jobs=4, num_parallel_tree=1,
    random_state=0, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=0.7118273996694524,
    tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=1)

    We will use this procedure in a loop to look at all hyperparameter values.

  5. The next several steps all go in a single notebook cell, most of them inside a for loop. First, use the %%time cell magic to measure how long this takes, create an empty list to store the validation AUCs, and start a counter:

    %%time
    from sklearn.metrics import roc_auc_score
    val_aucs = []
    counter = 1

  6. Open the for loop, set the hyperparameters, and fit the XGBoost model, similar to the preceding example of tuning the learning rate:

    for params in param_list:
        # Set hyperparameters and fit model
        xgb_model_2.set_params(**params)
        xgb_model_2.fit(X_train, y_train, eval_set=eval_set, eval_metric='auc', verbose=False)

  7. Within the for loop, get the predicted probabilities and the validation set ROC AUC:

        # Get predicted probabilities and save validation ROC AUC
        val_set_pred_proba = xgb_model_2.predict_proba(X_val)[:, 1]
        val_aucs.append(roc_auc_score(y_val, val_set_pred_proba))

  8. Because this procedure will take a few minutes, it’s helpful to print progress to the Jupyter Notebook output. We use Python’s modulo (remainder) operator, %, to print a message every 50 iterations, that is, whenever counter divided by 50 leaves a remainder of zero. Finally, we increment the counter:

        # Print progress
        if counter % 50 == 0:
            print('Done with {counter} of {n_iter}'.format(counter=counter, n_iter=n_iter))
        counter += 1

  9. Assembling steps 5-8 in one cell and running the for loop should give output like this:

    Done with 50 of 1000
    Done with 100 of 1000
    ...
    Done with 950 of 1000
    Done with 1000 of 1000
    CPU times: user 24min 20s, sys: 18.9 s, total: 24min 39s
    Wall time: 6min 27s

  10. Now that we have all the results from our hyperparameter exploration, we need to examine them. Because the hyperparameter combinations are organized as a list of dictionaries, we can easily put them in a data frame. Do this and look at the first few rows:

    xgb_param_search_df = pd.DataFrame(param_list)
    xgb_param_search_df.head()

    The output shows the first five rows of xgb_param_search_df: one sampled hyperparameter combination per row, with one column for each of the six hyperparameters.
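Separately from the exercise code, the sampling and looping in steps 1 to 8 can be sketched in pure Python to make the mechanics explicit. This is a simplified illustration, not scikit-learn’s actual ParameterSampler implementation. It assumes a hypothetical Uniform class standing in for scipy.stats.uniform (covering [loc, loc + scale]), a hypothetical sample_params function that draws list-valued hyperparameters uniformly with replacement, and a hypothetical describe_model function in place of an estimator, used only to show what ** unpacking passes along. (ParameterSampler likewise samples with replacement whenever at least one entry is a distribution, as it is here.)

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Uniform:
    """Hypothetical stand-in for scipy.stats.uniform: spans [loc, loc + scale]."""
    loc: float
    scale: float

    def draw(self, rng: random.Random) -> float:
        return rng.uniform(self.loc, self.loc + self.scale)


def sample_params(grid, n_iter, seed=0):
    """Hypothetical sampler: return n_iter dicts, one sampled value per hyperparameter."""
    rng = random.Random(seed)  # seeded generator for repeatable samples
    samples = []
    for _ in range(n_iter):
        combo = {}
        for name in sorted(grid):  # fixed key order keeps draws reproducible
            spec = grid[name]
            if isinstance(spec, Uniform):
                combo[name] = spec.draw(rng)  # continuous: uniform on the interval
            else:
                combo[name] = rng.choice(spec)  # discrete: pick one list element
        samples.append(combo)
    return samples


def describe_model(n_estimators=1000, **hyperparams):
    """Hypothetical stand-in for an estimator; echoes the keyword arguments it receives."""
    return {"n_estimators": n_estimators, **hyperparams}


param_grid = {
    'max_depth': [2, 3, 4, 5, 6, 7],
    'gamma': Uniform(loc=0.0, scale=3),
    'min_child_weight': list(range(1, 151)),
    'colsample_bytree': Uniform(loc=0.1, scale=0.9),
    'subsample': Uniform(loc=0.5, scale=0.5),
    'learning_rate': Uniform(loc=0.01, scale=0.5),
}

n_iter = 200  # smaller budget than the exercise's 1,000, for a quick run
param_list = sample_params(param_grid, n_iter=n_iter)

configs = []
for counter, params in enumerate(param_list, start=1):
    configs.append(describe_model(**params))  # each dict key becomes a keyword argument
    if counter % 50 == 0:
        print(f'Done with {counter} of {n_iter}')

print(configs[0])
```

With n_iter = 200, the loop prints a progress line at 50, 100, 150, and 200, mirroring step 8. In the exercise itself, scipy.stats.uniform and ParameterSampler play the roles of the hypothetical Uniform and sample_params, and XGBClassifier.set_params receives the unpacked dictionary.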
