What is the xgb.Booster.get_dump() function in Python?
XGBoost (eXtreme Gradient Boosting) is an open-source machine learning library known for its strong performance on structured/tabular data. Based on gradient boosting, it combines the predictions of many weak learners (usually decision trees) to build a strong predictive model.
The xgb.Booster.get_dump() function
The xgb.Booster.get_dump() is a function provided by XGBoost that allows us to obtain the textual representation of the underlying decision trees in the trained booster or model.
It provides transparency into the trained model to examine the individual decision trees’ details, visualize them, and gain insights into how the ensemble model makes predictions.
Note: You can learn more about plotting decision trees from the ensemble here.
Syntax
The syntax for the xgb.Booster.get_dump() function is given below:
dump_list = booster.get_dump(with_stats=False, dump_format='text')
- `with_stats` is an optional parameter set to `False` by default. If `True`, statistics about each node, such as the gain of each split and the cover (a weighted count of the samples reaching the node), are included in the output.
- `dump_format` is an optional parameter that specifies the output format. It can be `text` (the default) or `json`.
Note: Make sure you have the XGBoost library installed. Learn more about the error-free XGBoost installation on your system here.
Code
Let’s demonstrate the use of xgb.Booster.get_dump() with the following code:
```python
import xgboost as xgb
import numpy as np

# Creating a synthetic dataset
np.random.seed(42)
X = np.random.rand(100, 3)
y = np.random.randint(0, 2, 100)

# Creating an XGBoost classifier
model = xgb.XGBClassifier()

# Training the model on the dataset
model.fit(X, y)

# Getting the textual representation of the decision trees
dump_list = model.get_booster().get_dump()

# Printing the output
for tree_num in range(1):
    print("Tree {}:\n{}".format(tree_num, dump_list[tree_num]))
```
Code explanation
Line 1–2: First, we import the necessary `xgb` and `np` modules for this code example.
Line 5–7: Next, we create a small synthetic dataset with 100 samples and 3 features using the `random.rand()` and `random.randint()` functions. The target variable `y` is binary, taking values 0 or 1.
Line 10: We create an XGBoost classifier with default hyperparameters and store it in the variable `model`.
Line 13: We train the model on the synthetic dataset `X` and `y` using the `fit` method.
Line 16: We use the `get_booster()` method to access the underlying booster (ensemble) from the trained model, and call `get_dump()` to get the textual representation of the individual decision trees.
Line 19–20: Finally, we loop over the dump list and print the textual representation of a tree; for brevity, the loop covers only the first tree.
Since the classifier builds 100 trees by default, for readability we display the text for the first tree only. Upon execution, the code prints the textual representation of the first decision tree in our trained XGBoost classifier. The tree is shown in a human-readable format, listing its split conditions and leaf values.
Conclusion
In conclusion, the xgb.Booster.get_dump() function is a valuable tool for understanding and visualizing the individual decision trees in an XGBoost ensemble. It gives us a clear view of the tree structures and supports debugging, validating, and interpreting the model's predictions.