What is the xgb.Booster.get_dump() function in Python?
XGBoost (eXtreme Gradient Boosting) is an open-source machine learning library known for its strong performance on structured/tabular data. Based on gradient boosting, it combines the predictions of many weak learners (usually decision trees) to build a strong predictive model.
The xgb.Booster.get_dump() function
The xgb.Booster.get_dump() is a function provided by XGBoost that allows us to obtain the textual representation of the underlying decision trees in the trained booster or model.
It provides transparency into the trained model to examine the individual decision trees’ details, visualize them, and gain insights into how the ensemble model makes predictions.
Note: You can learn more about plotting decision trees from the ensemble here.
Syntax
The syntax for the xgb.Booster.get_dump() function is given below:
dump_list = booster.get_dump(with_stats=False, dump_format='text')
- `with_stats` is an optional parameter set to `False` by default. If `True`, statistics about each node, such as the gain of each split and the cover (a weighted count of the samples reaching the node), are included in the output.
- `dump_format` is an optional parameter that specifies the output format. It can be `text` (the default) or `json`.
Note: Make sure you have the XGBoost library installed. Learn more about the error-free XGBoost installation on your system here.
Code
Let’s demonstrate the use of xgb.Booster.get_dump() with the following code:
```python
import xgboost as xgb
import numpy as np

# Creating a synthetic dataset
np.random.seed(42)
X = np.random.rand(100, 3)
y = np.random.randint(0, 2, 100)

# Creating an XGBoost classifier
model = xgb.XGBClassifier()

# Training the model on the dataset
model.fit(X, y)

# Getting the textual representation of the decision trees
dump_list = model.get_booster().get_dump()

# Printing the output
for tree_num in range(1):
    print("Tree {}:\n{}".format(tree_num, dump_list[tree_num]))
```
Code explanation
Line 1–2: First, we import the necessary `xgb` and `np` modules for this code example.
Line 5–7: Next, we create a small synthetic dataset with 100 samples and 3 features using the `random.rand()` and `random.randint()` functions. The target variable `y` is binary, taking values 0 or 1.
Line 10: We create an XGBoost classifier with default hyperparameters and store it in the variable `model`.
Line 13: We train the model on the synthetic dataset `X` and `y` using the `fit` method.
Line 16: We use the `get_booster()` method to access the underlying booster (ensemble) from the trained model, and call `get_dump()` to get the textual representation of the individual decision trees.
Line 19–20: Finally, we loop over the dump list and print the textual representation of a tree; for brevity, the loop covers only the first tree.
Since the classifier builds 100 trees by default, for readability we display the text for the first tree only. Upon execution, the code prints the textual representation of the first decision tree in our trained XGBoost classifier. The tree is shown in a human-readable format, listing its split conditions and leaf values.
Conclusion
In conclusion, the xgb.Booster.get_dump() function is a valuable tool for understanding and visualizing the individual decision trees in an XGBoost ensemble. It gives us a clear view of the tree structures and supports debugging, validating, and interpreting the model's predictions.