XGBoost, LightGBM, and CatBoost

In this lesson, you will learn how to implement XGBoost, LightGBM, and CatBoost in Python. These are popular boosting algorithms that are widely used in practice and deliver strong results in competitions.

We will discuss three well-known methods that use boosting to construct a strong model: XGBoost, LightGBM, and CatBoost. We will also cover how they are implemented in Python and the parameters they expose.

XGBoost

XGBoost stands for Extreme Gradient Boosting. It is an optimized implementation of the gradient boosting algorithm, and its authors report that it can run more than ten times faster than other popular gradient boosting implementations. Its documentation highlights several benefits:

  • It can be used to solve a variety of problems, including regression, classification, and user-defined predictive problems.
  • It can be run on distributed environments like Hadoop.
  • It supports regularization to combat overfitting.
  • It is good at handling sparse data (missing values), as the short sketch after this list illustrates.
  • It supports parallelization to utilize all the cores on a system.
  • It supports out-of-core computing for datasets that do not fit into memory.
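To make the sparse-data benefit concrete, here is a minimal sketch showing that XGBoost accepts np.nan entries in the input directly; the toy arrays and hyperparameter values are illustrative assumptions, not part of this lesson.

```python
import numpy as np
from xgboost import XGBClassifier

# Toy dataset with missing entries (np.nan). At each split, XGBoost
# learns a default direction for missing values, so no imputation
# is required beforehand.
X = np.array([
    [1.0, 2.0],
    [np.nan, 3.0],
    [4.0, np.nan],
    [5.0, 6.0],
])
y = np.array([0, 0, 1, 1])

model = XGBClassifier(n_estimators=10)  # small model for the toy data
model.fit(X, y)
print(model.predict(X))
```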

Implementation in Python

We will use the xgboost module to implement XGBoost on a sample problem. For classification, we will use XGBClassifier, which is the scikit-learn API implementation of XGBoost. We can pass the following parameters to XGBClassifier; a minimal usage sketch follows the list.

  • max_depth: It is the maximum tree depth for base learners.

  • learning_rate: It is the learning rate α. We have seen the intuition behind it in the previous lessons.

  • n_jobs: It is the number of parallel threads used to run XGBoost, which speeds up training.

  • reg_alpha: It is the L1 regularization term applied to the weights.

  • reg_lambda: It is the L2 regularization term applied to the weights.

  • missing: It is the value in the dataset that should be treated as missing, such as np.nan.
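Putting these parameters together, the sketch below trains XGBClassifier on a standard scikit-learn dataset. The dataset choice and the specific hyperparameter values are assumptions made for illustration, not tuned recommendations.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# A small binary-classification dataset, used here only for illustration.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = XGBClassifier(
    max_depth=4,        # maximum tree depth for base learners
    learning_rate=0.1,  # shrinkage applied to each boosting step
    n_jobs=-1,          # use all available CPU cores
    reg_alpha=0.0,      # L1 regularization term
    reg_lambda=1.0,     # L2 regularization term
    missing=np.nan,     # value treated as missing in the input
)

model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))
```

Setting n_jobs=-1 asks XGBoost to use all available cores, which corresponds to the parallelization benefit listed earlier.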
