How to implement a LightGBM classifier in Python

LightGBM is an open-source gradient-boosting framework built on decision trees. It is designed for fast training and low memory use, and it supports a range of machine learning tasks, including classification, regression, and ranking.

Installation

The lightgbm module can be easily installed using the pip command as follows:

pip install lightgbm

We can also install the lightgbm package with conda from the conda-forge channel:

conda install -c conda-forge lightgbm
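To confirm that the installation worked, one option is to query the installed version from Python. The following is a minimal sketch that uses only the standard library; the helper name installed_version is our own and is not part of LightGBM.

```python
import importlib.util
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    # find_spec returns None when the top-level module cannot be imported
    if importlib.util.find_spec(package) is None:
        return None
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        # The module is importable but has no distribution metadata
        return None


print("lightgbm version:", installed_version("lightgbm"))
```

If this prints None, the package is not visible to the Python interpreter that ran the script, which often means pip or conda installed it into a different environment.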

Implementing a LightGBM classifier

Here are the steps to implement a LightGBM classifier.

Import the libraries

The first step is to import the libraries we need: scikit-learn for the dataset, the train-test split, and the evaluation metrics, and lightgbm for the classifier.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.metrics import accuracy_score, classification_report
import lightgbm as lgb

Load the dataset

The next step is to load the dataset. We’ll use the breast cancer dataset provided by the sklearn library.

data = load_breast_cancer()
X = data.data
y = data.target
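Before training, it can help to look at what was loaded. The sketch below prints the shape of the feature matrix, the class names, and the number of samples per class. The variable name class_counts is ours, chosen for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
X, y = data.data, data.target

# 569 samples, each described by 30 numeric features
print("Feature matrix shape:", X.shape)

# Label 0 is 'malignant' and label 1 is 'benign'
print("Classes:", list(data.target_names))

# Count how many samples fall into each class
class_counts = np.bincount(y)
for name, count in zip(data.target_names, class_counts):
    print(f"{name}: {count}")
```

The classes are not perfectly balanced (212 malignant and 357 benign samples), which is worth keeping in mind when reading the accuracy score later.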

Understand the parameters

The LGBMClassifier class constructor takes several parameters. All of them are optional and have default values. The four listed below are among the most commonly set, and many others are available for further customization.

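LGBMClassifier parameters

| Argument | Description |
|---|---|
| n_estimators | Specifies the number of boosting rounds, that is, the number of trees to fit |
| max_depth | Sets the maximum depth of each tree; a value of 0 or less means no limit. It’s important for controlling the complexity of the model and avoiding overfitting. The maximum number of leaves in one tree is set separately with num_leaves |
| learning_rate | Defines the shrinkage factor applied to each tree’s contribution as the model converges towards minimizing the loss function; smaller values usually need more boosting rounds |
| objective | Specifies the learning task and the corresponding objective function, for example 'binary' or 'multiclass' |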

Train the model

Now, we’ll use LGBMClassifier to train a model on the dataset. First, we split the dataset (X and y) into training and test sets with a test size of 20%. Then, we initialize an LGBMClassifier model with 100 estimators, a maximum depth of 6, a learning rate of 0.1, and a binary classification objective. Finally, we train the model on the training data (X_train, y_train).

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)
model = lgb.LGBMClassifier(n_estimators=100, max_depth=6, learning_rate=0.1, objective='binary')
model.fit(X_train, y_train)

Make a prediction

Now, we’ll use our trained classifier to make a prediction using X_test.

y_pred = model.predict(X_test)

Evaluate the model

Finally, let’s evaluate the performance of our classifier.

accuracy = accuracy_score(y_test, y_pred)
print("Accuracy: {:.2f}%".format(accuracy * 100))
report = classification_report(y_test, y_pred)
print("Classification Report:\n", report)

Example

The following code shows how we can use a LightGBM classifier in Python:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.metrics import accuracy_score, classification_report
import lightgbm as lgb
# Load the breast cancer dataset
data = load_breast_cancer()
# Extract the features (X) and target (y)
X = data.data
y = data.target
# Splitting the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)
# Create the model with the chosen hyperparameters
model = lgb.LGBMClassifier(n_estimators=100, max_depth=6, learning_rate=0.1, objective='binary', verbosity=-1)
# Fit the model on the training data
model.fit(X_train, y_train)
# Make predictions on the test data
y_pred = model.predict(X_test)
# Calculate and print the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy: {:.2f}%".format(accuracy * 100))
# Print the classification report of the model
report = classification_report(y_test, y_pred)
print("Classification Report:\n", report)

Code explanation:

  • Line 7: We load the breast cancer dataset from sklearn and store it in the data variable.

  • Lines 9–10: We extract the feature matrix X and the target vector y from the loaded dataset. X contains the input data, and y contains the binary classification labels.

  • Line 12: We split the dataset into training (X_train and y_train) and testing (X_test and y_test) sets using the train_test_split() function. Here, 20% of the data is reserved for testing, and 80% is used for training.

  • Line 14: We create an instance of the LGBMClassifier class with the specified parameters. Setting verbosity=-1 suppresses LightGBM’s log messages during training.

  • Line 16: We train the model on the training data using the fit() method.

  • Line 18: The trained model is used to make predictions on the test data.

  • Lines 20–21: We calculate the accuracy of the model’s predictions by comparing them to the true labels in the test set. The accuracy is printed as a percentage.

  • Lines 23–24: We generate and print the classification report for the model, which lists the precision, recall, and F1-score for each class.

Copyright ©2024 Educative, Inc. All rights reserved