pip install lightgbm
We can also use the conda command to install the lightgbm module in Python:
conda install -c conda-forge lightgbm
Here are the steps to implement a LightGBM classifier.
The first step is to import the libraries we need: scikit-learn for the dataset, the train-test split, and the evaluation metrics, and LightGBM for the classifier.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.metrics import accuracy_score, classification_report
import lightgbm as lgb
The next step is to load the dataset. We’ll use the breast cancer dataset provided by the sklearn library.
data = load_breast_cancer()
X = data.data
y = data.target
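Before modeling, it can help to look at what was loaded. The sketch below prints the shape of the feature matrix and target vector along with the class names, using only attributes of the object returned by load_breast_cancer().

```python
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
X = data.data
y = data.target

# X has one row per sample and one column per feature.
print("Feature matrix shape:", X.shape)
print("Target vector shape:", y.shape)

# target_names maps the integer labels in y (0 and 1) to class names.
print("Class names:", list(data.target_names))
print("First three feature names:", list(data.feature_names[:3]))
```

In this dataset, label 0 corresponds to malignant and label 1 to benign, which matters when reading per-class metrics later.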
The LGBMClassifier class constructor takes several parameters. None of them is strictly required, since each has a default value, but the four below are the ones we set most often. Many other optional parameters are available for further customization.
Argument | Description |
n_estimators | Specifies the number of boosting rounds or iterations |
num_leaves | Sets the maximum number of leaves in one tree (it’s important for controlling the complexity of the model and avoiding overfitting) |
learning_rate | Defines the step size applied at each iteration as the model converges towards minimizing the loss function |
objective | Specifies the learning task and the corresponding objective function |
Now, we’ll use LGBMClassifier to fit the dataset for training the model. We perform a train-test split on the dataset (X and y) with a test size of 20%. Then, we initialize an LGBMClassifier model with 100 estimators, a maximum depth of 6, a learning rate of 0.1, and a binary classification objective. Finally, we train the model on the training data (X_train, y_train).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)
model = lgb.LGBMClassifier(n_estimators=100, max_depth=6, learning_rate=0.1, objective='binary')
model.fit(X_train, y_train)
Now, we’ll use our trained classifier to make predictions on X_test.
y_pred = model.predict(X_test)
Finally, let’s evaluate the performance of our classifier.
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy: {:.2f}%".format(accuracy * 100))
report = classification_report(y_test, y_pred)
print("Classification Report:\n", report)
The following code shows how we can use a LightGBM classifier in Python:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.metrics import accuracy_score, classification_report
import lightgbm as lgb

# Load the breast cancer dataset
data = load_breast_cancer()

# Extract the features (X) and target (y)
X = data.data
y = data.target

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)

# Initialize the model
model = lgb.LGBMClassifier(n_estimators=100, max_depth=6, learning_rate=0.1, objective='binary', verbosity=-1)

# Fit the model on the training data
model.fit(X_train, y_train)

# Make predictions on the test data
y_pred = model.predict(X_test)

# Calculate and print the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy: {:.2f}%".format(accuracy * 100))

# Print the classification report of the model
report = classification_report(y_test, y_pred)
print("Classification Report:\n", report)
Line 8: We load the breast cancer dataset from sklearn and store it in the data variable.
Lines 11–12: We extract the feature matrix X and the target vector y from the loaded dataset. X contains the input data, and y contains the binary classification labels.
Line 15: We split the dataset into training (X_train and y_train) and testing (X_test and y_test) sets using the train_test_split() function. Here, 20% of the data is reserved for testing, and 80% is used for training.
Line 18: We create an instance of the LGBMClassifier class with specified parameters.
Line 21: We train the model on the training data using the fit() method.
Line 24: The trained model is used to make predictions on the test data.
Lines 27–28: We calculate the accuracy of the model’s predictions by comparing them to the true labels in the test set. The accuracy is printed as a percentage.
Lines 31–32: We generate and print the classification report for the model.