How to implement gradient boosting in Python
Gradient boosting
Gradient boosting is a technique for building machine learning models. It is called an ensemble method because it combines many decision trees, each trained to correct the errors of the trees before it, into a single, more robust and accurate model; this sequential error correction is where the term booster comes from. For classification models, the GradientBoostingClassifier is used, while the GradientBoostingRegressor is used for regression models. Both can be imported from the scikit-learn library.
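As a quick check, both estimators come from scikit-learn's ensemble module and share the same fit/predict interface:

```python
# Both gradient boosting estimators live in sklearn.ensemble
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor

# Each one follows the standard scikit-learn fit/predict interface
print(hasattr(GradientBoostingClassifier, "fit"))      # classifiers learn from labeled data
print(hasattr(GradientBoostingRegressor, "predict"))   # regressors predict continuous values
```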
Given a dataset that can be split into X and y variables, we can implement gradient boosting regression as shown below:
Code example
import pandas as pd
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


# Creating lists of values for years_experience & salary
years_experience = [1.1, 1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.2, 3.7, 3.9, 4.0, 4.0, 4.1, 4.5, 4.9, 5.1, 5.3, 5.9, 6.0, 6.8, 7.1, 7.9, 8.2, 8.7, 9.0, 9.5, 9.6, 10.3, 10.5]
salary = [39343.00, 46205.00, 37731.00, 43525.00, 39891.00, 56642.00, 60150.00, 54445.00, 64445.00, 57189.00, 63218.00, 55794.00, 56957.00, 57081.00, 61111.00, 67938.00, 66029.00, 83088.00, 81363.00, 93940.00, 91738.00, 98273.00, 101302.00, 113812.00, 109431.00, 105582.00, 116969.00, 112635.00, 122391.00, 121872.00]

# Create a DataFrame from the lists
df = pd.DataFrame({'years_experience': years_experience, 'salary': salary})

# Split the data into training and testing sets
X = df[['years_experience']]
y = df['salary']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit a GradientBoostingRegressor model
model = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)
model.fit(X_train, y_train)

# Make a prediction on the test data
y_pred = model.predict(X_test)

# Measure the R-squared value and the mean absolute error
r2 = model.score(X_test, y_test)
print("Mean absolute error is:", mean_absolute_error(y_test, y_pred))
print("R-squared score is:", r2)
Code explanation
The code above demonstrates how to implement gradient boosting using the scikit-learn library:
Lines 1–6: We import the necessary libraries.
Lines 10–11: We assign lists of values to the variables, years_experience and salary.
Line 14: We create a DataFrame from the lists created.
Lines 17–18: We split the dataset into the independent, X, and dependent, y, variables.
Line 19: We split the X and y variables into train and test sets. The test size chosen is 0.2, with the random state set to 42. Since X is selected as a single-column DataFrame, no reshaping is required.
Line 22: We create an instance of GradientBoostingRegressor.
Line 23: We train the model on the training data.
Line 26: We make predictions on the test data using the model.predict() command.
Lines 29–31: We measure the R-squared value and the mean absolute error of our model and print the outputs to the console.
We implement the GradientBoostingClassifier in the same way as the GradientBoostingRegressor, following the steps outlined in the code above.
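As a minimal sketch of the classification case (using scikit-learn's built-in iris dataset instead of the salary data above, since classification needs categorical labels), the same steps apply, with accuracy replacing R-squared as the evaluation metric:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load a small built-in classification dataset
X, y = load_iris(return_X_y=True)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit a GradientBoostingClassifier with the same hyperparameters as the regressor above
clf = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)
clf.fit(X_train, y_train)

# Make predictions on the test data and measure accuracy
y_pred = clf.predict(X_test)
print("Accuracy is:", accuracy_score(y_test, y_pred))
```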