How to implement gradient boosting in Python
Gradient boosting
Gradient boosting is a technique for building machine learning models. It is called an ensemble method because it combines many decision trees, each trained to correct the errors of the trees before it, into a single, more robust and accurate model; this sequential error correction is where the term booster comes from. For classification models, the GradientBoostingClassifier is used, while the GradientBoostingRegressor is used for regression models. Both can be imported from the scikit-learn library.
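As a quick check, both estimators come from scikit-learn's ensemble module and share the same fit/predict interface:

```python
# Both gradient boosting estimators live in sklearn.ensemble
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor

# Each one follows the standard scikit-learn fit/predict interface
print(hasattr(GradientBoostingClassifier, "fit"))      # classifiers learn from labeled data
print(hasattr(GradientBoostingRegressor, "predict"))   # regressors predict continuous values
```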
Given a dataset that can be split into X and y variables, we can implement gradient boosting regression as shown below:
Code example
import pandas as pd
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


# Creating lists of values for years_experience & salary
years_experience = [1.1, 1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.2, 3.7, 3.9, 4.0, 4.0, 4.1, 4.5, 4.9, 5.1, 5.3, 5.9, 6.0, 6.8, 7.1, 7.9, 8.2, 8.7, 9.0, 9.5, 9.6, 10.3, 10.5]
salary = [39343.00, 46205.00, 37731.00, 43525.00, 39891.00, 56642.00, 60150.00, 54445.00, 64445.00, 57189.00, 63218.00, 55794.00, 56957.00, 57081.00, 61111.00, 67938.00, 66029.00, 83088.00, 81363.00, 93940.00, 91738.00, 98273.00, 101302.00, 113812.00, 109431.00, 105582.00, 116969.00, 112635.00, 122391.00, 121872.00]

# Create a DataFrame from the lists
df = pd.DataFrame({'years_experience': years_experience, 'salary': salary})

# Split the data into training and testing sets
X = df[['years_experience']]
y = df['salary']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit a GradientBoostingRegressor model
model = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)
model.fit(X_train, y_train)

# Make a prediction on the test data
y_pred = model.predict(X_test)

# Measure the R-squared value and the mean absolute error
r2 = model.score(X_test, y_test)
print("Mean absolute error is:", mean_absolute_error(y_test, y_pred))
print("R-squared score is:", r2)
Code explanation
The code above demonstrates how to implement gradient boosting using the scikit-learn library:
Lines 1–6: We import the necessary libraries.
Lines 10–11: We assign lists of values to the variables, years_experience and salary.
Line 14: We create a DataFrame from the lists created.
Lines 17–18: We split the dataset into the independent, X, and dependent, y, variables.
Line 19: We split the X and y variables into train and test sets. The test size chosen is 0.2, with the random state set to 42. Since X is selected as a single-column DataFrame, no reshaping is required.
Line 22: We create an instance of GradientBoostingRegressor.
Line 23: We train the model on the training data.
Line 26: We make predictions on the test data using the model.predict() command.
Lines 29–31: We measure the R-squared value and the mean absolute error of our model and print the outputs to the console.
We implement the GradientBoostingClassifier in the same way as the GradientBoostingRegressor, following the steps outlined in the code above.
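As a minimal sketch of the classification case (using scikit-learn's built-in iris dataset instead of the salary data above, since classification needs categorical labels), the same steps apply, with accuracy replacing R-squared as the evaluation metric:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load a small built-in classification dataset
X, y = load_iris(return_X_y=True)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit a GradientBoostingClassifier with the same hyperparameters as the regressor above
clf = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)
clf.fit(X_train, y_train)

# Make predictions on the test data and measure accuracy
y_pred = clf.predict(X_test)
print("Accuracy is:", accuracy_score(y_test, y_pred))
```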