Gradient Boosting: Implementation Using Scikit-learn
Explore the process of testing a gradient boosting regressor by using a trained model to make predictions on test data. Learn to evaluate model accuracy by calculating mean squared error and compare your implementation with scikit-learn's GradientBoostingRegressor to understand performance differences and efficiency.
In this lesson, we’ll look into the testing phase of gradient boosting, building upon the trained model that we previously developed. Our main objective is to utilize this trained model to make predictions on a test dataset. To validate the performance of our implementation, we will compare our results with those obtained from GradientBoostingRegressor provided by the scikit-learn library.
Training of gradient boosting regressor
Before proceeding to the testing phase, we’ll consolidate all the code widgets of the previous lesson to review and understand the progress we’ve made so far. Then, we’ll evaluate the effectiveness of our trained model on unseen data.
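As a reminder of that progress, here is a minimal sketch of what the training loop from the previous lesson could look like for squared-error loss. The function name GB_train, the hyperparameter defaults, and the use of DecisionTreeRegressor stumps are illustrative assumptions, not the lesson's exact code:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def GB_train(X, y, n_estimators=100, alpha=0.1, max_depth=1):
    """Sketch of gradient boosting training for squared-error loss.

    Returns the list of fitted weak learners and the initial
    constant prediction c (the mean of the target variable).
    """
    c = np.mean(y)                      # initial predictor: mean of targets
    F = np.full(len(y), c)              # current ensemble prediction
    list_of_models = []
    for _ in range(n_estimators):
        residuals = y - F               # negative gradient for squared error
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)          # fit weak learner to the residuals
        F += alpha * tree.predict(X)    # update ensemble prediction
        list_of_models.append(tree)
    return list_of_models, c
```

Each weak learner is fit to the residuals of the current ensemble, and its scaled output is folded into the running prediction before the next iteration.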
Testing of gradient boosting regressor
We’re going to write a function named GB_predict that takes several parameters: test_data (the data on which we want to check the performance of our trained model), list_of_models (the list of weak learners with trained parameters), alpha (the learning rate), and c (the initial predictor, which in our case is the mean of the target variable). Its purpose is to make predictions on a test dataset using the trained gradient boosting model. The function iterates over the ensemble of decision tree models, updates the predictions based on each model’s output, and returns the final predictions.
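The description above can be sketched as follows. This is one plausible implementation of the signature just described, assuming the weak learners expose a scikit-learn-style predict method:

```python
import numpy as np


def GB_predict(test_data, list_of_models, alpha, c):
    """Predict with a trained gradient boosting ensemble.

    Start from the constant initial prediction c, then add each
    weak learner's contribution scaled by the learning rate alpha.
    """
    predictions = np.full(len(test_data), c, dtype=float)
    for model in list_of_models:
        predictions += alpha * model.predict(test_data)
    return predictions
```

Note that the test-time loop mirrors the training loop exactly: the same learning rate and the same initial constant must be used, otherwise the ensemble's contributions no longer add up to the function that was fit during training.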
Note: The mean squared error of scikit-learn’s gradient boosting regressor after the ...
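For the scikit-learn side of the comparison, a baseline run might look like the sketch below. The dataset here is synthetic (make_regression) purely for illustration; the lesson's actual dataset and the resulting MSE values will differ:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Hypothetical dataset standing in for the lesson's data.
X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Reference model: scikit-learn's own gradient boosting regressor.
sk_model = GradientBoostingRegressor(
    n_estimators=100, learning_rate=0.1, max_depth=1, random_state=42
)
sk_model.fit(X_train, y_train)
sk_mse = mean_squared_error(y_test, sk_model.predict(X_test))
print(f"scikit-learn test MSE: {sk_mse:.4f}")
```

Comparing this MSE against the one produced by the from-scratch predictions on the same split shows how close the hand-rolled implementation comes to the library version.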