Gradient Boosting: Implementation from Scratch
Explore the step-by-step implementation of gradient boosting for regression. Understand how to initialize predictions, calculate residuals, train weak learners, and iteratively improve the model using gradient descent. This lesson guides you through building a strong ensemble model from scratch, providing insights into the role of learning rate, residuals, and iterations in boosting performance.
Gradient boosting is a machine learning technique that builds a strong predictive model by sequentially adding weak models to the ensemble.
It uses gradient descent in function space: each new weak model is trained on the negative gradient of the loss function with respect to the current ensemble's predictions, which for squared-error loss is simply the residuals.
In this lesson, we’ll focus on regression and demonstrate how to solve a regression problem using two approaches: implementing gradient boosting from scratch and using the gradient boosting regressor provided by scikit-learn.
Algorithm
Think of gradient boosting as a team of assistants, where each new assistant’s sole job is to correct the mistakes of the entire team that came before it.
1. Make an initial guess: The process begins with a simple, initial prediction for every data point, typically using the average value of the target variable.
2. Measure the errors (residuals): Calculate the error (or residual) for every data point by finding the difference between the actual value and the current prediction. This error is the new target that the next model must learn.
3. Train a new, simple model: A new, weak predictive model (like a small decision tree) is trained specifically to predict the errors identified in the previous step.
4. Update the overall prediction: The prediction from the new, error-correcting model is added to the previous overall prediction, often scaled by a small learning rate to ensure cautious, steady improvement.
5. Repeat the cycle: Steps 2 through 4 are repeated for a specified number of iterations, with each new model focusing on correcting the remaining errors of the combined ensemble.
6. Form the final predictor: The final, powerful prediction is the sum of the initial guess and all the sequential error-correcting models trained during the process; a compact code sketch of this loop follows the list.
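Putting these steps together, the whole procedure boils down to a short loop. The following sketch is only a preview of what we build step by step in this lesson; it assumes scikit-learn's DecisionTreeRegressor as the weak learner, and the iteration count, learning rate, and tree depth shown are illustrative defaults rather than values from the lesson:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boosting_fit(X, y, n_iterations=100, learning_rate=0.1, max_depth=2):
    # Step 1: the initial guess is the mean of the target values
    initial = np.mean(y)
    predictions = np.full(len(y), initial)
    trees = []
    for _ in range(n_iterations):
        # Step 2: residuals are the negative gradients of the squared-error loss
        residuals = y - predictions
        # Step 3: fit a small tree to predict those residuals
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)
        # Step 4: update the ensemble prediction, scaled by the learning rate
        predictions = predictions + learning_rate * tree.predict(X)
        trees.append(tree)
    return initial, trees

def gradient_boosting_predict(X, initial, trees, learning_rate=0.1):
    # Step 6: the final prediction is the initial guess plus all scaled corrections
    predictions = np.full(X.shape[0], initial)
    for tree in trees:
        predictions = predictions + learning_rate * tree.predict(X)
    return predictions
```

Steps 2 through 4 live inside the loop, and the learning rate controls how cautiously each new tree's correction is added to the ensemble.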
Implementation from scratch
We can generate a synthetic regression dataset using the make_regression function of sklearn as a starting point. This lets us implement gradient boosting on the dataset and compare the results with the gradient boosting regressor provided by the scikit-learn library.
Then, we split the dataset using the train_test_split function of sklearn.model_selection so that we can train our model on one part of the data and evaluate it on the other.
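A minimal sketch of this setup is shown below; the sample count, feature count, noise level, and random seed are assumptions chosen for illustration:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Generate a synthetic regression dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=10.0, random_state=42)

# Hold out a test set so we can later compare our model with scikit-learn's
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(X_train.shape, X_test.shape)
```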
Initialization: Setting up the predictor
First, we initialize our predictor with a constant value, which in most cases is the mean of the target variable. So, we have to create a list or array of the same size as our target array, where each element is set to that constant. Below is code that initializes such a predictor by setting all predicted values to the mean of the input list.
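This sketch assumes the y_train targets produced by the split above; the variable names are illustrative:

```python
def initial_prediction(y):
    # The constant initial prediction is the mean of the target values
    mean_value = sum(y) / len(y)
    # Return a list of the same length as y, with every element equal to that mean
    return [mean_value for _ in y]

# Build the initial predictor from the training targets and inspect it
predictions = initial_prediction(y_train)
print(len(predictions))
print(predictions[:5])
```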
Following is the explanation for the code above:
- We define a function initial_prediction(y) that takes a list y as input and returns a new list where each element is set to the mean value of the input list y.
- We call the initial_prediction() function on the training targets and then print the length of the result and its first five elements; every printed value is the same constant.
Calculating negative gradients (residuals)
The second step of the algorithm is to calculate the negative gradients (also known as residuals) of the loss function with respect to the current predictions. For simplicity, consider the squared-error loss (though other loss functions can be chosen too):

$$L\bigl(y_i, F(x_i)\bigr) = \frac{1}{2}\bigl(y_i - F(x_i)\bigr)^2$$

Here, $y_i$ is the actual target value of the $i$-th sample and $F(x_i)$ is the current prediction for it. The negative gradient of this loss with respect to the prediction is

$$-\frac{\partial L}{\partial F(x_i)} = y_i - F(x_i),$$

which is exactly the residual: the difference between the actual value and the current prediction.
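In code, the negative gradients for squared-error loss reduce to a simple subtraction. The sketch below assumes the y_train targets and the constant predictions built in the previous step:

```python
def negative_gradients(y, predictions):
    # For squared-error loss, the negative gradient of each sample's loss
    # is simply the residual: actual value minus current prediction
    return [actual - predicted for actual, predicted in zip(y, predictions)]

# Residuals with respect to the constant initial predictor
residuals = negative_gradients(y_train, predictions)
print(residuals[:5])
```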