
Gradient Boosting: Implementation from Scratch

Explore the step-by-step implementation of gradient boosting for regression. Understand how to initialize predictions, calculate residuals, train weak learners, and iteratively improve the model using gradient descent. This lesson guides you through building a strong ensemble model from scratch, providing insights into the role of learning rate, residuals, and iterations in boosting performance.

Gradient boosting is a machine learning technique that builds a strong predictive model by sequentially adding weak models to the ensemble.

It uses gradient descent optimization to minimize the loss function of the model. (Gradient descent is an optimization algorithm that minimizes a loss function by iteratively updating model parameters in the direction of the steepest decrease, i.e., the negative gradient.) The term “gradient” refers to the negative gradient of the loss function, which guides the learning process toward reducing errors. Gradient boosting is a versatile technique that can be used for both regression and classification tasks.

In this lesson, we’ll focus on regression and demonstrate how to solve a regression problem using two approaches.

Algorithm

Think of gradient boosting as a team of assistants, where each new assistant’s sole job is to correct the mistakes of the entire team that came before it.

  1. Make an initial guess: The process begins with a simple, initial prediction for every data point, typically using the average value of the target variable.

  2. Measure the errors (residuals): Calculate the error (or residual) for every data point by finding the difference between the actual value and the current prediction. This error is the new target that the next model must learn.

  3. Train a new, simple model: A new, weak predictive model (like a small decision tree) is trained specifically to predict the errors identified in the previous step.

  4. Update the overall prediction: The prediction from the new, error-correcting model is added to the previous overall prediction, often scaled by a small learning rate to ensure cautious, steady improvement.

  5. Repeat the cycle: Steps 2 through 4 are repeated for a specified number of iterations, with each new model focusing on correcting the remaining errors of the combined ensemble.

  6. Form the final predictor: The final, powerful prediction is the sum of the initial guess and all the sequential error-correcting models trained during the process (a compact code sketch of this loop follows below).
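Taken together, these six steps translate into one short training loop. The following is a minimal sketch of that loop for regression; the function names gradient_boost and boosted_predict, the choice of a shallow DecisionTreeRegressor as the weak learner, and the hyperparameter values are illustrative assumptions rather than the exact implementation we build later in this lesson.

Python 3.10.4
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost(X, y, n_iterations=100, learning_rate=0.1):
    # Step 1: initial guess -- a constant prediction equal to the target mean
    initial = np.mean(y)
    prediction = np.full(len(y), initial)
    trees = []
    for _ in range(n_iterations):
        # Step 2: residuals (errors) of the current combined prediction
        residuals = y - prediction
        # Step 3: fit a weak learner (a small decision tree) to the residuals
        tree = DecisionTreeRegressor(max_depth=3)
        tree.fit(X, residuals)
        # Step 4: update the overall prediction, scaled by the learning rate
        prediction = prediction + learning_rate * tree.predict(X)
        trees.append(tree)
    return initial, trees

def boosted_predict(X_new, initial, trees, learning_rate=0.1):
    # Step 6: the final prediction is the initial guess plus all scaled corrections
    prediction = np.full(len(X_new), initial)
    for tree in trees:
        prediction += learning_rate * tree.predict(X_new)
    return prediction

The learning rate deliberately shrinks each tree's contribution; smaller values generally require more iterations but tend to produce a model that generalizes better.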

Implementation from scratch

We can generate a synthetic regression dataset using the make_regression function of sklearn as a starting point. This lets us implement gradient boosting on the dataset and compare the results with the gradient boosting regressor provided by the scikit-learn library.

Python 3.10.4
from sklearn.datasets import make_regression
X, y = make_regression(n_samples=100, n_features=10, noise=1, random_state=42)
print(X.shape, y.shape)

Then, we split the dataset using the train_test_split function of sklearn.model_selection so that we can train a gradient boosting regressor on the training set and evaluate it on the test set.

Python 3.10.4
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, test_size=0.2)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)

Initialization: Setting up the predictor

First, we initialize our predictor with a constant $c$, which in most cases is the mean of the target variable. So, we have to create a list or array of the same size as our input array, where each element contains the constant $c$. Here’s the code that initializes the predictor by setting all predicted values to the mean of the input list:

Python 3.10.4
import numpy as np

# initialization of the predictor with the mean of the target
def initial_prediction(y):
    mean = []
    a = np.mean(y)
    for i in range(0, len(y)):
        mean.append(a)
    return mean

y_i = initial_prediction(y_train)
print(len(y_i))
print(y_i[:5])

The following is an explanation of the code above:

  • Lines 4–9: We define a function initial_prediction(y) that takes a list y as input and returns a new list where each element is set to the mean value of the input list, i.e., y.

  • Lines 11–13: We call the initial_prediction() function on the list of target variables and then print its length and first five elements.
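As a side note, the explicit loop in initial_prediction() can be replaced with a single NumPy call. The version below (the name initial_prediction_vectorized is just an illustrative alternative, not part of the lesson's reference code) produces the same result:

Python 3.10.4
import numpy as np

# vectorized equivalent: an array containing len(y) copies of the target mean
def initial_prediction_vectorized(y):
    return np.full(len(y), np.mean(y))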

Calculating negative gradients (residuals)

The second step of the algorithm is to calculate the negative gradients (also known as residuals) of the loss function with respect to the current predictions. For simplicity, consider the squared error loss (although other loss functions can be chosen as well):

$$
Loss = \frac{1}{2} (y_i - \hat{y}_i)^2
$$

Here, $y_i$ ...