Ensemble methods in Python: Boosting
Ensemble learning is a machine learning technique that combines the predictions of multiple models to create a more accurate and robust overall prediction.
Boosting is an ensemble learning technique that aims to improve a model’s predictive performance by combining the strengths of multiple weak learners, trained sequentially so that each new learner focuses on the errors of the ones before it.
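To make the idea concrete, here is a minimal from-scratch sketch of boosting for regression, assuming squared-error loss and one-split decision stumps as the weak learners. The helper names (fit_stump, stump_predict, boost, predict, mse) and the toy data are hypothetical and exist only for illustration; library implementations are considerably more elaborate.

```python
# From-scratch boosting sketch: each new stump is fit to the residuals
# (current errors) of the ensemble built so far. All names are hypothetical.

def fit_stump(xs, residuals):
    """Find the single threshold on xs whose two-leaf fit has the lowest squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so the right leaf is never empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_mean, right_mean)


def stump_predict(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def boost(xs, ys, n_estimators=50, learning_rate=0.1):
    """Start from the mean, then repeatedly add a shrunken stump fit to the residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_estimators):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump_predict(stump, x)
                       for x, p in zip(xs, predictions)]
    return base, learning_rate, stumps


def predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Toy one-feature data, invented for this sketch.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 3.1, 3.0, 5.2, 5.1, 4.9]

model = boost(xs, ys)
baseline_mse = mse(ys, [sum(ys) / len(ys)] * len(ys))
boosted_mse = mse(ys, [predict(model, x) for x in xs])
print("Baseline (mean only) MSE: {:.3f}".format(baseline_mse))
print("Boosted MSE: {:.3f}".format(boosted_mse))
```

The learning_rate shrinks each stump's contribution, so many small corrections accumulate instead of one large one. This is the same trade-off that the n_estimators and learning_rate parameters control in library implementations.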
How to implement boosting using Python
Follow the steps below to implement the boosting algorithm in Python:
1. Import the libraries
The first step is to import the required libraries, as shown in the code below:
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_boston
from sklearn.metrics import mean_squared_error
2. Load the dataset
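The next step is to load the dataset. We will use the Boston house-prices dataset provided by the sklearn library, which consists of 506 rows and 13 feature columns. Note that load_boston was deprecated in scikit-learn 1.0 and removed in version 1.2, so this step requires an older release. The train_test_split function divides the dataset into training and testing data; with test_size=0.2, 20% of the rows are held out for testing.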
boston = load_boston()
X_train, X_test, y_train, y_test = train_test_split(boston.data, boston.target, test_size=0.2, random_state=42)
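Because load_boston is not available in scikit-learn 1.2 and later, the following sketch shows the same split on a synthetic stand-in dataset. The use of make_regression, the noise level, and the choice to mirror the 506 × 13 shape are assumptions made for illustration, not part of the original walkthrough.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the same shape as the Boston data (assumption).
X, y = make_regression(n_samples=506, n_features=13, noise=10.0, random_state=42)

# Hold out 20% of the rows for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print("Training set shape:", X_train.shape)
print("Test set shape:", X_test.shape)
```

Any other tabular regression dataset can be substituted for X and y here without changing the later steps.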
3. Implement boosting
We will now create an instance of the GradientBoostingRegressor and fit it to the training data. The n_estimators parameter sets the number of boosting stages, that is, the number of trees added sequentially to the ensemble, and random_state ensures reproducibility. The learning_rate parameter shrinks the contribution of each tree, and max_depth limits the depth of each individual tree. Adjusting hyperparameters like n_estimators, max_depth, and learning_rate allows fine-tuning the model’s performance.
gradient_boosting_model = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)
gradient_boosting_model.fit(X_train, y_train)
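One way to see how n_estimators affects the model is to track the test error after each boosting stage. The sketch below does this with staged_predict, which yields the predictions of the partial ensemble after each added tree. The synthetic data from make_regression is an assumption used only so that the snippet runs on its own.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption, not the article's dataset).
X, y = make_regression(n_samples=506, n_features=13, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = GradientBoostingRegressor(
    n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42
)
model.fit(X_train, y_train)

# staged_predict yields test-set predictions after 1, 2, ..., n_estimators trees.
stage_mse = [
    mean_squared_error(y_test, y_stage) for y_stage in model.staged_predict(X_test)
]

# Stage (1-based) with the lowest test error.
best_stage = min(range(len(stage_mse)), key=stage_mse.__getitem__) + 1
print("MSE after 1 tree: {:.2f}".format(stage_mse[0]))
print("MSE after {} trees: {:.2f}".format(len(stage_mse), stage_mse[-1]))
print("Lowest test MSE at stage:", best_stage)
```

If the lowest error occurs well before the final stage, a smaller n_estimators or a lower learning_rate may be worth trying.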
4. Predict and evaluate
Now, we will make the predictions on the test set and calculate mean_squared_error.
y_pred = gradient_boosting_model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error: {:.2f}".format(mse))
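Mean squared error is the average of the squared differences between true and predicted values, so it is expressed in squared units of the target rather than as a percentage. The short sketch below computes it by hand on made-up numbers and also takes the square root (RMSE), which brings the error back to the units of the target. The values and variable names are hypothetical.

```python
import math

# Made-up true and predicted values, for illustration only.
y_true = [3.0, 5.0, 2.5, 7.0]
y_hat = [2.5, 5.0, 4.0, 8.0]

# MSE: mean of the squared differences.
squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_hat)]
mse_manual = sum(squared_errors) / len(squared_errors)

# RMSE: square root of the MSE, in the same units as the target.
rmse_manual = math.sqrt(mse_manual)

print("Mean Squared Error: {:.3f}".format(mse_manual))
print("Root Mean Squared Error: {:.3f}".format(rmse_manual))
```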
Example
The following code shows how we can implement a gradient boosting regressor in Python:
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; requires an older release
from sklearn.metrics import mean_squared_error

# Load and split the data
boston = load_boston()
X_train, X_test, y_train, y_test = train_test_split(boston.data, boston.target, test_size=0.2, random_state=42)

# Implement gradient boosting regressor
gradient_boosting_model = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)
gradient_boosting_model.fit(X_train, y_train)

# Predict and evaluate
y_pred = gradient_boosting_model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error: {:.2f}".format(mse))
Explanation
Lines 1–4: These lines import the required libraries.
Line 7: This line loads the Boston dataset from sklearn and stores it in the boston variable.
Line 8: This line splits the dataset into training and test sets.
Lines 11–12: Here, we create a GradientBoostingRegressor with 100 base models and fit the boosting model on the training data.
Line 15: The trained model is used to make predictions on the test data.
Line 16: The code calculates the mean_squared_error of the model's predictions by comparing them to the true target values in the test set.
Line 17: The output line prints the mean_squared_error between the actual and predicted housing prices, providing a measure of the model’s performance.
To continue your exploration of ensemble learning, check out our series of Answers below:
What is ensemble learning?
Understand the concept of combining multiple models to improve predictions.
Ensemble methods in Python: Averaging
Learn how averaging methods can boost model accuracy and stability.
Ensemble methods in Python: Bagging
Discover the power of bagging in reducing variance and enhancing prediction performance.
Ensemble methods in Python: Boosting
Dive into boosting techniques that improve weak models by focusing on mistakes.
Ensemble methods in Python: Stacking
Understand how stacking combines multiple models to make better predictions.
Ensemble methods in Python: Max voting
Explore the max voting method to combine classifier predictions and increase accuracy.