Ensemble learning is a machine learning technique that combines the predictions of multiple models to create a more accurate and robust overall prediction.
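The simplest way to combine models is to average their predictions. The following is a minimal sketch under a few assumptions: synthetic data from `make_friedman1`, and three illustrative model choices. It shows the idea on a regression task. scikit-learn's `VotingRegressor` packages the same averaging step.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# Assumption: synthetic nonlinear regression data stands in for a real dataset
X, y = make_friedman1(n_samples=500, noise=1.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Illustrative choice of three different base models
models = [
    LinearRegression(),
    KNeighborsRegressor(),
    DecisionTreeRegressor(max_depth=5, random_state=0),
]

predictions = []
for model in models:
    model.fit(X_train, y_train)
    predictions.append(model.predict(X_test))

# The ensemble prediction is the plain average of the individual predictions
ensemble_pred = np.mean(predictions, axis=0)

individual_mses = [mean_squared_error(y_test, p) for p in predictions]
ensemble_mse = mean_squared_error(y_test, ensemble_pred)
for model, mse in zip(models, individual_mses):
    print(f"{type(model).__name__}: MSE {mse:.3f}")
print(f"Averaged ensemble: MSE {ensemble_mse:.3f}")
```

Squared error is convex, so the averaged ensemble's MSE can never exceed the average of the individual models' MSEs. It may still be worse than the single best model.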
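Boosting is an ensemble learning technique that aims to improve a model’s predictive performance by combining the strengths of multiple weak learners, typically shallow decision trees. These learners are trained one after another, and each new learner focuses on the errors made by the ones before it.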
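To make the sequential idea concrete, here is a simplified from-scratch sketch of gradient boosting with squared-error loss. It is not scikit-learn's implementation. With squared error, each new shallow tree is fit to the current residuals, and its output is added with a shrinkage factor. The synthetic `make_friedman1` data, the 50 rounds, the depth-2 trees, and the helper name `boosted_predict` are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor

# Assumption: synthetic regression data for illustration
X, y = make_friedman1(n_samples=500, noise=1.0, random_state=0)

n_rounds = 50
learning_rate = 0.1

# Start from a constant prediction: the mean of the targets
base_prediction = y.mean()
prediction = np.full_like(y, base_prediction)
trees = []
train_mse = []

for _ in range(n_rounds):
    residuals = y - prediction  # errors the current ensemble still makes
    tree = DecisionTreeRegressor(max_depth=2, random_state=0)  # weak learner
    tree.fit(X, residuals)  # the new tree learns to predict those errors
    prediction += learning_rate * tree.predict(X)  # add a shrunken correction
    trees.append(tree)
    train_mse.append(mean_squared_error(y, prediction))


def boosted_predict(X_new):
    """Hypothetical helper: replay the base value plus every tree's shrunken output."""
    out = np.full(X_new.shape[0], base_prediction)
    for t in trees:
        out += learning_rate * t.predict(X_new)
    return out


print(f"Training MSE after 1 round: {train_mse[0]:.3f}")
print(f"Training MSE after {n_rounds} rounds: {train_mse[-1]:.3f}")
```

Because each tree corrects the residuals of the ensemble so far, the training error does not increase from one round to the next. The learning rate trades off speed of fitting against the risk of overfitting.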
Follow the steps below to implement gradient boosting, a widely used boosting algorithm, in Python with scikit-learn:
The first step is to import the required libraries, as shown in the code below:
```python
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_boston
from sklearn.metrics import mean_squared_error
```
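The next step is to load the dataset. We will use the Boston house-prices dataset provided by the sklearn library. It consists of 506 rows and 13 feature columns, and the target is the median home value in thousands of dollars. The train_test_split function then divides the dataset into training and testing data, here reserving 20% of the rows for testing (test_size=0.2). Note that load_boston was deprecated in scikit-learn 1.0 and removed in 1.2, so this step requires an earlier scikit-learn version.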
```python
boston = load_boston()
X_train, X_test, y_train, y_test = train_test_split(boston.data, boston.target, test_size=0.2, random_state=42)
```
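`load_boston` is not available in recent scikit-learn releases. The sketch below therefore uses a hypothetical random array with the same shape as the Boston data (506 × 13) to show what the 80/20 split produces. It also shows that a fixed `random_state` makes the split repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in with the Boston data's shape: 506 rows, 13 features
rng = np.random.default_rng(0)
X = rng.normal(size=(506, 13))
y = rng.normal(size=506)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(X_train.shape, X_test.shape)  # (404, 13) (102, 13)

# Splitting again with the same random_state yields exactly the same partition
X_train2, X_test2, _, _ = train_test_split(X, y, test_size=0.2, random_state=42)
print(np.array_equal(X_train, X_train2))  # True
```

With a float `test_size`, scikit-learn rounds the test-set size up, so 20% of 506 rows gives 102 test rows and 404 training rows.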
We will now create an instance of GradientBoostingRegressor and fit it to the training data. The n_estimators parameter sets the number of boosting stages, that is, the number of trees added one after another (not a forest of independent trees), and random_state ensures reproducibility. Adjusting hyperparameters such as n_estimators, max_depth (the maximum depth of each tree), and learning_rate (how much each tree's contribution is shrunk) allows you to fine-tune the model’s performance.
```python
gradient_boosting_model = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)
gradient_boosting_model.fit(X_train, y_train)
```
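`n_estimators` and `learning_rate` interact. A smaller learning rate usually needs more trees. The sketch below uses `staged_predict`, which yields predictions after each boosting stage, to track test error as trees are added for a few learning rates. The synthetic `make_friedman1` data and the particular learning rates are illustrative assumptions, used because `load_boston` is unavailable in current scikit-learn.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Assumption: synthetic regression data instead of the Boston dataset
X, y = make_friedman1(n_samples=1000, noise=1.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

results = {}
for learning_rate in (0.05, 0.1, 0.5):
    model = GradientBoostingRegressor(
        n_estimators=200, learning_rate=learning_rate, max_depth=3, random_state=42
    )
    model.fit(X_train, y_train)
    # staged_predict yields the test-set predictions after 1, 2, ..., 200 trees
    test_mse = [mean_squared_error(y_test, y_pred) for y_pred in model.staged_predict(X_test)]
    best_stage = int(np.argmin(test_mse)) + 1
    results[learning_rate] = (best_stage, test_mse[best_stage - 1])
    print(f"learning_rate={learning_rate}: lowest test MSE {test_mse[best_stage - 1]:.3f} at {best_stage} trees")
```

Picking the number of trees by looking at the test set, as done here for illustration, leaks test information into model selection. In practice, a separate validation set or the estimator's built-in early stopping (`n_iter_no_change` with `validation_fraction`) is the safer choice.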
Now, we will make predictions on the test set and calculate the mean squared error (MSE) with mean_squared_error.
```python
y_pred = gradient_boosting_model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
# MSE is in squared units of the target, not a percentage
print("Mean Squared Error: {:.2f}".format(mse))
```
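The MSE is the average of the squared differences between true and predicted values. Its unit is therefore the square of the target's unit, not a percentage. Taking the square root gives the RMSE, which is in the target's own units (thousands of dollars for the Boston prices). The sketch below checks this on a few hypothetical values chosen so the arithmetic is easy to follow.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical true and predicted prices (in thousands of dollars)
y_test = np.array([24.0, 21.5, 34.75, 33.25])
y_pred = np.array([25.0, 20.5, 32.75, 33.25])

mse = mean_squared_error(y_test, y_pred)
mse_manual = np.mean((y_test - y_pred) ** 2)  # same formula written out
rmse = np.sqrt(mse)  # back in the target's units

print("Mean Squared Error: {:.2f}".format(mse))  # Mean Squared Error: 1.50
print("Root Mean Squared Error: {:.2f}".format(rmse))  # Root Mean Squared Error: 1.22
```

Here the errors are -1, 1, 2 and 0. Their squares sum to 6, so the MSE is 6 / 4 = 1.5 and the RMSE is about 1.22 thousand dollars.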
The following code shows the complete implementation of the gradient boosting regressor in Python:
```python
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_boston
from sklearn.metrics import mean_squared_error

# Load and split the data (load_boston requires scikit-learn < 1.2)
boston = load_boston()
X_train, X_test, y_train, y_test = train_test_split(boston.data, boston.target, test_size=0.2, random_state=42)

# Implement gradient boosting regressor
gradient_boosting_model = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)
gradient_boosting_model.fit(X_train, y_train)

# Predict and evaluate
y_pred = gradient_boosting_model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error: {:.2f}".format(mse))
```
Lines 1–4: These lines import the required libraries.
Line 7: This line loads the Boston dataset from sklearn and stores it in the boston variable.
Line 8: This line splits the dataset into training (80%) and test (20%) sets.
Lines 11–12: Here, we create a GradientBoostingRegressor with 100 base models (trees) and fit the boosting model on the training data.
Line 15: The trained model is used to make predictions on the test data.
Line 16: The code calculates the mean squared error of the model's predictions by comparing them to the true target values in the test set. The result is expressed in squared units of the target, not as a percentage.
Line 17: This line prints the mean squared error between the actual and predicted housing prices, providing a measure of the model’s performance.
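A raw MSE is easier to interpret next to a baseline. The sketch below repeats the same pipeline on scikit-learn 1.2 or later, where `load_boston` no longer exists. It makes two assumptions. First, it swaps in the diabetes regression dataset (`load_diabetes`), which ships with scikit-learn and needs no download. `fetch_california_housing` is the closer housing-price replacement but must be downloaded. Second, it compares against a `DummyRegressor` that always predicts the training mean.

```python
from sklearn.datasets import load_diabetes
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Assumption: a bundled regression dataset swapped in for the removed Boston data
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Same hyperparameters as the Boston example
gradient_boosting_model = GradientBoostingRegressor(
    n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42
)
gradient_boosting_model.fit(X_train, y_train)
mse = mean_squared_error(y_test, gradient_boosting_model.predict(X_test))

# Baseline: always predict the mean of the training targets
baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)
baseline_mse = mean_squared_error(y_test, baseline.predict(X_test))

print("Gradient boosting MSE: {:.2f}".format(mse))
print("Mean-baseline MSE: {:.2f}".format(baseline_mse))
```

If the boosted model's MSE is not clearly below the baseline's, the model is adding little beyond predicting the average.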