Ensemble methods in Python: Boosting

Ensemble learning is a machine learning technique that combines the prediction of multiple models to create a more accurate and robust overall prediction.

Boosting is an ensemble learning technique that aims to improve a model’s predictive performance by combining the strengths of multiple weak learners (also called base models)Weak learners or base models in machine learning are models that perform slightly better than random chance but lack high accuracy individually. Ensemble methods like boosting sequentially train weak learners, with each subsequent model addressing the mistakes of the previous ones. . Unlike baggingBagging (Bootstrap Aggregating) ensemble combines predictions from multiple models trained on different bootstrap samples of the dataset to improve overall performance and reduce overfitting., which builds independent models in parallel, boosting sequentially builds a sequence of models. Each subsequent model focuses on correcting the errors of the previous ones, leading to a more accurate and robust overall model.

from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_boston
from sklearn.metrics import mean_squared_error
# Load and split the data
boston = load_boston()
X_train, X_test, y_train, y_test = train_test_split(boston.data, boston.target, test_size=0.2, random_state=42)
# Implement gradient boosting regressor
gradient_boosting_model = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)
gradient_boosting_model.fit(X_train, y_train)
# Predict and evaluate
y_pred = gradient_boosting_model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error: {:.2f}%".format(mse))

Explanation

Lines 1–4: These lines import the required libraries.
Line 7: This line loads the Boston dataset from sklearn and stores it in the data variable.
Line 8: This line splits the dataset into train and test.
Lines 11–12: Here, we create a GradientBoostingRegressor with 50 base models and fit the boosting model on the training data.
Line 15: The trained model is used to make predictions on the test data.
Line 16: The code calculates the mean_squared_error of the model's predictions by comparing them to the true labels in the test set. The mean_squared_error is printed as a percentage.
Line 17: The output line prints the mean_squared_error between the actual and predicted housing prices, providing a measure of the model’s performance.

Unlock your potential: Ensemble learning series, all in one place!

To continue your exploration of ensemble learning, check out our series of Answers below:

What is ensemble learning?
Understand the concept of combining multiple models to improve predictions.
Ensemble methods in Python: Averaging
Learn how averaging methods can boost model accuracy and stability.
Ensemble methods in Python: Bagging
Discover the power of bagging in reducing variance and enhancing prediction performance.
Ensemble methods in Python: Boosting
Dive into boosting techniques that improve weak models by focusing on mistakes.
Ensemble methods in Python: Stacking
Understand how stacking combines multiple models to make better predictions.
Ensemble methods in Python: Max voting
Explore the max voting method to combine classifier predictions and increase accuracy.

Free AI Mock Interviews

Coding Interview

Coding PatternsFree Interview

Gain insights and practical experience with coding patterns through targeted MCQs and coding problems, designed to match and challenge your expertise level.

System Design

YouTubeFree Interview

Learn to design a video streaming platform like YouTube by tackling functional and non-functional requirements, core components, and high-level to detailed design challenges.

Free Resources

Ensemble methods in Python: Boosting

How to implement boosting using Python

1. Import the libraries

2. Load the dataset

3. Implement boosting

4. Predict and evaluate

Example

Explanation