Ensemble methods in Python: Bagging
Ensemble methods in machine learning combine multiple models to improve overall performance. This approach is particularly effective when individual models have limitations or biases. One prominent ensemble technique is bagging (short for bootstrap aggregating).
Bagging aims to reduce overfitting and variance by training several copies of a base model on bootstrapped subsets of the training data (sampled with replacement) and aggregating their predictions, typically by majority vote for classification or averaging for regression.
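To build intuition before using scikit-learn's built-in classes, here is a minimal from-scratch sketch of that idea (the bagging_predict helper below is hypothetical, written only for illustration): each model trains on a bootstrap sample of the data, and the ensemble predicts by majority vote.

import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier

def bagging_predict(base_estimator, X_train, y_train, X_test, n_models=10, seed=0):
    # Train n_models clones of base_estimator on bootstrap samples
    # and return the majority-vote prediction for each row of X_test.
    rng = np.random.default_rng(seed)
    per_model_preds = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X_train), size=len(X_train))  # sample rows with replacement
        model = clone(base_estimator).fit(X_train[idx], y_train[idx])
        per_model_preds.append(model.predict(X_test))
    votes = np.array(per_model_preds)  # shape: (n_models, n_test_points)
    # Majority vote down each column, i.e., across the models
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)

# Usage, e.g.: bagging_predict(DecisionTreeClassifier(), X_train, y_train, X_test)

In practice, you would use BaggingClassifier, which implements this loop (plus more options) for you, as the steps below show.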
How to implement bagging using Python
Follow the steps below to implement the bagging algorithm in Python:
1. Import the libraries
The first step is to import the required libraries, as shown in the code below:
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
2. Load the dataset
The next step is to load the dataset. We’ll use the breast cancer dataset provided by the sklearn library. This dataset consists of 30 features. The target variable is the diagnosis, where 0 represents malignant and 1 represents benign tumors. The train_test_split function divides the dataset into training and testing data.
cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target, test_size=0.2, random_state=10)
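If you want to sanity-check the data before modeling, a quick optional inspection confirms the shapes and the label encoding:

print(cancer.data.shape)            # (569, 30): 569 samples, 30 features
print(cancer.target_names)          # ['malignant' 'benign'], encoded as 0 and 1
print(X_train.shape, X_test.shape)  # roughly an 80/20 split of the rows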
3. Define the base model
The next step is to choose the base model. We’ll use the random forest classifier for this example. The n_estimators parameter dictates the number of trees in the forest, and random_state ensures reproducibility. Adjusting hyperparameters like n_estimators, max_depth, and max_features allows fine-tuning the model's performance; a tuning sketch follows the code below.
base_model = RandomForestClassifier(n_estimators=10, max_depth=3, max_features='sqrt', random_state=42) # You can adjust hyperparameters
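As one way to explore those hyperparameters, a small grid search can pick the best combination by cross-validation. This is a sketch; the grid values below are arbitrary choices, not recommendations:

from sklearn.model_selection import GridSearchCV

param_grid = {
    'n_estimators': [10, 50, 100],
    'max_depth': [3, 5, None],
    'max_features': ['sqrt', 'log2'],
}
search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=5)
search.fit(X_train, y_train)
print(search.best_params_)  # the best combination found on the training folds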
4. Implement bagging
We will now create an instance of the BaggingClassifier and fit it on the training data. The first argument specifies the underlying model to be used, while n_estimators determines the number of base models in the ensemble. The random_state parameter ensures reproducibility by seeding the random number generation.
bagging_model = BaggingClassifier(base_model, n_estimators=50, random_state=20)
bagging_model.fit(X_train, y_train)
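BaggingClassifier also exposes parameters that control the bootstrap itself. For example, max_samples sets the share of training rows drawn for each base model, and oob_score=True evaluates every model on the rows it never saw during fitting (its out-of-bag samples). A minimal sketch, reusing base_model from above:

bagging_oob = BaggingClassifier(
    base_model,
    n_estimators=50,
    max_samples=0.8,   # each base model trains on a bootstrap sample of 80% of the rows
    oob_score=True,    # score each model on its out-of-bag rows
    random_state=20,
)
bagging_oob.fit(X_train, y_train)
print("OOB score: {:.2f}".format(bagging_oob.oob_score_))

Note that we pass the base model positionally: recent scikit-learn releases name this parameter estimator (older releases used base_estimator), so the positional form works across versions.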
5. Predict and evaluate
Now, we will make predictions on the test set and calculate the accuracy.
y_pred = bagging_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy: {:.2f}%".format(accuracy * 100))
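To check whether bagging actually helps on this dataset, you can also compare the base model alone against the ensemble under the same cross-validation. This is a sketch; the exact numbers will vary with the split:

from sklearn.model_selection import cross_val_score

base_scores = cross_val_score(base_model, X_train, y_train, cv=5)
bag_scores = cross_val_score(bagging_model, X_train, y_train, cv=5)
print("Base model CV accuracy: {:.3f}".format(base_scores.mean()))
print("Bagged model CV accuracy: {:.3f}".format(bag_scores.mean()))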
Example
The following code shows how we can implement the bagging ensemble classifier in Python:
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score

# Load and split the data
cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target, test_size=0.2, random_state=10)

# Use RandomForestClassifier with max_features='sqrt' for randomness
base_model = RandomForestClassifier(n_estimators=10, max_depth=3, max_features='sqrt', random_state=42)

# Implement bagging with the random forest base model
bagging_model = BaggingClassifier(base_model, n_estimators=50, random_state=20)
bagging_model.fit(X_train, y_train)

# Predict and evaluate
y_pred = bagging_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy: {:.2f}%".format(accuracy * 100))
Explanation:
Lines 1–4: These lines import the required libraries.
Line 7: This line loads the dataset from sklearn and stores it in the cancer variable.
Line 8: This line splits the dataset into training and testing sets.
Line 11: We define the RandomForestClassifier as the base model for bagging.
Lines 14–15: Here, we create a BaggingClassifier with 50 base models and fit the bagging model on the training data. The BaggingClassifier handles the bootstrap sampling internally when fitting the model.
Line 18: The trained model is used to make predictions on the test data.
Lines 19–20: The code calculates the accuracy of the model’s predictions by comparing them to the true labels in the test set. The accuracy is printed as a percentage.
Unlock your potential: Ensemble learning series, all in one place!
To continue your exploration of ensemble learning, check out our series of Answers below:
What is ensemble learning?
Understand the concept of combining multiple models to improve predictions.

Ensemble methods in Python: Averaging
Learn how averaging methods can boost model accuracy and stability.

Ensemble methods in Python: Bagging
Discover the power of bagging in reducing variance and enhancing prediction performance.

Ensemble methods in Python: Boosting
Dive into boosting techniques that improve weak models by focusing on mistakes.

Ensemble methods in Python: Stacking
Understand how stacking combines multiple models to make better predictions.

Ensemble methods in Python: Max voting
Explore the max voting method to combine classifier predictions and increase accuracy.