Ensemble Methods
Learn how to combine multiple ML models to produce more accurate predictions.
Ensemble methods are techniques used in ML to improve the accuracy and stability of models by combining multiple individual models. These models can be of the same or different types, and the combination can be achieved in various ways. Ensemble methods are particularly useful when the individual models are reasonably accurate on their own and their errors are not highly correlated.
There are two main families of ensemble methods:
Averaging methods, on one hand, involve building several base models independently and then averaging their predictions to make the final prediction. This approach typically reduces the variance of the model, making it more stable and accurate. Examples of averaging methods include bagging methods and forests of randomized trees.
Boosting methods, on the other hand, involve building base models sequentially and attempting to reduce the bias of the combined model. This approach involves combining several weak models to produce a more powerful ensemble. Examples of boosting methods include AdaBoost and gradient tree boosting.
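To make the distinction concrete, the sketch below trains one ready-made model from each family on the same data: a forest of randomized trees (averaging) and AdaBoost (boosting). The dataset and hyperparameters are illustrative assumptions, not prescriptions from this lesson.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Averaging family: many randomized trees trained independently, predictions averaged
averaging_model = RandomForestClassifier(n_estimators=100, random_state=0)
averaging_model.fit(X_train, y_train)
print("Random forest accuracy:", averaging_model.score(X_test, y_test))

# Boosting family: weak learners built sequentially, each focusing on earlier mistakes
boosting_model = AdaBoostClassifier(n_estimators=100, random_state=0)
boosting_model.fit(X_train, y_train)
print("AdaBoost accuracy:", boosting_model.score(X_test, y_test))

Both models expose the same fit/score interface, so the difference lies entirely in how the ensemble is built, not in how it is used.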
Averaging methods
Averaging methods can be split into two subcategories: bagging and stacking.
They are both ensemble techniques used to improve the accuracy and stability of ML models, but they differ in their approach to combining the outputs of multiple models. Let’s look at both in more detail.
Bagging (bootstrap aggregating)
Bagging, short for bootstrap aggregating, involves training multiple instances of the same model on different subsets of the training data and aggregating their outputs to make a final prediction. The individual models in bagging are often identical, and the goal is to reduce variance and improve stability by combining the predictions of multiple models trained on different subsets of the data.
In bagging, each model is trained on a random subset of the training data, selected with replacement, so each model is slightly different. This variation helps to reduce the likelihood of overfitting and improve the overall robustness of the prediction.
The final prediction is typically made by averaging the predictions of all the individual models, although other methods such as median or weighted averaging can also be used. This averaging process helps to smooth out the variations in individual predictions and produce a more stable and reliable prediction.
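Here is a minimal hand-rolled sketch of that process, assuming a small regression dataset and a handful of shallow decision trees; it is meant to show the bootstrap-then-average mechanics rather than serve as a production implementation.

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)
rng = np.random.RandomState(0)
n_models, n_samples = 20, X.shape[0]

predictions = []
for _ in range(n_models):
    # Draw a bootstrap sample: same size as the data, selected with replacement
    idx = rng.choice(n_samples, size=n_samples, replace=True)
    tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X[idx], y[idx])
    predictions.append(tree.predict(X))

# The ensemble prediction is the average of the individual models' predictions
ensemble_prediction = np.mean(predictions, axis=0)
print(ensemble_prediction[:5])

Each tree sees a slightly different view of the data, and averaging their outputs smooths out the individual fluctuations.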
One way to use bagging in scikit-learn is through the bagging meta-estimators, BaggingClassifier and BaggingRegressor, which can be used with any base estimator. They work by training multiple copies of the same base estimator on different random subsets of the training data and then combining their predictions.
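For regression targets, BaggingRegressor follows the same recipe. As a quick sketch (the dataset and parameters below are illustrative assumptions; with no base estimator specified, scikit-learn bags decision trees by default):

from sklearn.ensemble import BaggingRegressor
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# Bag 10 decision tree regressors trained on bootstrap samples
reg = BaggingRegressor(n_estimators=10, random_state=1)
reg.fit(X_train, y_train)
print("R^2 score:", reg.score(X_test, y_test))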
The BaggingClassifier meta-estimator works by generating multiple subsets of the training data, known as bootstrap samples, and training a separate classifier on each subset. These classifiers are then combined, with the final prediction determined by aggregating the predictions of all the individual classifiers (for example, by majority vote or by averaging predicted class probabilities). This approach helps to reduce the variance of the final model and improve its overall accuracy.
Here is some sample code that demonstrates how to use the BaggingClassifier algorithm in scikit-learn to classify a dataset:
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Load the breast cancer dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# Create a BaggingClassifier
clf = BaggingClassifier(base_estimator=DecisionTreeClassifier(),
                        n_estimators=10,
                        random_state=1)

# Train the model
clf.fit(X_train, y_train)

# Evaluate the model
score = clf.score(X_test, y_test)

# Print the accuracy score
print("Accuracy:", score)
Line 7: We load our classification dataset, where we have data on breast cancer and the goal is to predict whether a tumor is malignant or benign.