Ensemble methods in Python: Max voting

Ensemble methods in machine learning combine the strengths of multiple models for enhanced performance. Max voting, a foundational ensemble technique, involves aggregating predictions from multiple models and selecting the most frequent class as the final prediction. This straightforward yet effective approach leverages the diversity of individual models to enhance overall predictive accuracy.

Max voting algorithm
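For intuition, here is a minimal sketch, independent of the scikit-learn example below, of how a hard max vote works: each model casts one vote per sample, and the most frequent class becomes the final prediction. The predictions here are hypothetical and only serve to illustrate the counting step.

from collections import Counter

# Hypothetical class predictions from three models for five samples
preds_model_1 = [1, 0, 1, 1, 0]
preds_model_2 = [1, 1, 1, 0, 0]
preds_model_3 = [0, 0, 1, 1, 1]

# For each sample, the final prediction is the most common vote across the models
final_preds = [Counter(votes).most_common(1)[0][0]
               for votes in zip(preds_model_1, preds_model_2, preds_model_3)]

print(final_preds)  # [1, 0, 1, 1, 0]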

How to implement max voting using Python

Let’s look at the steps required to implement the max voting algorithm in Python.

Import the libraries

The first step is to import the required libraries.

from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score

Load the dataset

The next step is to load the dataset. We will use the breast cancer dataset provided by the sklearn library.

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target, test_size=0.2, random_state=42)

Define the base models

The next step is to define the base models whose predictions the ensemble will combine. We will use RandomForestClassifier and GradientBoostingClassifier in this example.

rf_model = RandomForestClassifier(n_estimators=10, random_state=42)
gb_model = GradientBoostingClassifier(n_estimators=10, random_state=42)
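Diversity among the base models is what makes max voting effective, so you can optionally include additional estimators. As an illustration only (this model is not used in the rest of the example), a logistic regression classifier could be added as a third voter:

from sklearn.linear_model import LogisticRegression

# Optional third base model for extra diversity (illustrative only; not used below)
lr_model = LogisticRegression(max_iter=10000, random_state=42)
# It could then be passed to the ensemble as ('lr', lr_model) in the estimators list.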

Implement max-voting

We will now create an instance of VotingClassifier with the two base models, set voting='hard' for majority voting, and fit it on the training data.

max_voting_model = VotingClassifier(estimators=[('rf', rf_model), ('gb', gb_model)], voting='hard')
max_voting_model.fit(X_train, y_train)

Predict and evaluate

Now, we will make predictions on the test set and calculate the accuracy.

y_pred = max_voting_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy: {:.2f}%".format(accuracy * 100))

Code example

The following code shows the steps outlined above to implement the max voting ensemble classifier in Python:

from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score

# Load and split the dataset
cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target, test_size=0.2, random_state=42)

# Define base models
rf_model = RandomForestClassifier(n_estimators=10, random_state=42)
gb_model = GradientBoostingClassifier(n_estimators=10, random_state=42)

# Create an ensemble using max voting
max_voting_model = VotingClassifier(estimators=[('rf', rf_model), ('gb', gb_model)], voting='hard')
max_voting_model.fit(X_train, y_train)

# Predict and evaluate
y_pred = max_voting_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy: {:.2f}%".format(accuracy * 100))

Code explanation

  • Lines 1–5: These lines import the required libraries.

  • Line 8: This line loads the breast cancer dataset from sklearn and stores it in the cancer variable.

  • Line 9: This line splits the dataset into training and test sets.

  • Lines 12–13: We define RandomForestClassifier and GradientBoostingClassifier as the base models for the VotingClassifier.

  • Lines 16–17: Here, we create a VotingClassifier with the specified base models and voting='hard', and fit it on the training data.

  • Line 20: The trained model is used to make predictions on the test data.

  • Lines 21–22: The code calculates the accuracy of the model’s predictions by comparing them to the true labels in the test set. The accuracy is printed as a percentage.
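To see that voting='hard' really takes a per-sample majority vote, an optional check like the one below (not part of the lesson's code) compares the ensemble's output with a manual vote over the fitted base estimators, which VotingClassifier exposes through its named_estimators_ attribute:

import numpy as np

# Predictions of each fitted base estimator, shape (n_models, n_samples)
base_preds = np.array([est.predict(X_test) for est in max_voting_model.named_estimators_.values()])

# Manual majority vote for the 0/1 labels: predict 1 only when more than half the models vote 1
manual_vote = (base_preds.sum(axis=0) > base_preds.shape[0] / 2).astype(int)

print("Matches VotingClassifier output:", np.array_equal(manual_vote, y_pred))

With only two voters a tie is possible; the comparison above breaks ties toward class 0, which should match scikit-learn's behavior of picking the first class with the maximum vote count.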
