Solution: Model Evaluation

Explore how to evaluate machine learning classification models using the F1 score and k-fold cross-validation. Understand how to apply logistic regression, k-nearest neighbors, and decision trees, compare their performance, and select the best model based on average scores across folds.
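The F1 score is the harmonic mean of precision P and recall R, that is, F1 = 2 · P · R / (P + R). The minimal sketch below computes it by hand and checks the result against scikit-learn's `f1_score`. The labels and predictions are made up for illustration. By default, `f1_score` scores the positive class `1` of a binary target.

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Hypothetical binary labels and predictions, made up for illustration only
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 1, 1, 0]

# 3 true positives, 2 false positives, 1 false negative
precision = precision_score(y_true, y_pred)  # 3 / (3 + 2)
recall = recall_score(y_true, y_pred)        # 3 / (3 + 1)
manual_f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.3f} recall={recall:.3f}")
# precision=0.600 recall=0.750
print(f"manual F1={manual_f1:.3f} sklearn F1={f1_score(y_true, y_pred):.3f}")
# manual F1=0.667 sklearn F1=0.667
```

Because F1 combines precision and recall, a model cannot score well on it just by predicting the majority class. This is why F1 is often preferred over plain accuracy for classification tasks.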

There are multiple possible solutions to the model selection coding challenge, depending on which cross-validation method we choose. What matters is that we do the following:

  1. Choose an appropriate metric for a classification task.

  2. Use a cross-validation method to select the best model.
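The challenge does not fix a particular cross-validation method. One alternative to plain `KFold` is `StratifiedKFold`, which keeps the class proportions roughly equal in every fold. It can be combined with `cross_val_score(..., scoring="f1")` so that the fold loop does not have to be written by hand. This is a minimal sketch under stated assumptions. A synthetic `make_classification` dataset stands in for `preprocessed.csv`. The 80/20 class weights, `shuffle=True`, `random_state=0`, and `max_iter=1000` are arbitrary choices made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic, imbalanced binary dataset standing in for preprocessed.csv (assumption)
X, y = make_classification(
    n_samples=300, n_features=8, weights=[0.8, 0.2], random_state=0
)

# Stratified folds keep roughly the same positive rate in every test fold
skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

# cross_val_score clones, fits, and scores the model once per fold
scores = cross_val_score(
    LogisticRegression(penalty="l2", C=10, max_iter=1000),
    X, y, cv=skf, scoring="f1",
)
print("F1 per fold:", np.round(scores, 3))
print("Mean F1:", scores.mean())

# Compare each test fold's positive rate with the overall positive rate
print(f"overall positive rate: {y.mean():.2f}")
for _, test_idx in skf.split(X, y):
    print(f"test-fold positive rate: {y[test_idx].mean():.2f}")
```

Stratification matters most when one class is rare. An unstratified, unshuffled split can produce a test fold with very few positives, which makes that fold's F1 score unstable.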

Here is one possible solution:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

preprocessed = pd.read_csv("preprocessed.csv")

# Define X (model features) and y (target variable).
# X_var (feature column names) and y_var (target column name) are assumed
# to be defined earlier in the challenge.
X = preprocessed[X_var]
y = preprocessed[y_var]

# Three algorithms
classifiers = [
    LogisticRegression(penalty='l2', C=10),
    KNeighborsClassifier(
        n_neighbors=4, metric='euclidean', weights='distance'
    ),
    DecisionTreeClassifier(
        max_depth=5, min_samples_split=10
    )
]

# Initialize k-fold cross-validation
k = 3
kf = KFold(n_splits=k)

# Perform k-fold cross-validation for each model
for model in classifiers:
    # Initialize a list to store the F1 scores for each fold
    f1_scores = []
    for train_index, test_index in kf.split(X):
        X_train, X_test = X.iloc[train_index], X.iloc[test_index]
        y_train, y_test = y.iloc[train_index], y.iloc[test_index]
        # Train the model (fit() retrains it from scratch on every fold)
        model.fit(X_train, y_train)
        # Calculate the F1 score for the current fold
        y_test_pred = model.predict(X_test)
        f1_scores.append(f1_score(y_test, y_test_pred))
    # Average the fold scores so the models can be compared
    print(f"Average F1 Score for {type(model).__name__}:", np.mean(f1_scores))
```
  • The `classifiers` list: We initialize three different classification ...
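The solution prints each model's average F1 score but leaves the final choice implicit. One way to make the selection explicit is to store the mean score for each model and pick the maximum. The sketch below does this under a few assumptions. A synthetic `make_classification` dataset replaces `preprocessed.csv`. `cross_val_score` with `scoring="f1"` stands in for the manual fold loop. `max_iter=1000` and the tree's `random_state=1` were added only to avoid convergence warnings and to make the results repeatable.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data standing in for the challenge's preprocessed.csv (assumption)
X, y = make_classification(n_samples=300, n_features=8, random_state=1)

# Same hyperparameters as in the solution's classifiers list
classifiers = [
    LogisticRegression(penalty="l2", C=10, max_iter=1000),
    KNeighborsClassifier(n_neighbors=4, metric="euclidean", weights="distance"),
    DecisionTreeClassifier(max_depth=5, min_samples_split=10, random_state=1),
]

kf = KFold(n_splits=3)

# Mean F1 across the folds for each model, keyed by class name
mean_f1 = {
    type(model).__name__: cross_val_score(model, X, y, cv=kf, scoring="f1").mean()
    for model in classifiers
}

# Select the model with the highest average F1 score
best_name = max(mean_f1, key=mean_f1.get)
for name, score in mean_f1.items():
    print(f"{name}: {score:.3f}")
print("Selected model:", best_name)
```

Averaging over folds smooths out the effect of any single lucky or unlucky split. If two models end up with nearly identical means, the spread of their per-fold scores is also worth checking before picking one.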