Performance of the Trained Models on Unseen Data
Explore how trained models perform on unseen data.
In reality, labels aren't available for truly unseen data. To evaluate our trained models, however, we keep a separate labeled dataset that the models never saw during training. Let's read this unseen data and check how our trained models perform on it.
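The lesson's data file isn't reproduced here, so as a minimal sketch, assume the held-out data lives in a CSV named `unseen_data.csv` (a hypothetical path; substitute your own):

```python
import pandas as pd

# Hypothetical file name for the held-out dataset.
unseen = pd.read_csv('unseen_data.csv')
print(unseen.shape)
print(unseen.head())
```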
We can use our custom function to check for columns with missing data.
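The exact helper from the earlier lessons isn't shown here; a minimal sketch of such a function, applied to the `unseen` DataFrame loaded above, might look like this:

```python
import pandas as pd

def missing_data(df: pd.DataFrame) -> pd.DataFrame:
    """Report the count and percentage of missing values per column."""
    total = df.isnull().sum()
    percent = 100 * total / len(df)
    report = pd.concat([total, percent], axis=1, keys=['Total', 'Percent'])
    return report[report['Total'] > 0]

print(missing_data(unseen))
```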
Let's separate our features and targets and observe the class balance in the unseen data.
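A sketch, assuming the label column is named `target` (a hypothetical name; use your dataset's label column):

```python
# 'target' is an assumed column name for the label.
X_unseen = unseen.drop(columns=['target'])
y_unseen = unseen['target']

# Class balance as proportions; the imbalance here is why we trained
# the oversampled and SMOTE variants in the previous lesson.
print(y_unseen.value_counts(normalize=True))
```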
We have trained the following three models from the previous lesson:
- `logR`, trained on the original data.
- `logR_os`, trained on the data after oversampling (creating copies of) the minority class.
- `logR_smote`, trained on data where SMOTE created synthetic minority samples.
Let's check their performance on the unseen data.
Accuracy score
Let's get the accuracy scores of the individual models, starting with `logR`, which was trained on the original data.
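A minimal sketch, assuming `logR` carries over from the previous lesson and using the `X_unseen`/`y_unseen` split above:

```python
from sklearn.metrics import accuracy_score

# logR was fit on the original (imbalanced) training data.
y_pred = logR.predict(X_unseen)
print(accuracy_score(y_unseen, y_pred))
```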
Next is `logR_os`, which was trained on the data after oversampling (creating copies of the minority class). Let's fetch its accuracy score the same way.
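Again a sketch, with the same assumed names:

```python
from sklearn.metrics import accuracy_score

# logR_os was fit after duplicating minority-class rows.
y_pred_os = logR_os.predict(X_unseen)
print(accuracy_score(y_unseen, y_pred_os))
```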
Finally, we'll use `logR_smote`, which was trained on data where SMOTE created synthetic minority samples.
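One more sketch under the same assumptions:

```python
from sklearn.metrics import accuracy_score

# logR_smote was fit on data augmented with synthetic minority samples.
y_pred_smote = logR_smote.predict(X_unseen)
print(accuracy_score(y_unseen, y_pred_smote))
```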
Accuracy alone can be misleading on imbalanced data, so changing the performance metric is helpful; for general purposes, AUC-ROC is a useful choice.
Area under ROC
Let's start by predicting class probabilities with `logR`.
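A sketch using scikit-learn's `predict_proba` together with `roc_auc_score`, with the same assumed names as above (in a binary problem, column 1 of `predict_proba`'s output holds the positive-class probability):

```python
from sklearn.metrics import roc_auc_score

# Probability of the positive class for each unseen sample.
probs = logR.predict_proba(X_unseen)[:, 1]
print(roc_auc_score(y_unseen, probs))
```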
Now, let's get the class probabilities of `logR_os`, the model trained on the oversampled data (copies only).
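Same pattern, same assumed names:

```python
from sklearn.metrics import roc_auc_score

probs_os = logR_os.predict_proba(X_unseen)[:, 1]
print(roc_auc_score(y_unseen, probs_os))
```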
Finally, the class probabilities of `logR_smote`.
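And the corresponding sketch:

```python
from sklearn.metrics import roc_auc_score

probs_smote = logR_smote.predict_proba(X_unseen)[:, 1]
print(roc_auc_score(y_unseen, probs_smote))
```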
Cohen's kappa coefficient
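Cohen's kappa measures agreement between predicted and true labels while correcting for the agreement expected by chance, which makes it more informative than raw accuracy on imbalanced data. A sketch with scikit-learn's `cohen_kappa_score`, reusing the predictions assumed above:

```python
from sklearn.metrics import cohen_kappa_score

# Kappa of 0 means chance-level agreement; 1 means perfect agreement.
print(cohen_kappa_score(y_unseen, y_pred))        # logR
print(cohen_kappa_score(y_unseen, y_pred_os))     # logR_os
print(cohen_kappa_score(y_unseen, y_pred_smote))  # logR_smote
```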