Feature Selection
Go through the feature selection process, considering different methods.
Before we move forward, it might be a good idea to drop all the error columns from our DataFrame, since they may not be useful for modeling. Rather than listing them by hand, we can search each column name for the word “error” using a for loop and build our selection from that.
So, we end up with a list of columns, cols_, that excludes the error columns. We can now use it to subset the DataFrame.
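A minimal sketch of this idea, using a small hypothetical DataFrame (the column names here are placeholders, not the actual dataset's):

```python
import pandas as pd

# Hypothetical DataFrame mixing measurement columns and error columns
df = pd.DataFrame({
    "flux": [1.2, 3.4, 2.2],
    "flux_error": [0.1, 0.2, 0.15],
    "period": [5.0, 7.1, 6.3],
    "period_error": [0.3, 0.2, 0.4],
})

# Keep every column whose name does not contain the word "error"
cols_ = []
for col in df.columns:
    if "error" not in col.lower():
        cols_.append(col)

df_selected = df[cols_]
print(cols_)  # ['flux', 'period']
```

The same selection can be written as a one-line list comprehension, but the explicit loop makes the filtering condition easy to read and extend.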
Let’s explore this data a bit more. Feature selection becomes essential when we’re working with many features in a dataset.
Feature selection
Let’s introduce some common ways to select features based on statistical measures. Analysis of variance (ANOVA) and the chi-square (chi2) test are recommended for feature selection in classification problems. Before we move on, we need to import SelectKBest(), which returns the requested number of top features ranked by a chosen statistic, such as the chi2 score or the ANOVA F-value.
In scikit-learn, chi2 computes chi-squared statistics between each nonnegative feature and the class labels.
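As a sketch of how SelectKBest() and chi2 fit together, here is an example on scikit-learn's built-in iris dataset (standing in for our own data, since its four features are all nonnegative):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

# Iris features are all nonnegative, so the chi2 score is applicable
X, y = load_iris(return_X_y=True)

# Keep the two features with the highest chi-squared statistics
selector = SelectKBest(score_func=chi2, k=2)
X_new = selector.fit_transform(X, y)

print(X.shape, X_new.shape)    # (150, 4) -> (150, 2)
print(selector.get_support())  # boolean mask marking the kept columns
print(selector.scores_)        # chi2 score for each original feature
```

Swapping score_func=chi2 for f_classif (the ANOVA F-value) uses the same interface, which makes it easy to compare the two statistics on the same data.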