Exercise: F-test and Univariate Feature Selection
Understand how to perform univariate feature selection using the F-test to evaluate each feature's predictive power individually. Learn to implement this using scikit-learn's f_classif function and SelectPercentile class. This lesson guides you through extracting features with the strongest relationship to the response variable, enhancing your logistic regression modeling skills.
Univariate feature selection using F-test
In this exercise, we'll use the F-test to examine the relationship between each feature and the response variable. We will use this method to do what is called univariate feature selection: the practice of testing features one by one against the response variable to see which ones have predictive power. Perform the following steps to complete the exercise:
- Our first step in doing the ANOVA F-test is to separate out the features and response as NumPy arrays, taking advantage of the list we created, as well as integer indexing in pandas:

  ```python
  # All columns except the last are features; the last column is the response
  X = df[features_response].iloc[:, :-1].values
  y = df[features_response].iloc[:, -1].values
  print(X.shape, y.shape)
  ```

  The output should show the shapes of the features and response:

  ```
  (26664, 17) (26664,)
  ```

  There are 17 features, and both the features and response arrays have the same number of samples, as expected.
- Import the f_classif function and feed in the features and response:

  ```python
  from sklearn.feature_selection import f_classif

  # f_classif runs an ANOVA F-test between each feature and the response
  [f_stat, f_p_value] = f_classif(X, y)
  ```

  There are two outputs from ...
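The overview above also mentions scikit-learn's SelectPercentile class, which the steps shown here stop short of. As a minimal sketch of how f_classif's two outputs (the per-feature F-statistics and their p-values) are typically used for selection, the code below ranks the features and then keeps only the top-scoring ones. The `percentile=20` threshold is an illustrative assumption, not a value prescribed by this exercise.

```python
import pandas as pd
from sklearn.feature_selection import SelectPercentile, f_classif

# Rank features by the strength of their univariate relationship with the
# response: smaller p-values indicate stronger evidence of a relationship.
f_test_df = pd.DataFrame({
    'Feature': features_response[:-1],  # feature names from the list used above
    'F statistic': f_stat,
    'p value': f_p_value
})
print(f_test_df.sort_values('p value').head())

# SelectPercentile keeps the features whose f_classif scores fall in the
# top percentile; percentile=20 is an illustrative choice (assumption).
selector = SelectPercentile(f_classif, percentile=20)
X_selected = selector.fit_transform(X, y)
print(X_selected.shape)  # same rows as X, but fewer columns

# get_support() returns a boolean mask indicating which features were kept
selected_features = [
    name for name, keep in zip(features_response[:-1], selector.get_support())
    if keep
]
print(selected_features)
```

One design note: ranking by p-value and selecting by percentile of the score are two views of the same univariate test, so inspecting the sorted DataFrame first is a useful sanity check on what SelectPercentile will keep.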