# Exercise: F-test and Univariate Feature Selection

Learn how to perform univariate feature selection using the F-test.


## Univariate feature selection using F-test

In this exercise, we'll use the F-test to examine the relationship between each feature and the response variable. We will use this method to do what is called univariate feature selection: the practice of testing features one by one against the response variable to see which ones have predictive power. Perform the following steps to complete the exercise:

1. The first step in performing the ANOVA F-test is to separate the features and response into NumPy arrays, using the `features_response` list we created earlier along with integer-location indexing in pandas (`.iloc`):

```python
# All columns except the last are features; the last column is the response
X = df[features_response].iloc[:, :-1].values
y = df[features_response].iloc[:, -1].values
print(X.shape, y.shape)
```


The output should show the shapes of the features and response:

```
(26664, 17) (26664,)
```


As expected, there are 17 features, and the feature and response arrays have the same number of samples.
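The split above relies on the response being the last column of `df[features_response]`. A minimal sketch on a hypothetical toy DataFrame (the column names `feature_a`, `feature_b`, and `response` are made up for illustration) shows how `.iloc[:, :-1]` and `.iloc[:, -1]` separate the two:

```python
import pandas as pd

# Hypothetical toy DataFrame: two features followed by the response as the last column
toy_df = pd.DataFrame({
    'feature_a': [1.0, 2.0, 3.0, 4.0],
    'feature_b': [10, 20, 30, 40],
    'response': [0, 1, 0, 1],
})

# Every column except the last -> 2-D feature array
X_toy = toy_df.iloc[:, :-1].values
# Only the last column -> 1-D response array
y_toy = toy_df.iloc[:, -1].values

print(X_toy.shape, y_toy.shape)  # prints (4, 2) (4,)
```

Because the response array is one-dimensional, its shape prints with a trailing comma, as in `(4,)`.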

2. Import the `f_classif` function from scikit-learn and pass it the features and response:

```python
from sklearn.feature_selection import f_classif

# ANOVA F-statistic and p-value for each feature compared against the response
f_stat, f_p_value = f_classif(X, y)
```


There are two outputs from `f_classif`: the F-statistic and the p-value for the comparison of each feature with the response variable. Let's create a new DataFrame containing the feature names and these outputs to make them easier to inspect. One way to specify a new DataFrame is with a dictionary whose keys are the column names and whose values are the data for each column. We then sort the DataFrame by p-value in ascending order.
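To see what `f_classif` computes for each feature, here is a minimal sketch that calculates the one-way ANOVA F-statistic by hand, one feature at a time, on a small synthetic dataset and compares the result with `f_classif`. The data and the helper `anova_f` are hypothetical and exist only for illustration; the sketch assumes the standard one-way ANOVA definition (between-class mean square divided by within-class mean square), with the p-value taken from the upper tail of an F distribution with (k - 1, n - k) degrees of freedom, where k is the number of classes and n the number of samples.

```python
import numpy as np
from scipy import stats
from sklearn.feature_selection import f_classif

# Hypothetical synthetic data: 200 samples, 3 features, binary response
rng = np.random.default_rng(0)
y_demo = rng.integers(0, 2, size=200)
X_demo = rng.normal(size=(200, 3))
X_demo[:, 0] += 1.5 * y_demo  # only feature 0 is related to the response


def anova_f(feature, labels):
    """Hypothetical helper: one-way ANOVA F-statistic and p-value for one feature."""
    classes = np.unique(labels)
    groups = [feature[labels == c] for c in classes]
    n, k = feature.size, classes.size
    grand_mean = feature.mean()
    # Between-class sum of squares: spread of the class means around the grand mean
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    # Within-class sum of squares: spread of each sample around its own class mean
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    f = (ss_between / (k - 1)) / (ss_within / (n - k))
    # p-value: upper-tail probability of an F distribution with (k - 1, n - k) dof
    return f, stats.f.sf(f, k - 1, n - k)


# Univariate testing: each feature is compared with the response on its own
manual = np.array([anova_f(X_demo[:, j], y_demo) for j in range(X_demo.shape[1])])
f_stat_demo, p_demo = f_classif(X_demo, y_demo)

print(np.allclose(manual[:, 0], f_stat_demo), np.allclose(manual[:, 1], p_demo))
# prints: True True
```

A large F-statistic, and the correspondingly small p-value, means the class means of that feature differ by much more than the spread within each class would suggest; in this sketch, only feature 0 should stand out.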

3. Use this code to create a DataFrame of feature names, F-statistics, and p-values, and show it sorted by p-value:

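```python
# One row per feature; the response is the last name in features_response
f_test_df = pd.DataFrame({'Feature': features_response[:-1],
                          'F statistic': f_stat,
                          'p value': f_p_value})
# Smallest p-values (strongest evidence of a relationship) first
f_test_df.sort_values('p value')
```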


The output should look like this:
