Exercise: F-test and Univariate Feature Selection
Learn how to perform univariate feature selection using the F-test.
We'll cover the following
Univariate feature selection using the F-test
In this exercise, we'll use the F-test to examine the relationship between the features and the response variable. We will use this method to do what is called univariate feature selection: the practice of testing features one by one against the response variable to see which ones have predictive power. Perform the following steps to complete the exercise:

Our first step in doing the ANOVA F-test is to separate out the features and response as NumPy arrays, taking advantage of the list we created, as well as integer indexing in pandas. The response is the last item in the list, so the features are every column except the last:
    X = df[features_response].iloc[:, :-1].values
    y = df[features_response].iloc[:, -1].values
    print(X.shape, y.shape)
The output should show the shapes of the features and response:
# (26664, 17) (26664,)
There are 17 features, and both the features and response arrays have the same number of samples as expected.
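The integer indexing in this step relies on negative positions: `:-1` selects every column except the last, and `-1` selects only the last. The sketch below shows the same pattern on a small made-up DataFrame. The names `toy_df` and `toy_features_response` and all values are hypothetical stand-ins, not the exercise's data.

```python
import pandas as pd

# Hypothetical stand-ins for the exercise's df and features_response list.
# As in the exercise, the response is assumed to be the last name in the list.
toy_features_response = ['feature_a', 'feature_b', 'response']
toy_df = pd.DataFrame({
    'feature_a': [1, 2, 3, 4],
    'feature_b': [10, 20, 30, 40],
    'response': [0, 1, 0, 1],
})

# All rows, every column except the last: a 2-D array of features
X_toy = toy_df[toy_features_response].iloc[:, :-1].values
# All rows, last column only: a 1-D array holding the response
y_toy = toy_df[toy_features_response].iloc[:, -1].values

print(X_toy.shape, y_toy.shape)  # (4, 2) (4,)
```

Selecting columns by the list first fixes their order, so the positional slice reliably puts the response last regardless of the column order in the original DataFrame.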

Import the f_classif function and feed in the features and response:
    from sklearn.feature_selection import f_classif
    f_stat, f_p_value = f_classif(X, y)
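For intuition about the statistic computed for each feature, a one-way ANOVA F-statistic is the ratio of between-class variance to within-class variance of that feature. The following is a minimal pure-Python sketch of that ratio for a single feature. The helper name `anova_f_statistic` and the toy values are hypothetical, and this is an illustration of the textbook formula, not scikit-learn's implementation.

```python
from statistics import mean

def anova_f_statistic(values, labels):
    """One-way ANOVA F-statistic for a single feature (illustrative helper)."""
    # Group the feature values by class label.
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)

    n_samples = len(values)
    n_classes = len(groups)
    grand_mean = mean(values)

    # Between-class sum of squares: distance of each class mean from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    # Within-class sum of squares: spread of samples around their own class mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)

    # Divide each sum of squares by its degrees of freedom, then take the ratio.
    ms_between = ss_between / (n_classes - 1)
    ms_within = ss_within / (n_samples - n_classes)
    return ms_between / ms_within

# Made-up data: one feature that separates the classes, one that does not.
labels = [0, 0, 0, 1, 1, 1]
informative = [1.0, 2.0, 3.0, 5.0, 6.0, 7.0]
uninformative = [1.0, 5.0, 3.0, 2.0, 4.0, 3.0]

f_informative = anova_f_statistic(informative, labels)
f_uninformative = anova_f_statistic(uninformative, labels)
print(f_informative)    # 24.0
print(f_uninformative)  # 0.0
```

A feature whose class means differ strongly relative to its within-class spread gets a large F-statistic, which corresponds to a small p-value. A feature whose class means coincide gets an F-statistic of zero.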
There are two outputs from f_classif: the F-statistic and the p-value for the comparison of each feature to the response variable. Let's create a new DataFrame containing the feature names and these outputs to facilitate our inspection. One way to specify a new DataFrame is by using a dictionary, with key/value pairs of column names and the data to be contained in each column. We show the DataFrame sorted (ascending) on p-value.
Use this code to create a DataFrame of feature names, F-statistics, and p-values, and show it sorted on p-value:
    f_test_df = pd.DataFrame({'Feature': features_response[:-1],
                              'F statistic': f_stat,
                              'p value': f_p_value})
    f_test_df.sort_values('p value')
The output should look like this: