Challenge Solution Review

This challenge solution review walks through loading data with pandas, scaling features with MinMaxScaler, selecting the top 10 features with SelectKBest, training a logistic regression model, and evaluating it with the F1 score.

Python 3.5

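import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
import sklearn.preprocessing as preprocessing
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import f_classif
from sklearn.linear_model import LogisticRegression
import sklearn.metrics as metrics

# Load the dataset; the first row holds the column names.
df = pd.read_csv("./challenge1.csv", sep=",", header=0)

# Separate the label column from the features.
y = df.pop("target").values
X = df

# Scale every feature to the [0, 1] range.
minmax = preprocessing.MinMaxScaler()
minmax.fit(X)
X_minmax = minmax.transform(X)

# Keep the 10 features with the highest ANOVA F-scores.
# k is passed by keyword because recent scikit-learn releases
# no longer accept it as a positional argument.
sb = SelectKBest(f_classif, k=10)
sb.fit(X_minmax, y)
X_stage2 = sb.transform(X_minmax)

# Hold out 20% of the rows for testing.
train_x, test_x, train_y, test_y = train_test_split(X_stage2,
                                                    y,
                                                    test_size=0.2,
                                                    random_state=42)

# Train logistic regression and score it on the held-out rows.
lr = LogisticRegression()
lr.fit(train_x, train_y)
pred_y = lr.predict(test_x)
f1 = metrics.f1_score(test_y, pred_y)
print("The F1-score is {}.".format(f1))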
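The solution above fits the scaler and the feature selector on the full dataset before splitting, so the test rows influence the scaling ranges and the feature scores. One possible variation, shown here only as a sketch, is to split first and wrap the same three steps in a scikit-learn Pipeline so that everything is fitted on the training rows alone. Since challenge1.csv is not available here, the sketch assumes a synthetic binary dataset from make_classification as a stand-in; the step names ("scale", "select", "model") are arbitrary labels, not part of the original solution.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Assumption: synthetic stand-in for challenge1.csv
# (binary target, 20 numeric features).
X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=8, random_state=42)

# Split first so the test rows stay unseen during fitting.
train_x, test_x, train_y, test_y = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Same steps as the solution, chained so that fit() only sees training data.
pipe = Pipeline([
    ("scale", MinMaxScaler()),
    ("select", SelectKBest(f_classif, k=10)),
    ("model", LogisticRegression()),
])
pipe.fit(train_x, train_y)

# predict() applies the fitted scaler and selector to the test rows.
pred_y = pipe.predict(test_x)
f1 = f1_score(test_y, pred_y)
print("The F1-score is {}.".format(f1))
```

With real data, X and y would come from the pd.read_csv and df.pop("target") lines of the solution instead of make_classification; the score on the challenge dataset may differ from the one the original code reports.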
