Logistic Regression

In this lesson, we will use logistic regression to perform a classification task.

What is logistic regression?

Logistic regression is a machine learning classification algorithm used to predict the probability of certain classes based on some independent variables. In short, the logistic regression model computes a weighted sum of the input features (in most cases, plus a bias term) and calculates the logistic of the result.

The output of logistic regression is always in the interval (0, 1), which makes it suitable for a binary classification task. The higher the value, the higher the probability that the current sample is classified as class 1, and vice versa.

$$h_{\theta}(x) = \frac{1}{1+e^{-\theta x}}$$

As the formula above shows, $\theta$ is the parameter we want to learn (train, optimize) and $x$ is the input data. The output is the predicted probability: a value closer to 1 means the instance is more likely to be a positive sample ($y=1$), while a value closer to 0 means it is more likely to be a negative sample ($y=0$).
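To make this concrete, here is a minimal sketch of the hypothesis function in Python, assuming NumPy and hypothetical names `X` (feature matrix) and `theta` (parameter vector):

```python
import numpy as np

def sigmoid(z):
    # Logistic function: maps any real number into the interval (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, X):
    # h_theta(x) = sigmoid(theta . x), applied to every row of X.
    # X has shape (m, n); theta has shape (n,).
    return sigmoid(X @ theta)
```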

To train the model, we need to define a loss function (also called a cost or objective function) for this task. In logistic regression, we use the log loss, i.e., the negative log-likelihood.

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left(y^{i}\log(p^{i}) + (1-y^{i})\log(1-p^{i})\right)$$

$m$ is the number of samples in the training data, $y^{i}$ is the label of the $i$-th sample, and $p^{i}$ is the predicted value for the $i$-th sample. When a sample's label is 1, the second term of the formula is 0, and we want the first term to be as large as possible (and vice versa when the label is 0). Finally, we sum the loss over all samples, take the average, and add a negative sign. Our goal is to minimize $J(\theta)$: the smaller $J(\theta)$ is, the better the model fits the data set.
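The loss can be written as a short NumPy function. The sketch below is an illustration, not the lesson's implementation; it clips predictions with a small `eps` for numerical stability, a detail the formula itself does not include:

```python
import numpy as np

def log_loss(y, p, eps=1e-15):
    # Clip predictions away from exactly 0 and 1 so log() stays finite.
    p = np.clip(p, eps, 1 - eps)
    # J(theta) = -1/m * sum(y * log(p) + (1 - y) * log(1 - p))
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
```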

There is no closed-form solution for $\theta$. To find it, we need to use an optimization algorithm such as gradient descent. Since $J(\theta)$ is a convex function, gradient descent is guaranteed to find the global minimum.
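As a sketch of what such an optimizer looks like (the learning rate `lr` and iteration count `n_iters` are illustrative defaults, not values from this lesson), batch gradient descent repeatedly steps along the gradient $\nabla J(\theta) = \frac{1}{m} X^{T}(p - y)$:

```python
import numpy as np

def gradient_descent(X, y, lr=0.1, n_iters=1000):
    # X: (m, n) feature matrix, y: (m,) array of 0/1 labels.
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        p = 1.0 / (1.0 + np.exp(-(X @ theta)))  # predicted probabilities
        grad = X.T @ (p - y) / m                # gradient of J(theta)
        theta -= lr * grad                      # descent step
    return theta
```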

Let’s start coding.
