# Logistic Regression

In this lesson, we will use logistic regression to do the classification task.

## What is logistic regression?

`Logistic regression` is a Machine Learning classification algorithm used to predict the probability of certain classes based on some dependent variables. In short, the `logistic regression` model computes a weighted sum of the input features (in most cases with a `bias` term added) and applies the `logistic` function to the result.

The output of a `logistic regression` model always lies in the interval (0, 1), which makes it suitable for a binary classification task. The higher the value, the higher the probability that the current sample belongs to class 1, and vice versa.

$h_{\theta}(x) = \frac{1}{1+e^{-\theta^{T} x}}$
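This hypothesis can be written as a small NumPy sketch. The function and variable names here are our own, not from the lesson:

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(theta, x):
    """Hypothesis h_theta(x): sigmoid of the dot product of theta and x."""
    return sigmoid(np.dot(theta, x))
```

Note that `sigmoid(0)` is exactly `0.5`, the natural decision threshold between the two classes.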

As the formula above shows, $\theta$ is the parameter we want to learn (train, optimize) and $x$ is the input data. When the output value is closer to `1`, the instance is more likely to be a positive sample (**y=1**); when the value is closer to `0`, the instance is more likely to be a negative sample (**y=0**).

To optimize our task, we need to define a loss function (also called a cost or objective function). In `logistic regression`, we use the `log-likelihood loss` function.

$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m}\left(y^{i}\log(p^{i})+(1-y^{i})\log(1-p^{i})\right)$

$m$ is the number of samples in the training data, $y^{i}$ is the label of the i-th sample, and $p^{i}$ is the prediction value for the i-th sample. When the current sample's label is **1**, the second term of the formula is **0**, and the larger the first term, the better the fit; the reverse holds when the label is **0**. Finally, we sum the loss over all samples, take the average, and add a negative sign. Our goal is to minimize $J(\theta)$: the smaller $J(\theta)$ is, the better the model fits the data set.
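The loss formula translates directly into code. This is a minimal sketch (the `eps` clipping is our own addition, needed to avoid `log(0)` for predictions of exactly 0 or 1):

```python
import numpy as np

def log_loss(y, p):
    """Average negative log-likelihood of labels y given predictions p."""
    eps = 1e-15                      # clip to keep log() finite
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
```

For instance, predicting `0.5` for every sample gives a loss of $\log 2 \approx 0.693$, while confident correct predictions push the loss toward 0.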

There is no closed-form solution for $\theta$. To find it, we need an optimization algorithm such as gradient descent. Since $J(\theta)$ is a convex function, gradient descent is guaranteed to find the global minimum.
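As a preview of the idea, here is a minimal batch gradient descent sketch for this loss (the learning rate, iteration count, and function names are illustrative assumptions, not values from the lesson):

```python
import numpy as np

def gradient_descent(X, y, lr=0.1, n_iters=1000):
    """Minimize J(theta) by batch gradient descent.

    X: (m, n) feature matrix, y: (m,) labels in {0, 1}.
    The gradient of J(theta) is X^T (p - y) / m.
    """
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        p = 1.0 / (1.0 + np.exp(-X @ theta))  # current predictions
        grad = X.T @ (p - y) / m              # gradient of the loss
        theta -= lr * grad                    # step downhill
    return theta
```

Each iteration moves $\theta$ a small step against the gradient; because $J(\theta)$ is convex, the steps keep lowering the loss toward its global minimum.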

Let’s start coding.
