Logistic regression is a predictive analysis technique used in statistics and machine learning. It is particularly suited to binary classification problems, where the goal is to estimate the probability that an outcome belongs to one of two classes, such as pass/fail, win/lose, or healthy/sick, based on one or more independent variables. The model outputs the probability that a given input belongs to a particular class.
Below is an illustration that depicts the graphical representation of logistic regression:
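The logistic regression model uses the logistic function (also called the sigmoid function) to estimate the probability that a given input belongs to the positive class (usually labeled class 1):

$$\hat{y} = P(y = 1 \mid \mathbf{x}) = \sigma(z) = \frac{1}{1 + e^{-z}}, \qquad z = \mathbf{w}^\top \mathbf{x} + b$$

where $\mathbf{x}$ is the vector of input features, $\mathbf{w}$ is the vector of weights (coefficients), $b$ is the bias (intercept), and $\hat{y}$ is the predicted probability that the input belongs to class 1.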
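To make the mapping concrete, here is a minimal sketch that evaluates the logistic function at a few arbitrary values of the linear score $z$ (the weighted sum of the inputs plus the bias). A score of 0 maps to a probability of exactly 0.5, large positive scores approach 1, and large negative scores approach 0. The sample scores are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    # Logistic function: maps any real-valued score to the open interval (0, 1)
    return 1 / (1 + np.exp(-z))

# Arbitrary example scores z = w^T x + b
scores = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
probabilities = sigmoid(scores)

for z, p in zip(scores, probabilities):
    print(f"z = {z:+.1f} -> P(y = 1) = {p:.4f}")
```

The curve is symmetric around $z = 0$: $\sigma(-z) = 1 - \sigma(z)$, so a score of $-4$ gives the same distance from 0.5 as a score of $+4$, just on the opposite side.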
Let’s look at the parameters required for the logistic regression function.
| Parameter | Description |
|---|---|
| `X` | Input features |
| `y` | Actual output values (labels) |
| `weights` | Coefficients assigned to each feature in the input data |
| `bias` | Intercept term in the logistic equation |
| `learning_rate` | Step size used to adjust the weights and bias in each iteration |
| `epochs` | Number of passes the model makes over the training dataset |
| `loss` | Cost (loss) function, which quantifies the discrepancy between the predicted values and the actual values |
| `len(X)` | Number of samples in the training dataset |
| `n_features` | Number of features in the input data |
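The shapes of these parameters have to line up for the matrix operations in the implementation to work. The minimal sketch below uses a small random array in place of the breast cancer data (an assumption made only to keep it self-contained) and checks that `X` has one row per sample and one column per feature, `weights` has one entry per feature, and `bias` is a single scalar shared by every sample.

```python
import numpy as np

rng = np.random.default_rng(0)

n_samples, n_features = 6, 3                   # toy sizes standing in for the real dataset
X = rng.normal(size=(n_samples, n_features))   # input features, one row per sample
y = rng.integers(0, 2, size=n_samples)         # binary labels (0 or 1)

weights = np.zeros(n_features)                 # one coefficient per feature
bias = 0.0                                     # single intercept term

z = np.dot(X, weights) + bias                  # linear score for every sample at once
print(X.shape, y.shape, weights.shape, z.shape)
```

Because `np.dot(X, weights)` multiplies an `(n_samples, n_features)` matrix by an `(n_features,)` vector, it produces one score per sample in a single vectorized call, and the scalar `bias` is broadcast across all of them.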
Let’s break down the steps to implement logistic regression from scratch:
Import necessary libraries and load the breast cancer dataset.
```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X, y = data.data, data.target
```
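Split the features (`X`) and the target variable (`y`) into training and test sets, holding out 20% of the samples for testing. Setting `random_state=42` makes the split reproducible.

```python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```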
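Conceptually, the split shuffles the sample indices once and slices off a test portion, so each row of `X` stays paired with its label in `y`. The sketch below illustrates that idea with a hypothetical `simple_split()` helper and made-up data of the same shape as the breast cancer dataset (569 samples, 30 features). It is not scikit-learn's implementation, and it will not select the same rows as `train_test_split`.

```python
import math
import numpy as np

def simple_split(X, y, test_size=0.2, seed=42):
    # Hypothetical helper: shuffle row indices once, then slice off a test portion
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(X))
    n_test = math.ceil(test_size * len(X))     # round the test portion up
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Made-up data with the same shape as the breast cancer dataset (569 x 30)
X = np.arange(569 * 30, dtype=float).reshape(569, 30)
y = np.arange(569) % 2

X_train, X_test, y_train, y_test = simple_split(X, y)
print(X_train.shape, X_test.shape)   # (455, 30) (114, 30)
```

Indexing `X` and `y` with the same index arrays is what keeps features and labels aligned after shuffling.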
Define the `sigmoid()` function, which converts the linear output to a probability between 0 and 1.

```python
def sigmoid(z):
    return 1 / (1 + np.exp(-z))
```
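One caveat with this direct formula: for a large negative `z`, `np.exp(-z)` overflows to infinity. The result still comes out as 0, but NumPy emits a `RuntimeWarning: overflow encountered in exp`. Inputs with large magnitudes, such as unscaled features, can produce such scores. A minimal sketch of a numerically stable variant (the name `stable_sigmoid` is hypothetical) evaluates `exp()` only on non-positive arguments:

```python
import numpy as np

def stable_sigmoid(z):
    # Hypothetical alternative to sigmoid(): never calls exp() on a large positive number
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    pos = z >= 0
    out[pos] = 1 / (1 + np.exp(-z[pos]))   # here exp(-z) <= 1, so no overflow
    exp_z = np.exp(z[~pos])                # here exp(z) < 1, so no overflow
    out[~pos] = exp_z / (1 + exp_z)        # algebraically equal to 1 / (1 + exp(-z))
    return out

z = np.array([-1000.0, -2.0, 0.0, 2.0, 1000.0])
print(stable_sigmoid(z))
```

Both branches compute the same mathematical function; they differ only in which form avoids overflow. SciPy's `scipy.special.expit` is a ready-made logistic function if SciPy is available.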
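Calculate the predicted probabilities (forward propagation): compute the linear combination `z` of the inputs and weights plus the bias, then pass it through the `sigmoid()` function.

```python
def forward_propagation(X, weights, bias):
    z = np.dot(X, weights) + bias
    return sigmoid(z)
```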
Calculate the loss using the cross-entropy loss function.
```python
def compute_loss(y_true, y_pred):
    epsilon = 1e-5
    return -np.mean(y_true * np.log(y_pred + epsilon) + (1 - y_true) * np.log(1 - y_pred + epsilon))
```
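Written out, `compute_loss()` is the average binary cross-entropy over the $m$ training samples, where $\hat{y}_i$ is the predicted probability and $y_i$ the true label for sample $i$, given weights $\mathbf{w}$ and bias $b$. The $\epsilon$ term mirrors the `epsilon = 1e-5` constant in the code, which keeps the logarithm finite when a prediction is exactly 0 or 1:

$$J(\mathbf{w}, b) = -\frac{1}{m} \sum_{i=1}^{m} \Big[ y_i \log(\hat{y}_i + \epsilon) + (1 - y_i) \log(1 - \hat{y}_i + \epsilon) \Big]$$

Ignoring $\epsilon$, a single sample with $y = 1$ and $\hat{y} = 0.9$ contributes $-\log(0.9) \approx 0.105$, while a prediction of $\hat{y} = 0.1$ for the same sample contributes $-\log(0.1) \approx 2.303$, so confident wrong predictions are penalized heavily.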
Compute the gradients of the loss function with respect to `weights` and `bias`.

```python
def backward_propagation(X, y_true, y_pred):
    dw = np.dot(X.T, (y_pred - y_true)) / len(X)
    db = np.mean(y_pred - y_true)
    return dw, db
```
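These expressions follow from differentiating the cross-entropy loss through the sigmoid: the derivative of each sample's loss with respect to its linear score simplifies to $\hat{y}_i - y_i$, so `dw` equals $\frac{1}{m} X^\top (\hat{\mathbf{y}} - \mathbf{y})$ and `db` is the mean of $\hat{y}_i - y_i$. A common way to confirm a hand-derived gradient is a finite-difference check. The sketch below, on a small random dataset assumed for illustration, compares the analytic gradients with central-difference estimates. It uses the exact cross-entropy (the hypothetical `exact_loss()`, without the `epsilon` guard) so that the two should agree closely.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward_propagation(X, weights, bias):
    return sigmoid(np.dot(X, weights) + bias)

def exact_loss(X, y, weights, bias):
    # Binary cross-entropy without the epsilon guard, so its gradient is exact
    p = forward_propagation(X, weights, bias)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def backward_propagation(X, y_true, y_pred):
    dw = np.dot(X.T, (y_pred - y_true)) / len(X)
    db = np.mean(y_pred - y_true)
    return dw, db

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 4))                   # toy data: 20 samples x 4 features
y = rng.integers(0, 2, size=20).astype(float)
weights = rng.normal(scale=0.1, size=4)
bias = 0.05

# Analytic gradients from the formulas above
dw, db = backward_propagation(X, y, forward_propagation(X, weights, bias))

# Central-difference estimates for each weight and for the bias
h = 1e-6
num_dw = np.zeros_like(weights)
for j in range(len(weights)):
    step = np.zeros_like(weights)
    step[j] = h
    num_dw[j] = (exact_loss(X, y, weights + step, bias)
                 - exact_loss(X, y, weights - step, bias)) / (2 * h)
num_db = (exact_loss(X, y, weights, bias + h) - exact_loss(X, y, weights, bias - h)) / (2 * h)

print("max |dw - numerical dw|:", np.max(np.abs(dw - num_dw)))
print("|db - numerical db|:", abs(db - num_db))
```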
Update `weights` and `bias` using the gradients and `learning_rate`.

```python
def update_parameters(weights, bias, dw, db, learning_rate):
    weights -= learning_rate * dw
    bias -= learning_rate * db
    return weights, bias
```
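In equation form, each call performs one step of gradient descent, moving the parameters against the gradient of the loss $J$ by an amount scaled by the learning rate $\alpha$ (`learning_rate` in the code), where $\partial J / \partial \mathbf{w}$ and $\partial J / \partial b$ are `dw` and `db`:

$$\mathbf{w} \leftarrow \mathbf{w} - \alpha \, \frac{\partial J}{\partial \mathbf{w}}, \qquad b \leftarrow b - \alpha \, \frac{\partial J}{\partial b}$$

For example, with $\alpha = 0.01$ and a bias gradient of $0.2$, the bias moves from $0$ to $-0.002$. Because `weights` is a NumPy array, `weights -= ...` updates it in place, while `bias` is a plain number, which is why the function returns both values.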
Iterate through the dataset to update parameters and minimize the loss.
```python
def train(X_train, y_train, epochs, learning_rate):
    n_features = X_train.shape[1]
    weights = np.zeros(n_features)
    bias = 0
    for epoch in range(epochs):
        y_pred = forward_propagation(X_train, weights, bias)
        loss = compute_loss(y_train, y_pred)
        dw, db = backward_propagation(X_train, y_train, y_pred)
        weights, bias = update_parameters(weights, bias, dw, db, learning_rate)
        if epoch % 100 == 0:
            print('Epoch', epoch, 'Loss:', loss)
    return weights, bias
```
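Because the training loop only needs NumPy arrays, it can be exercised on a tiny synthetic problem before running it on the real dataset. The sketch below is an illustration under stated assumptions rather than part of the lesson: it generates two made-up Gaussian clusters, condenses the same forward, loss, gradient, and update steps into a hypothetical `train_with_history()` variant that records the loss at every epoch instead of printing it, and checks that the loss drops.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def compute_loss(y_true, y_pred):
    epsilon = 1e-5
    return -np.mean(y_true * np.log(y_pred + epsilon)
                    + (1 - y_true) * np.log(1 - y_pred + epsilon))

def train_with_history(X, y, epochs, learning_rate):
    # Hypothetical variant of train() that records the loss at every epoch
    weights = np.zeros(X.shape[1])
    bias = 0.0
    history = []
    for _ in range(epochs):
        y_pred = sigmoid(np.dot(X, weights) + bias)   # forward propagation
        history.append(compute_loss(y, y_pred))       # loss before this update
        dw = np.dot(X.T, (y_pred - y)) / len(X)       # gradients
        db = np.mean(y_pred - y)
        weights -= learning_rate * dw                 # gradient descent step
        bias -= learning_rate * db
    return weights, bias, history

# Made-up data: class 0 centred at (-1, -1), class 1 centred at (+1, +1)
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(-1.0, 0.5, size=(50, 2)),
               rng.normal(1.0, 0.5, size=(50, 2))])
y = np.concatenate([np.zeros(50), np.ones(50)])

weights, bias, history = train_with_history(X, y, epochs=500, learning_rate=0.1)
print(f"loss at epoch 0: {history[0]:.4f}, after training: {history[-1]:.4f}")
```

With all-zero initial weights every prediction starts at 0.5, so the first recorded loss is close to $\log 2 \approx 0.693$; a falling loss curve is a quick sign that the gradients and update signs are correct.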
Use the trained model to make predictions on new data: `predict()` computes the probabilities and rounds them to class labels (0 or 1).

```python
def predict(X, weights, bias):
    y_pred = forward_propagation(X, weights, bias)
    return np.round(y_pred)
```
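Rounding acts as a 0.5 decision threshold, with one subtlety: `np.round()` rounds halves to the nearest even number, so a probability of exactly 0.5 becomes class 0. A sketch of an explicit alternative, using a hypothetical `predict_with_threshold()` helper and made-up probabilities, makes the cut-off visible and adjustable:

```python
import numpy as np

def predict_with_threshold(probabilities, threshold=0.5):
    # Hypothetical helper: label a sample 1 when its probability reaches the threshold
    return (probabilities >= threshold).astype(int)

probs = np.array([0.2, 0.5, 0.51, 0.9])      # made-up predicted probabilities

print(np.round(probs))                        # rounding: 0.5 rounds to 0 (half to even)
print(predict_with_threshold(probs))          # threshold 0.5: 0.5 is labeled 1
print(predict_with_threshold(probs, 0.7))     # stricter threshold
```

Raising the threshold makes the model more conservative about predicting class 1.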
Train the model, make predictions on both the training and test sets, and evaluate its performance by measuring accuracy.

```python
epochs = 1000
learning_rate = 0.01
weights, bias = train(X_train, y_train, epochs, learning_rate)
y_pred_train = predict(X_train, weights, bias)
y_pred_test = predict(X_test, weights, bias)
train_accuracy = np.mean(y_pred_train == y_train)
test_accuracy = np.mean(y_pred_test == y_test)
print('Training Accuracy', train_accuracy)
print('Test Accuracy', test_accuracy)
```
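The accuracy computation works because `y_pred == y_true` produces a boolean array, and the mean of booleans is the fraction of `True` values. The predictions returned by `predict()` are floats (0.0 or 1.0) while the labels are integers, but NumPy compares them by value. A small sketch with made-up labels:

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0])              # made-up integer labels
y_pred = np.array([1.0, 0.0, 0.0, 1.0, 0.0])    # float labels, as returned by np.round()

matches = y_pred == y_true                       # element-wise comparison gives booleans
accuracy = np.mean(matches)                      # fraction of correct predictions
print(matches, accuracy)
```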
Here’s an example of how to implement logistic regression in Python from scratch:
```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Step 2: Load the dataset
data = load_breast_cancer()
X, y = data.data, data.target

# Step 3: Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 4: Compute the sigmoid function
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Step 5: Forward propagation
def forward_propagation(X, weights, bias):
    z = np.dot(X, weights) + bias
    return sigmoid(z)

# Step 6: Compute loss
def compute_loss(y_true, y_pred):
    epsilon = 1e-5
    return -np.mean(y_true * np.log(y_pred + epsilon) + (1 - y_true) * np.log(1 - y_pred + epsilon))

# Step 7: Backward propagation
def backward_propagation(X, y_true, y_pred):
    dw = np.dot(X.T, (y_pred - y_true)) / len(X)
    db = np.mean(y_pred - y_true)
    return dw, db

# Step 8: Update parameters
def update_parameters(weights, bias, dw, db, learning_rate):
    weights -= learning_rate * dw
    bias -= learning_rate * db
    return weights, bias

# Step 9: Train the model
def train(X_train, y_train, epochs, learning_rate):
    n_features = X_train.shape[1]
    weights = np.zeros(n_features)
    bias = 0

    for epoch in range(epochs):
        # Forward propagation
        y_pred = forward_propagation(X_train, weights, bias)

        # Compute loss
        loss = compute_loss(y_train, y_pred)

        # Backward propagation
        dw, db = backward_propagation(X_train, y_train, y_pred)

        # Update parameters
        weights, bias = update_parameters(weights, bias, dw, db, learning_rate)

        if epoch % 100 == 0:
            print('Epoch', epoch, 'Loss:', loss)

    return weights, bias

# Step 10: Make predictions
def predict(X, weights, bias):
    y_pred = forward_propagation(X, weights, bias)
    return np.round(y_pred)

# Train the model
epochs = 1000
learning_rate = 0.01
weights, bias = train(X_train, y_train, epochs, learning_rate)
# Make predictions
y_pred_train = predict(X_train, weights, bias)
y_pred_test = predict(X_test, weights, bias)
# Evaluate the model
train_accuracy = np.mean(y_pred_train == y_train)
test_accuracy = np.mean(y_pred_test == y_test)
train_accuracy_percent = train_accuracy * 100
test_accuracy_percent = test_accuracy * 100
print('Training Accuracy {:.2f} percent'.format(train_accuracy_percent))
print('Test Accuracy {:.2f} percent'.format(test_accuracy_percent))
```
Lines 1–3: We import all the necessary libraries.
Lines 6–7: We load the breast cancer dataset using `sklearn`.
Line 10: We split the dataset into training and testing sets.
Lines 13–14: We define a `sigmoid()` function for use in logistic regression.
Lines 17–19: We calculate the predicted probabilities using the `sigmoid()` function.
Lines 22–24: We compute the cross-entropy loss to measure the difference between the predicted and actual labels.
Lines 27–30: We compute the gradients of the loss with respect to `weights` and `bias`.
Lines 33–36: We update the `weights` and `bias` variables using the computed gradients and the learning rate.
Lines 39–60: We iteratively train the model for a predefined number of `epochs`.
Lines 63–65: We define `predict()`, which converts the predicted probabilities into class labels.
Lines 68–80: We train the model, make predictions, and evaluate its accuracy on both the training and testing datasets.
Here’s a quiz to test your knowledge.
What is the purpose of the sigmoid function in logistic regression?
To compute the mean squared error
To regularize the weights
To speed up convergence in gradient descent
To transform linear predictions into probabilities