Implement logistic regression in Python from scratch
Logistic regression is a predictive analysis technique used in statistics and machine learning. It is particularly suited to binary classification problems, where the goal is to estimate the probability that an outcome belongs to one of two classes, such as pass/fail, win/lose, or healthy/sick, based on one or more independent variables.
Mathematical representation
The logistic regression model employs the logistic function (also referred to as the sigmoid function) to estimate the likelihood that a given input belongs to the positive category (often represented as class $1$):

$$P(y = 1 \mid x) = \sigma(z) = \frac{1}{1 + e^{-z}}$$

Where:

- $P(y = 1 \mid x)$ denotes the likelihood that the outcome ($y$) will be $1$, considering the input ($x$).
- $e$ represents the natural logarithm's base.
- $z$ is defined as the weighted sum of input variables, inclusive of a bias component:

$$z = w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b$$
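To see these formulas in action, here is a minimal sketch; the feature values, weights, and bias below are made up purely for illustration:

```python
import numpy as np

# Hypothetical values, chosen only to illustrate the formulas
x = np.array([2.0, -1.0, 0.5])   # input features
w = np.array([0.4, 0.3, -0.2])   # weights
b = 0.1                          # bias

z = np.dot(w, x) + b             # weighted sum plus bias: 0.8 - 0.3 - 0.1 + 0.1 = 0.5
p = 1 / (1 + np.exp(-z))         # sigmoid squashes z into (0, 1)

print(z)  # 0.5
print(p)  # ~0.622, i.e., a 62.2% estimated probability of class 1
```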
Logistic regression parameters
Let’s look at the parameters required for the logistic regression function.
| Parameter | Description |
| --- | --- |
| $X$ | Input features |
| $y$ | Actual output values |
| $w$ | Coefficients (weights) assigned to each feature in the input dataset |
| $b$ | Intercept (bias) term in the logistic equation |
| `learning_rate` | Step size used to adjust the weights and bias in each iteration |
| `epochs` | Number of iterations the model goes through the dataset |
| $J$ | The cost (loss) function, which quantifies the discrepancy between the predicted values and the actual values |
| $m$ | Number of samples in the training dataset |
| $n$ | Number of features in the input data |
Implementation in Python
Let’s break down the steps to implement logistic regression from scratch:
Step 1: Import libraries and load the dataset
Import the necessary libraries and load the breast cancer dataset.
```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X, y = data.data, data.target
```
Step 2: Preprocess the data
Split the dataset into training and testing sets.
```python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```
Step 3: Define the sigmoid function
Define the sigmoid() function, which is used to convert the linear output to a probability between 0 and 1.
```python
def sigmoid(z):
    return 1 / (1 + np.exp(-z))
```
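As a quick check (an extra snippet, not part of the original steps), sigmoid(0) is exactly 0.5, and large positive or negative inputs saturate toward 1 or 0:

```python
print(sigmoid(0))    # 0.5
print(sigmoid(10))   # ~0.9999546
print(sigmoid(-10))  # ~0.0000454
```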
Step 4: Forward propagation
Calculate the predicted probabilities using the sigmoid() function.
```python
def forward_propagation(X, weights, bias):
    z = np.dot(X, weights) + bias
    return sigmoid(z)
```
Step 5: Define the loss function
Calculate the loss using the cross-entropy loss function.
```python
def compute_loss(y_true, y_pred):
    epsilon = 1e-5  # small constant to avoid taking log(0)
    return -np.mean(y_true * np.log(y_pred + epsilon) + (1 - y_true) * np.log(1 - y_pred + epsilon))
```
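For reference, this is the binary cross-entropy formula the function implements; the epsilon in the code is only a numerical safeguard against taking $\log(0)$ and is not part of the formula:

$$J(w, b) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]$$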
Step 6: Backward propagation
Compute the gradients of the loss function with respect to weights and bias.
```python
def backward_propagation(X, y_true, y_pred):
    dw = np.dot(X.T, (y_pred - y_true)) / len(X)
    db = np.mean(y_pred - y_true)
    return dw, db
```
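These expressions follow from differentiating the cross-entropy loss through the sigmoid, which collapses to a simple form:

$$\frac{\partial J}{\partial w} = \frac{1}{m} X^\top (\hat{y} - y), \qquad \frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} (\hat{y}_i - y_i)$$

Here $m$ is the number of training samples, which corresponds to the len(X) and np.mean() divisions in the code.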
Step 7: Update parameters
Update weights and bias using the gradients and learning rate.
```python
def update_parameters(weights, bias, dw, db, learning_rate):
    weights -= learning_rate * dw
    bias -= learning_rate * db
    return weights, bias
```
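This is one step of standard gradient descent, where $\alpha$ is the learning rate:

$$w := w - \alpha \frac{\partial J}{\partial w}, \qquad b := b - \alpha \frac{\partial J}{\partial b}$$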
Step 8: Train the model
Iterate through the dataset to update parameters and minimize the loss.
```python
def train(X_train, y_train, epochs, learning_rate):
    n_features = X_train.shape[1]
    weights = np.zeros(n_features)
    bias = 0
    for epoch in range(epochs):
        y_pred = forward_propagation(X_train, weights, bias)
        loss = compute_loss(y_train, y_pred)
        dw, db = backward_propagation(X_train, y_train, y_pred)
        weights, bias = update_parameters(weights, bias, dw, db, learning_rate)
        if epoch % 100 == 0:
            print('Epoch', epoch, 'Loss:', loss)
    return weights, bias
```
Step 9: Make predictions
Use the trained model to make predictions on new data.
```python
def predict(X, weights, bias):
    y_pred = forward_propagation(X, weights, bias)
    return np.round(y_pred)
```
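Note that np.round() implicitly applies a 0.5 decision threshold. If a different cutoff were needed (for example, to trade precision against recall), a hypothetical variant might look like this:

```python
def predict_with_threshold(X, weights, bias, threshold=0.5):
    # Classify as 1 when the predicted probability meets the threshold
    y_pred = forward_propagation(X, weights, bias)
    return (y_pred >= threshold).astype(int)
```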
Step 10: Evaluate the model
Use the trained model to make predictions on the training and test sets, then compute the accuracy on each.
```python
epochs = 1000
learning_rate = 0.01
weights, bias = train(X_train, y_train, epochs, learning_rate)

y_pred_train = predict(X_train, weights, bias)
y_pred_test = predict(X_test, weights, bias)

train_accuracy = np.mean(y_pred_train == y_train)
test_accuracy = np.mean(y_pred_test == y_test)
print('Training Accuracy', train_accuracy)
print('Test Accuracy', test_accuracy)
```
Example
Here’s an example of how to implement logistic regression in Python from scratch:
```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Step 1: Load the dataset
data = load_breast_cancer()
X, y = data.data, data.target

# Step 2: Preprocess the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 3: Define the sigmoid function
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Step 4: Forward propagation
def forward_propagation(X, weights, bias):
    z = np.dot(X, weights) + bias
    return sigmoid(z)

# Step 5: Compute the loss
def compute_loss(y_true, y_pred):
    epsilon = 1e-5
    return -np.mean(y_true * np.log(y_pred + epsilon) + (1 - y_true) * np.log(1 - y_pred + epsilon))

# Step 6: Backward propagation
def backward_propagation(X, y_true, y_pred):
    dw = np.dot(X.T, (y_pred - y_true)) / len(X)
    db = np.mean(y_pred - y_true)
    return dw, db

# Step 7: Update parameters
def update_parameters(weights, bias, dw, db, learning_rate):
    weights -= learning_rate * dw
    bias -= learning_rate * db
    return weights, bias

# Step 8: Train the model
def train(X_train, y_train, epochs, learning_rate):
    n_features = X_train.shape[1]
    weights = np.zeros(n_features)
    bias = 0
    for epoch in range(epochs):
        # Forward propagation
        y_pred = forward_propagation(X_train, weights, bias)

        # Compute loss
        loss = compute_loss(y_train, y_pred)

        # Backward propagation
        dw, db = backward_propagation(X_train, y_train, y_pred)

        # Update parameters
        weights, bias = update_parameters(weights, bias, dw, db, learning_rate)

        if epoch % 100 == 0:
            print('Epoch', epoch, 'Loss:', loss)

    return weights, bias

# Step 9: Make predictions
def predict(X, weights, bias):
    y_pred = forward_propagation(X, weights, bias)
    return np.round(y_pred)

# Step 10: Train and evaluate the model
epochs = 1000
learning_rate = 0.01
weights, bias = train(X_train, y_train, epochs, learning_rate)

# Make predictions
y_pred_train = predict(X_train, weights, bias)
y_pred_test = predict(X_test, weights, bias)

# Evaluate the model
train_accuracy = np.mean(y_pred_train == y_train)
test_accuracy = np.mean(y_pred_test == y_test)
train_accuracy_percent = train_accuracy * 100
test_accuracy_percent = test_accuracy * 100
print('Training Accuracy {:.2f} percent'.format(train_accuracy_percent))
print('Test Accuracy {:.2f} percent'.format(test_accuracy_percent))
```
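As an optional sanity check (not part of the original walkthrough), the from-scratch model can be compared against scikit-learn's built-in LogisticRegression on the same split. Note that the library applies L2 regularization by default, so the accuracies will differ somewhat:

```python
from sklearn.linear_model import LogisticRegression

# Reference implementation; max_iter is raised so the solver converges on this dataset
clf = LogisticRegression(max_iter=10000)
clf.fit(X_train, y_train)
print('sklearn test accuracy:', clf.score(X_test, y_test))
```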
Explanation
- Lines 1–3: We import all the necessary libraries.
- Lines 6–7: We load the breast cancer dataset using sklearn.
- Line 10: We split the dataset into training and testing sets.
- Lines 13–14: We define the sigmoid() function for use in logistic regression.
- Lines 17–19: We calculate the predicted probabilities using the sigmoid() function.
- Lines 22–24: We compute the cross-entropy loss to measure the difference between the predicted and actual labels.
- Lines 27–30: We compute the gradients of the loss with respect to the weights and bias.
- Lines 33–36: We update the weights and bias using the computed gradients and a learning rate.
- Lines 39–59: We iteratively train the model for a predefined number of epochs.
- Lines 62–64: We make predictions using the trained model.
- Lines 67–81: We train the model and evaluate its accuracy on both the training and testing datasets.
Here’s a quiz to test your knowledge.
What is the purpose of the sigmoid function in logistic regression?
- To compute the mean squared error
- To regularize the weights
- To speed up convergence in gradient descent
- To transform linear predictions into probabilities