Implement logistic regression in Python from scratch

Logistic regression is a predictive modeling technique used in statistics and machine learning. It is particularly suited to binary classification problems, where the goal is to estimate the probability that an outcome belongs to one of two classes, such as pass/fail, win/lose, or healthy/sick, based on one or more independent variables.

Below is an illustration of logistic regression:

[Figure: Graphical representation of logistic regression]

Mathematical representation

The logistic regression model employs the logistic function (also referred to as the sigmoid function) to estimate the likelihood that a given input belongs to the positive category (often represented as class 1):

$$P(y=1 \mid x) = \frac{1}{1 + e^{-z}}$$

Where:

  • $P(y=1 \mid x)$ denotes the likelihood that the outcome ($y$) will be $1$, given the input ($x$).

  • $e$ represents the natural logarithm’s base.

  • $z$ is defined as the weighted sum of the input variables, inclusive of a bias component:

$$z = w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b$$
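
To make the formula concrete, here is a small worked example (the weights and input below are purely illustrative, not taken from any dataset). Suppose $w = (2, -1)$, $x = (1, 3)$, and $b = 0.5$. Then

$$z = 2 \cdot 1 + (-1) \cdot 3 + 0.5 = -0.5$$

$$P(y=1 \mid x) = \frac{1}{1 + e^{0.5}} \approx 0.38$$

so the model assigns roughly a 38% probability to class 1, and a classifier thresholding at 0.5 would predict class 0.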

Logistic regression parameters

Let’s look at the parameters required for the logistic regression function.

| Parameter | Description |
| --- | --- |
| `X` | Input features |
| `y` | Actual output values |
| `weights` | Coefficients assigned to each feature in the input dataset |
| `bias` | Intercept term in the logistic equation |
| `learning_rate` | Step size used to adjust the weights and bias in each iteration |
| `epochs` | Number of iterations the model goes through the dataset |
| `cost` | The cost (loss) function, which quantifies the discrepancy between the model’s predicted values and the actual values |
| `m` | Number of samples in the training dataset |
| `n` | Number of features in the input data |

Implementation in Python

Let’s break down the steps to implement logistic regression from scratch:

Step 1: Load the dataset

Import the necessary libraries and load the breast cancer dataset from scikit-learn.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
data = load_breast_cancer()
X, y = data.data, data.target
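
Before moving on, it can help to confirm what was loaded: the breast cancer dataset contains 569 samples with 30 numeric features each, and the target is binary (0 = malignant, 1 = benign). A quick inspection snippet:

print(X.shape)            # (569, 30): 569 samples, 30 features
print(y.shape)            # (569,)
print(data.target_names)  # ['malignant' 'benign']; y == 1 means benign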

Step 2: Preprocessing the data

Split the dataset into training and testing sets, reserving 20% of the samples for testing.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
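
One caveat: the features in this dataset span very different ranges, which can make np.exp(-z) overflow and slow down gradient descent. A common remedy is to standardize the features. This step is not part of the original recipe, but a minimal sketch would be:

# Standardize using statistics from the training set only,
# so no information leaks from the test set.
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)
X_train = (X_train - mean) / std
X_test = (X_test - mean) / std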

Step 3: Define the sigmoid function

Define the sigmoid() function, which is used to convert the linear output to a probability between 0 and 1.

def sigmoid(z):
    return 1 / (1 + np.exp(-z))
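
A quick way to confirm the function behaves as expected: sigmoid(0) is exactly 0.5, and large positive or negative inputs saturate toward 1 and 0, respectively.

print(sigmoid(0))    # 0.5
print(sigmoid(10))   # ~0.99995, close to 1
print(sigmoid(-10))  # ~0.00005, close to 0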

Step 4: Forward propagation

Calculate the predicted probabilities using the sigmoid() function.

def forward_propagation(X, weights, bias):
    z = np.dot(X, weights) + bias
    return sigmoid(z)
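
To see the shapes involved, here is a tiny example with made-up numbers: X has shape (m, n) and weights has shape (n,), so np.dot(X, weights) + bias produces one probability per sample.

X_demo = np.array([[1.0, 2.0], [3.0, 4.0]])  # m = 2 samples, n = 2 features
w_demo = np.array([0.5, -0.5])
print(forward_propagation(X_demo, w_demo, 0.0))  # two probabilities, roughly [0.38 0.38]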

Step 5: Define the loss function

Calculate the loss using the cross-entropy loss function.

def compute_loss(y_true, y_pred):
    epsilon = 1e-5  # small constant to avoid taking log(0)
    return -np.mean(y_true * np.log(y_pred + epsilon) + (1 - y_true) * np.log(1 - y_pred + epsilon))
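
As a sanity check, the loss should be near 0 when the predicted probabilities match the labels and large when they are confidently wrong:

y_true_demo = np.array([1.0, 0.0])
print(compute_loss(y_true_demo, np.array([0.99, 0.01])))  # small, about 0.01
print(compute_loss(y_true_demo, np.array([0.01, 0.99])))  # large, about 4.6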

Step 6: Backward propagation

Compute the gradients of the loss function with respect to weights and bias.

def backward_propagation(X, y_true, y_pred):
    dw = np.dot(X.T, (y_pred - y_true)) / len(X)
    db = np.mean(y_pred - y_true)
    return dw, db
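
These compact formulas come from differentiating the cross-entropy loss through the sigmoid: the derivatives cancel so that the error term is simply y_pred - y_true. If you want to verify them numerically, a finite-difference check (an optional addition, purely for verification) perturbs one weight and compares the resulting slope against dw:

# Verify dw[0] numerically on a small random problem (illustrative only).
rng = np.random.default_rng(0)
X_check = rng.normal(size=(5, 3))
y_check = np.array([1.0, 0.0, 1.0, 1.0, 0.0])
w_check = rng.normal(size=3)
b_check = 0.1

y_hat = forward_propagation(X_check, w_check, b_check)
dw, db = backward_propagation(X_check, y_check, y_hat)

eps = 1e-6
w_plus = w_check.copy(); w_plus[0] += eps
w_minus = w_check.copy(); w_minus[0] -= eps
numeric = (compute_loss(y_check, forward_propagation(X_check, w_plus, b_check))
           - compute_loss(y_check, forward_propagation(X_check, w_minus, b_check))) / (2 * eps)
print(dw[0], numeric)  # should agree closely; the epsilon inside compute_loss adds a negligible offset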

Step 7: Update parameters

Update weights and bias using the gradients and learning rate.

def update_parameters(weights, bias, dw, db, learning_rate):
    weights -= learning_rate * dw
    bias -= learning_rate * db
    return weights, bias

Step 8: Train the model

Iterate through the dataset to update parameters and minimize the loss.

def train(X_train, y_train, epochs, learning_rate):
    n_features = X_train.shape[1]
    weights = np.zeros(n_features)
    bias = 0
    for epoch in range(epochs):
        y_pred = forward_propagation(X_train, weights, bias)
        loss = compute_loss(y_train, y_pred)
        dw, db = backward_propagation(X_train, y_train, y_pred)
        weights, bias = update_parameters(weights, bias, dw, db, learning_rate)
        if epoch % 100 == 0:
            print('Epoch', epoch, 'Loss:', loss)
    return weights, bias

Step 9: Make predictions

Use the trained model to make predictions on new data.

def predict(X, weights, bias):
    y_pred = forward_propagation(X, weights, bias)
    return np.round(y_pred)
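
Note that np.round(y_pred) is equivalent to thresholding the probabilities at 0.5 (with the edge case that np.round maps exactly 0.5 to 0, since it rounds half to even). If a different operating point is wanted, the threshold can be made explicit; a small variant, not part of the original code:

def predict_with_threshold(X, weights, bias, threshold=0.5):
    # Classify as 1 whenever the predicted probability meets the threshold.
    y_pred = forward_propagation(X, weights, bias)
    return (y_pred >= threshold).astype(int)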

Step 10: Evaluate the model

Train the model, make predictions, and evaluate its accuracy on both the training and testing sets.

epochs = 1000
learning_rate = 0.01
weights, bias = train(X_train, y_train, epochs, learning_rate)
y_pred_train = predict(X_train, weights, bias)
y_pred_test = predict(X_test, weights, bias)
train_accuracy = np.mean(y_pred_train == y_train)
test_accuracy = np.mean(y_pred_test == y_test)
print('Training Accuracy:', train_accuracy)
print('Test Accuracy:', test_accuracy)
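
Accuracy alone can hide class-specific errors, which matters for a medical dataset like this one. As an optional follow-up, the confusion-matrix counts and the precision and recall they imply can be computed directly:

# Treat class 1 (benign) as the positive class on the test set.
tp = np.sum((y_pred_test == 1) & (y_test == 1))  # true positives
tn = np.sum((y_pred_test == 0) & (y_test == 0))  # true negatives
fp = np.sum((y_pred_test == 1) & (y_test == 0))  # false positives
fn = np.sum((y_pred_test == 0) & (y_test == 1))  # false negatives
print('Precision:', tp / (tp + fp))
print('Recall:', tp / (tp + fn))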

Example

Here’s an example of how to implement logistic regression in Python from scratch:

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Step 1: Load the dataset
data = load_breast_cancer()
X, y = data.data, data.target

# Step 2: Preprocess the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 3: Define the sigmoid function
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Step 4: Forward propagation
def forward_propagation(X, weights, bias):
    z = np.dot(X, weights) + bias
    return sigmoid(z)

# Step 5: Compute the loss
def compute_loss(y_true, y_pred):
    epsilon = 1e-5  # small constant to avoid taking log(0)
    return -np.mean(y_true * np.log(y_pred + epsilon) + (1 - y_true) * np.log(1 - y_pred + epsilon))

# Step 6: Backward propagation
def backward_propagation(X, y_true, y_pred):
    dw = np.dot(X.T, (y_pred - y_true)) / len(X)
    db = np.mean(y_pred - y_true)
    return dw, db

# Step 7: Update parameters
def update_parameters(weights, bias, dw, db, learning_rate):
    weights -= learning_rate * dw
    bias -= learning_rate * db
    return weights, bias

# Step 8: Train the model
def train(X_train, y_train, epochs, learning_rate):
    n_features = X_train.shape[1]
    weights = np.zeros(n_features)
    bias = 0
    for epoch in range(epochs):
        # Forward propagation
        y_pred = forward_propagation(X_train, weights, bias)
        # Compute loss
        loss = compute_loss(y_train, y_pred)
        # Backward propagation
        dw, db = backward_propagation(X_train, y_train, y_pred)
        # Update parameters
        weights, bias = update_parameters(weights, bias, dw, db, learning_rate)
        if epoch % 100 == 0:
            print('Epoch', epoch, 'Loss:', loss)
    return weights, bias

# Step 9: Make predictions
def predict(X, weights, bias):
    y_pred = forward_propagation(X, weights, bias)
    return np.round(y_pred)

# Step 10: Train, predict, and evaluate the model
epochs = 1000
learning_rate = 0.01
weights, bias = train(X_train, y_train, epochs, learning_rate)
y_pred_train = predict(X_train, weights, bias)
y_pred_test = predict(X_test, weights, bias)
train_accuracy = np.mean(y_pred_train == y_train)
test_accuracy = np.mean(y_pred_test == y_test)
train_accuracy_percent = train_accuracy * 100
test_accuracy_percent = test_accuracy * 100
print('Training Accuracy: {:.2f} percent'.format(train_accuracy_percent))
print('Test Accuracy: {:.2f} percent'.format(test_accuracy_percent))
Explanation
  • Imports: We import NumPy along with scikit-learn’s dataset loader and train/test splitting utility.

  • Steps 1–2: We load the breast cancer dataset and split it into training and testing sets.

  • Step 3: We define the sigmoid() function that maps the linear output to a probability between 0 and 1.

  • Step 4: forward_propagation() calculates the predicted probabilities using the sigmoid() function.

  • Step 5: compute_loss() computes the cross-entropy loss to measure the difference between predicted and actual labels.

  • Step 6: backward_propagation() computes the gradients of the loss with respect to the weights and bias.

  • Step 7: update_parameters() updates the weights and bias using the computed gradients and the learning rate.

  • Step 8: train() iteratively trains the model for a predefined number of epochs, printing the loss every 100 epochs.

  • Step 9: predict() makes predictions by rounding the predicted probabilities to class labels.

  • Step 10: We train the model and evaluate its accuracy on both the training and testing datasets.
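
As a final sanity check, the from-scratch model can be compared against scikit-learn’s built-in LogisticRegression on the same split. The two differ in optimizer and regularization defaults, so expect similar but not identical accuracies:

from sklearn.linear_model import LogisticRegression

# max_iter is raised so the default lbfgs solver converges even without feature scaling.
clf = LogisticRegression(max_iter=10000)
clf.fit(X_train, y_train)
print('sklearn test accuracy:', clf.score(X_test, y_test))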

Here’s a quiz to test your knowledge.

1. What is the purpose of the sigmoid function in logistic regression?

   A) To compute the mean squared error
   B) To regularize the weights
   C) To speed up convergence in gradient descent
   D) To transform linear predictions into probabilities
