How to implement linear regression from scratch

Linear regression is a foundational statistical and machine learning method that models the relationship between one or more independent variables (features) and a dependent variable (target). It assumes this relationship is linear: a change in a feature produces a proportional change in the target.

Mathematical representation

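For a simple linear regression with one independent variable $x$, the model predicts $\hat{y} = wx + b$, where $w$ is the weight (slope) and $b$ is the bias (intercept).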
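To make the training objective concrete, here is one possible formulation, assuming the mean squared error as the cost. That choice is an assumption, made because it matches the 2 / len(X) factor in the gradient code in this article. The symbols are illustrative: $m$ is the number of samples, $x_i$ and $y_i$ are the feature and target of sample $i$, $w$ and $b$ are the weight and bias, and $\alpha$ is the learning rate.

```latex
\begin{align*}
\hat{y}_i &= w x_i + b \\
J(w, b) &= \frac{1}{m} \sum_{i=1}^{m} (\hat{y}_i - y_i)^2 \\
\frac{\partial J}{\partial w} &= \frac{2}{m} \sum_{i=1}^{m} (\hat{y}_i - y_i)\, x_i \\
\frac{\partial J}{\partial b} &= \frac{2}{m} \sum_{i=1}^{m} (\hat{y}_i - y_i) \\
w &\leftarrow w - \alpha \frac{\partial J}{\partial w}, \qquad
b \leftarrow b - \alpha \frac{\partial J}{\partial b}
\end{align*}
```

Under this formulation, each gradient descent step moves $w$ and $b$ a small distance against the gradient of $J$, which is what the update lines in the code do.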

Code example

import numpy as np

def linear_regression(X, y, learning_rate=0.01, epochs=1000):
    # Initialize weights and bias
    weights = np.random.randn(X.shape[1], 1)
    bias = np.random.randn()

    # Perform gradient descent
    for epoch in range(epochs):
        # Calculate predictions
        y_pred = np.dot(X, weights) + bias

        # Calculate errors (residuals)
        error = y_pred - y

        # Calculate gradients of the mean squared error
        dw = 2 / len(X) * np.dot(X.T, error)
        db = 2 / len(X) * np.sum(error)

        # Update weights and bias
        weights = weights - learning_rate * dw
        bias = bias - learning_rate * db

    return weights, bias

# Set the seed so the data and the initial parameters are reproducible
np.random.seed(42)
X_train = 2 * np.random.rand(100, 1)
y_train = 4 + 3 * X_train + np.random.randn(100, 1)

# Train the model; the separate bias term is the intercept, so X_train needs no column of ones
trained_weights, trained_bias = linear_regression(X_train, y_train)

# Now, let's perform predictions using the trained model
def predict(X, weights, bias):
    """
    Predict output values using the trained linear regression model.

    Parameters:
    - X: Input features (matrix)
    - weights: Trained model coefficients
    - bias: Trained model intercept

    Returns:
    - y_pred: Predicted output values
    """
    return np.dot(X, weights) + bias

# Generate test data for prediction
X_test = np.array([[1.5], [2.5]])

# Perform predictions
predictions = predict(X_test, trained_weights, trained_bias)

# Display the predictions
print("Predictions:", predictions)

Code explanation

Line 1: Import the NumPy library.
Line 3: Define the linear_regression function with parameters X (input features), y (output), learning_rate, and epochs.
Lines 5–6: Initialize weights and bias with random values.
Line 9: Run gradient descent for the specified number of epochs.
Lines 11–18: Calculate the predictions, the errors (predictions minus actual values), and the gradients of the mean squared error with respect to weights and bias.
Lines 21–22: Update weights and bias by stepping against the gradients, scaled by the learning rate.
Line 24: Return the trained weights and bias.
Line 27: Set a random seed for reproducibility.
Lines 28–29: Generate synthetic training data (X_train, y_train) with a true intercept of 4, a true slope of 3, and Gaussian noise.
Line 32: Train the model using the linear_regression function. The features are passed without a column of ones because the separate bias parameter already acts as the intercept; adding the column as well would give the model two redundant intercept terms.
Lines 35–47: Define a function (predict) for making predictions using the trained model.
Line 50: Generate test data (X_test) for prediction.
Line 53: Perform predictions using the trained model.
Line 56: Display the predictions.
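A quick way to sanity-check a gradient descent fit is to compare it with the closed-form least-squares solution. The sketch below is illustrative rather than part of the original walkthrough: fit_gradient_descent is a hypothetical compact helper, the data follows the same recipe as above but with different random draws, and the learning rate of 0.1 and 2000 epochs are assumed values chosen so the fit converges on this data. The reference solution comes from np.linalg.lstsq, which expects the intercept as an explicit column of ones.

```python
import numpy as np

def fit_gradient_descent(X, y, learning_rate=0.1, epochs=2000):
    """Compact gradient descent on the mean squared error (illustrative helper)."""
    rng = np.random.default_rng(0)
    weights = rng.standard_normal((X.shape[1], 1))
    bias = 0.0
    for _ in range(epochs):
        error = X @ weights + bias - y
        weights -= learning_rate * (2 / len(X)) * (X.T @ error)
        bias -= learning_rate * (2 / len(X)) * error.sum()
    return weights, bias

# Synthetic data following the recipe y = 4 + 3x + noise
rng = np.random.default_rng(42)
X = 2 * rng.random((100, 1))
y = 4 + 3 * X + rng.standard_normal((100, 1))

gd_weights, gd_bias = fit_gradient_descent(X, y)

# Closed-form reference: least squares on the design matrix [1, x]
A = np.c_[np.ones((len(X), 1)), X]
solution, *_ = np.linalg.lstsq(A, y, rcond=None)
ls_bias, ls_weight = solution[0, 0], solution[1, 0]

print("gradient descent:", gd_bias, gd_weights[0, 0])
print("least squares:   ", ls_bias, ls_weight)
```

If the two pairs of numbers disagree noticeably, the usual suspects are too few epochs or a learning rate that is too small for the scale of the features.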

Linear regression parameters

Parameter	Description
X	Input features
y	Actual output values
weights	Coefficients for each feature in the input data
bias	Intercept term in the linear equation
learning_rate	Step size to adjust weights and bias in each iteration
epochs	Number of iterations the model goes through the dataset
cost	Cost or loss function, measures the difference between predicted and actual values
m	Number of samples in the training dataset
n	Number of features in the input data
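The table lists cost, m, and n, which the training code above never names explicitly. As a rough sketch of how they fit together, the block below computes a mean squared error cost on each epoch and records it. The function name mse_cost, the list cost_history, the noise-free data, and the hyperparameter values are all assumptions made for illustration.

```python
import numpy as np

def mse_cost(X, y, weights, bias):
    """Mean squared error between predicted and actual values."""
    m = X.shape[0]  # number of samples
    error = X @ weights + bias - y
    return float(np.sum(error ** 2) / m)

# Illustrative noise-free data with m samples and n features
rng = np.random.default_rng(0)
m, n = 50, 3
X = rng.standard_normal((m, n))
true_weights = np.array([[1.0], [-2.0], [0.5]])
y = X @ true_weights + 4.0

weights = np.zeros((n, 1))
bias = 0.0
learning_rate = 0.1
epochs = 500
cost_history = []

for _ in range(epochs):
    # Record the cost before each update so the first entry is the starting cost
    cost_history.append(mse_cost(X, y, weights, bias))
    error = X @ weights + bias - y
    weights -= learning_rate * (2 / m) * (X.T @ error)
    bias -= learning_rate * (2 / m) * error.sum()

print("first cost:", cost_history[0])
print("last cost: ", cost_history[-1])
```

Plotting or printing cost_history is a simple way to judge whether learning_rate and epochs are reasonable: a cost that keeps falling at the end suggests more epochs, and one that grows suggests a smaller learning rate.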
