How to implement gradient descent in MATLAB
Gradient descent is a popular optimization algorithm widely used in machine learning and numerical optimization. It works by computing the gradient of the cost function with respect to the parameters and then updating the parameters by taking steps proportional to the negative gradient. This process continues until the algorithm converges or a maximum number of iterations is reached.
In this Answer, we will learn the concept of gradient descent and provide a step-by-step guide to implementing it in MATLAB.
1. Defining the cost function
Before implementing gradient descent, the first step is to define the cost function to be minimized. The choice of cost function depends on the problem at hand. For instance, in a basic linear regression scenario, the cost function could be the mean squared error (MSE) between the predicted values and the true target values.
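For concreteness, the MSE can be sketched in a few lines of plain Python (an illustrative companion to the MATLAB example below; the function name mse is our own):

```python
# Mean squared error between predictions and true targets.
# Illustrative sketch in plain Python (no external libraries).
def mse(predictions, targets):
    n = len(predictions)
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / n

print(mse([2.5, 3.0, 4.5], [2.0, 3.0, 4.0]))  # (0.25 + 0 + 0.25) / 3
```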
2. Initializing parameters
Next, we need to initialize the parameters of our model with some initial values. For linear regression, we have one weight per input feature, including the bias term. We can initialize the weights randomly or with zeros.
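The two initializations mentioned above can be sketched as follows (a plain-Python illustration; the variable names and the random range are our own choices, not from the original):

```python
import random

# Two common ways to initialize a weight vector of length n_features
# (including the bias term): all zeros, or small random values.
n_features = 2
theta_zeros = [0.0] * n_features
theta_random = [random.uniform(-0.01, 0.01) for _ in range(n_features)]

print(theta_zeros)  # [0.0, 0.0]
```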
3. Setting hyperparameters
Hyperparameters are parameters that control the behavior of the optimization algorithm. In gradient descent, the key hyperparameter is the learning rate, which determines the step size taken in each iteration. Additionally, specify the number of iterations to perform.
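The effect of the learning rate can be seen on a one-dimensional toy problem, minimizing f(x) = x^2 with gradient f'(x) = 2x (an illustrative sketch with values of our own choosing):

```python
# One gradient step x <- x - learning_rate * f'(x) for f(x) = x^2.
def step(x, learning_rate):
    return x - learning_rate * 2 * x

x0 = 1.0
print(step(x0, 0.01))  # small step toward the minimum at 0: 0.98
print(step(x0, 0.6))   # step too large: overshoots past the minimum
```

A learning rate that is too small makes progress slow; one that is too large can overshoot the minimum and even diverge.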
4. Performing gradient descent iterations
Now, we execute the gradient descent iterations. During each iteration, we calculate the gradient of the cost function with respect to the parameters, and then update the parameter values by moving in the direction opposite to the gradient. We repeat this process until convergence or until the maximum number of iterations is reached.
Let's look at the following MATLAB code, which performs gradient descent for linear regression:
% Example data for linear regression
X = [1, 1; 1, 2; 1, 3; 1, 4]; % input features (including the bias term)
y = [2; 3; 4; 5]; % target values

% Hyperparameters
learning_rate = 0.01;
num_iterations = 1000;

% Initialize parameters
theta = zeros(size(X, 2), 1); % column vector of weights

% Gradient descent
for iter = 1:num_iterations
    % Calculate predictions
    predictions = X * theta;

    % Calculate the error
    error = predictions - y;

    % Calculate the gradient
    gradient = X' * error;

    % Update parameters
    theta = theta - learning_rate * gradient;
end

% Display the learned parameters
disp('Learned parameters:');
disp(theta);
Code explanation
Lines 2–3: These lines define the input features X and target values y for a linear regression problem. The input features X form a matrix in which each row represents a training example and the first column represents the bias term. The target values y form a column vector.

Lines 6–7: These lines specify the hyperparameters for the gradient descent algorithm. The learning_rate determines the step size taken in each iteration, and num_iterations determines the maximum number of iterations to perform.

Line 10: This line initializes the parameter vector theta of the linear regression model as a column vector of zeros. Its size is determined by the number of columns in X, which corresponds to the number of features (including the bias term).

Lines 13–25: This loop performs the gradient descent iterations, running num_iterations times. In each iteration, it:
- Calculates the predictions by multiplying the input features X with the current parameter values theta.
- Calculates the error by subtracting the target values y from the predictions.
- Calculates the gradient by multiplying the transpose of X with the error.
- Updates the parameters theta by subtracting the learning rate multiplied by the gradient.

Lines 28–29: These lines display the learned parameters theta after the gradient descent iterations finish.
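As a cross-check, the same loop can be replicated in plain Python with the same toy data, learning rate, and iteration count (an illustrative sketch, not part of the original MATLAB example):

```python
# Plain-Python replication of the MATLAB gradient-descent loop above.
X = [[1, 1], [1, 2], [1, 3], [1, 4]]  # bias column plus one feature
y = [2, 3, 4, 5]
learning_rate = 0.01
num_iterations = 1000

theta = [0.0, 0.0]
for _ in range(num_iterations):
    # predictions = X * theta
    predictions = [row[0] * theta[0] + row[1] * theta[1] for row in X]
    # error = predictions - y
    error = [p - t for p, t in zip(predictions, y)]
    # gradient = X' * error
    gradient = [sum(row[j] * e for row, e in zip(X, error)) for j in range(2)]
    # theta = theta - learning_rate * gradient
    theta = [t - learning_rate * g for t, g in zip(theta, gradient)]

print(theta)  # close to [1.0, 1.0], since y = 1 + 1*x
```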