How to implement gradient descent in MATLAB
Gradient descent is a popular optimization algorithm widely used in machine learning and numerical optimization. It works by computing the gradient of the cost function with respect to the parameters and then updating the parameters by taking steps proportional to the negative gradient. This process continues until the algorithm converges or a maximum number of iterations is reached.
In this Answer, we will learn the concept of gradient descent and provide a step-by-step guide to implementing it in MATLAB.
1. Defining the cost function
Before implementing gradient descent, the first step is to define the cost function to be minimized. The choice of cost function depends on the problem at hand. For instance, in a basic linear regression scenario, the cost function could be the mean squared error (MSE) between the predicted values and the true target values.
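For concreteness, the MSE can be sketched in a few lines of plain Python (an illustrative companion to the MATLAB example below; the function name mse is our own):

```python
# Mean squared error between predictions and true targets.
# Illustrative sketch in plain Python (no external libraries).
def mse(predictions, targets):
    n = len(predictions)
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / n

print(mse([2.5, 3.0, 4.5], [2.0, 3.0, 4.0]))  # (0.25 + 0 + 0.25) / 3
```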
2. Initializing parameters
Next, we need to initialize the parameters of our model with some initial values. For linear regression, we have one weight per input feature, including the bias term. We can initialize the weights randomly or with zeros.
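The two initializations mentioned above can be sketched as follows (a plain-Python illustration; the variable names and the random range are our own choices, not from the original):

```python
import random

# Two common ways to initialize a weight vector of length n_features
# (including the bias term): all zeros, or small random values.
n_features = 2
theta_zeros = [0.0] * n_features
theta_random = [random.uniform(-0.01, 0.01) for _ in range(n_features)]

print(theta_zeros)  # [0.0, 0.0]
```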
3. Setting hyperparameters
Hyperparameters are parameters that control the behavior of the optimization algorithm. In gradient descent, the key hyperparameter is the learning rate, which determines the step size taken in each iteration. Additionally, specify the number of iterations to perform.
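The effect of the learning rate can be seen on a one-dimensional toy problem, minimizing f(x) = x^2 with gradient f'(x) = 2x (an illustrative sketch with values of our own choosing):

```python
# One gradient step x <- x - learning_rate * f'(x) for f(x) = x^2.
def step(x, learning_rate):
    return x - learning_rate * 2 * x

x0 = 1.0
print(step(x0, 0.01))  # small step toward the minimum at 0: 0.98
print(step(x0, 0.6))   # step too large: overshoots past the minimum
```

A learning rate that is too small makes progress slow; one that is too large can overshoot the minimum and even diverge.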
4. Performing gradient descent iterations
Now, we execute the gradient descent iterations. During each iteration, we calculate the gradient of the cost function with respect to the parameters, and then update the parameter values by moving in the direction opposite to the gradient. We repeat this process until convergence or until the maximum number of iterations is reached.
Let's look at the following MATLAB code, which performs gradient descent for linear regression:
% Example data for linear regression
X = [1, 1; 1, 2; 1, 3; 1, 4]; % input features (including the bias term)
y = [2; 3; 4; 5]; % target values

% Hyperparameters
learning_rate = 0.01;
num_iterations = 1000;

% Initialize parameters
theta = zeros(size(X, 2), 1); % column vector of weights

% Gradient descent
for iter = 1:num_iterations
    % Calculate predictions
    predictions = X * theta;

    % Calculate the error
    error = predictions - y;

    % Calculate the gradient
    gradient = X' * error;

    % Update parameters
    theta = theta - learning_rate * gradient;
end

% Display the learned parameters
disp('Learned parameters:');
disp(theta);
Code explanation
Lines 2–3: These lines define the input features X and target values y for a linear regression problem. The input features X form a matrix in which each row represents a training example and the first column represents the bias term. The target values y form a column vector.

Lines 6–7: These lines specify the hyperparameters for the gradient descent algorithm. The learning_rate determines the step size taken in each iteration, and num_iterations determines the maximum number of iterations to perform.

Line 10: This line initializes the parameter vector theta of the linear regression model as a column vector of zeros. Its size is determined by the number of columns in X, which corresponds to the number of features (including the bias term).

Lines 13–25: This loop performs the gradient descent iterations, running num_iterations times. In each iteration, it:
- Calculates the predictions by multiplying the input features X with the current parameter values theta.
- Calculates the error by subtracting the target values y from the predictions.
- Calculates the gradient by multiplying the transpose of X with the error.
- Updates the parameters theta by subtracting the learning rate multiplied by the gradient.

Lines 28–29: These lines display the learned parameters theta after the gradient descent iterations finish.
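As a cross-check, the same loop can be replicated in plain Python with the same toy data, learning rate, and iteration count (an illustrative sketch, not part of the original MATLAB example):

```python
# Plain-Python replication of the MATLAB gradient-descent loop above.
X = [[1, 1], [1, 2], [1, 3], [1, 4]]  # bias column plus one feature
y = [2, 3, 4, 5]
learning_rate = 0.01
num_iterations = 1000

theta = [0.0, 0.0]
for _ in range(num_iterations):
    # predictions = X * theta
    predictions = [row[0] * theta[0] + row[1] * theta[1] for row in X]
    # error = predictions - y
    error = [p - t for p, t in zip(predictions, y)]
    # gradient = X' * error
    gradient = [sum(row[j] * e for row, e in zip(X, error)) for j in range(2)]
    # theta = theta - learning_rate * gradient
    theta = [t - learning_rate * g for t, g in zip(theta, gradient)]

print(theta)  # close to [1.0, 1.0], since y = 1 + 1*x
```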