Minibatch Gradient Descent

Learn how minibatch gradient descent keeps gradient-based optimization tractable when computing the full gradient is too expensive.

Stochastic gradient descent (SGD)

Recall that to compute the gradient $\nabla_\theta J(\theta)$ of an objective $J(\theta)$, we need to aggregate the gradients $\nabla_\theta \mathcal{L}(f_\theta(x_i), y_i)$ ...
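
To make the aggregation concrete, here is a minimal sketch in plain Python. Everything in it is an assumption chosen for illustration rather than taken from the lesson: the model is $f_\theta(x) = \theta x$, the loss is the squared error, "aggregate" is taken to mean averaging, and the toy data, learning rate, and helper names (`per_sample_grad`, `full_gradient`, `minibatch_gradient`) are hypothetical. It contrasts the full gradient, which visits every sample, with a minibatch estimate that averages over a random subset.

```python
import random


def per_sample_grad(theta, x, y):
    """Gradient of the squared loss 0.5 * (theta * x - y)**2 with respect to theta."""
    return (theta * x - y) * x


def full_gradient(theta, xs, ys):
    """Average the per-sample gradients over the whole dataset (assumed aggregation)."""
    return sum(per_sample_grad(theta, x, y) for x, y in zip(xs, ys)) / len(xs)


def minibatch_gradient(theta, xs, ys, batch_size, rng):
    """Estimate the full gradient from a random subset of batch_size samples."""
    idx = rng.sample(range(len(xs)), batch_size)
    return sum(per_sample_grad(theta, xs[i], ys[i]) for i in idx) / batch_size


# Toy data generated from y = 3x (hypothetical, for illustration only).
xs = [0.1 * i for i in range(1, 101)]
ys = [3.0 * x for x in xs]

rng = random.Random(0)  # fixed seed so the run is repeatable
theta = 0.0
learning_rate = 0.01  # assumed value, small enough for this toy problem

# Each update touches only 10 of the 100 samples.
for step in range(200):
    theta -= learning_rate * minibatch_gradient(theta, xs, ys, batch_size=10, rng=rng)

print(theta)
```

In this sketch each update costs a tenth of a full pass over the data, and because the toy data is noise-free, `theta` still approaches the value used to generate it.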