
Optimization with Gradient Descent

Explore how to implement gradient descent in Python to optimize a simple predictive model for restaurant tips. Understand the gradient calculation, stopping conditions, and learning rate, as well as the batch, stochastic, and mini-batch variants of gradient descent. This lesson helps you apply gradient descent for effective parameter tuning and error reduction in predictive analysis.

In the previous lesson, we looked at the intuition behind the gradient descent algorithm and its update equation. In this lesson, we will implement it in Python: we will predict the tip paid by a customer at a restaurant and choose the best model parameter using gradient descent.

Minimization with Gradient Descent

Recall that the gradient descent algorithm is:

  • Start with a random initial value of $\theta$.
  • Compute $\theta_t - \alpha \frac{\partial}{\partial \theta} L(\theta_t, Y)$ to update the value of $\theta$.
  • Keep updating the value of $\theta$ until it stops changing. This is the point where we have (approximately) reached the minimum of the error function. (A Python sketch of this loop follows the list.)
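
As a preview, here is a minimal sketch of that loop in plain Python. The function name gradient_descent, the tolerance tol, and the toy quadratic loss are illustrative choices, not code from this lesson:

Python 3.5
def gradient_descent(gradient, start, alpha=0.01, tol=1e-6, max_iters=10000):
    """Generic gradient descent: repeat theta <- theta - alpha * dL/dtheta."""
    theta = start
    for _ in range(max_iters):
        step = alpha * gradient(theta)
        if abs(step) < tol:  # theta has (almost) stopped changing
            break
        theta = theta - step
    return theta

# Example: minimize L(theta) = (theta - 3)**2, whose gradient is 2 * (theta - 3)
theta_min = gradient_descent(lambda t: 2 * (t - 3), start=0.0)
print(theta_min)  # approximately 3.0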

We will be using the tips dataset, which contains the following columns.

Python 3.5
# Tips Dataset
# total_bill : total bill of the customer
# tip        : total tip paid by the customer
# gender     : gender of the customer (Male/Female)
# smoker     : whether the customer is a smoker (yes/no)
# day        : which day of the week
# time       : time of visit (lunch/dinner)
# people     : total number of people that came to dine in
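
A copy of this dataset ships with the seaborn library, which is one convenient way to load it if you do not have the file locally. Note that seaborn's copy names two columns differently from the description above (sex instead of gender, size instead of people):

Python 3.5
import seaborn as sns

# Load the bundled tips dataset as a pandas DataFrame.
# Columns: total_bill, tip, sex, smoker, day, time, size
tips = sns.load_dataset("tips")
print(tips.head())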

Our simple model says that to predict the tip we only need the total bill paid by the customer. Therefore, our prediction ($\hat{y}$) depends on the total bill ($x$) and the model parameter ($\theta$). We have:

$\hat{y} = \theta x$
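
In code, this model is a one-liner. The name predict is chosen here for illustration:

Python 3.5
def predict(theta, x):
    """Predicted tip for a total bill x under the model y_hat = theta * x."""
    return theta * x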

We will need a function that gives us the derivative of the loss function.
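
As a hedged sketch of what such a function can look like: assuming the loss is mean squared error, $L(\theta, Y) = \frac{1}{n}\sum_i (y_i - \theta x_i)^2$ (an assumption on my part, since the excerpt has not named the loss yet), its derivative with respect to $\theta$ is $-\frac{2}{n}\sum_i x_i (y_i - \theta x_i)$, which translates directly into NumPy:

Python 3.5
import numpy as np

def mse_gradient(theta, x, y):
    """Derivative of L = mean((y - theta*x)**2) with respect to theta."""
    return -2 * np.mean(x * (y - theta * x))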


Before we can implement this in code for our tip-prediction example, we need to evaluate the gradient term in the update expression

$\theta_{t+1} = \theta_t - \alpha \frac{\partial}{\partial \theta} L(\theta_t, Y)$ ...
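
Putting the pieces together, here is a minimal end-to-end sketch that learns $\theta$ on the tips data. It assumes the MSE loss from above and seaborn's copy of the dataset; the learning rate and iteration count are illustrative values, not ones prescribed by the lesson:

Python 3.5
import numpy as np
import seaborn as sns

tips = sns.load_dataset("tips")
x = tips["total_bill"].to_numpy()
y = tips["tip"].to_numpy()

theta = 0.0    # initial value of theta
alpha = 0.001  # learning rate (too large a value makes the updates diverge)

for _ in range(1000):
    grad = -2 * np.mean(x * (y - theta * x))  # dL/dtheta under the MSE assumption
    theta = theta - alpha * grad              # the update rule above

print(theta)  # the learned tip amount per dollar of total bill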