Learning in Probabilistic Models: Maximum Likelihood Estimate

Learn about the maximum likelihood estimate and its generalization.

Maximum likelihood estimate (MLE)

We now turn to an important principle that will further guide our learning process. Learning here means determining the parameters of the model from example data. For this, we introduce the maximum likelihood principle, which states that we choose the parameters of a probabilistic model in the following way:

Given a parameterized hypothesis function $p(y, \mathbf{x} \mid \mathbf{w})$, we choose as parameters the values that make the training data $\{y, \mathbf{x}\}$ most likely under the assumptions of the model.

The MLE principle is stated here in its most general form for arbitrary random data. In our case, we start with a model of the form $p(y \mid \mathbf{x}; \mathbf{w})$, which specifies a probabilistic regression model for given input data. This is why the input data appears on the right side of the conditioning bar. However, we will see shortly that in MLE we replace all the data at some point with the training data, so that we end up with a function (the likelihood function) that depends only on the parameters. Hence, in this case, it doesn't matter whether we treat the input data as given or as random variables.
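As a sketch of what this looks like in formulas, if the training set consists of $m$ examples $(\mathbf{x}^{(i)}, y^{(i)})$ and we make the common assumption that they are independent (the symbol $m$ is just our notation here), the likelihood function and the MLE read:

$$
L(\mathbf{w}) = \prod_{i=1}^{m} p\!\left(y^{(i)} \mid \mathbf{x}^{(i)}; \mathbf{w}\right), \qquad \mathbf{w}_{\text{MLE}} = \operatorname*{argmax}_{\mathbf{w}} \, L(\mathbf{w}).
$$

Note that once the training data is plugged in, $L$ is a function of $\mathbf{w}$ alone, which is exactly the point made above.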

Applying the maximum likelihood estimate

Let’s illustrate this with the Gaussian example of a robot that we discussed in the previous lesson, using the parameterized model (Eq. 75). We consider the 1-dimensional case with a single feature value $x$. Given the parameters $w_0$ and $w_1$, assuming $\sigma = 1$ to simplify the discussion, and the feature value $x^{(1)}$ of the first data point, the predicted probability of the corresponding label is as follows:

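As a sketch, assuming the mean of the Gaussian is the linear function $w_0 + w_1 x^{(1)}$ (as in a standard linear regression setting) and $\sigma = 1$, this prediction takes the form:

$$
p\!\left(y^{(1)} \mid x^{(1)}; \mathbf{w}\right) = \frac{1}{\sqrt{2\pi}} \exp\!\left( -\frac{\left(y^{(1)} - w_0 - w_1 x^{(1)}\right)^2}{2} \right).
$$

The following is a minimal Python sketch of the same idea under these assumptions; the helper names (`gaussian_predictive_density`, `likelihood`) and the toy data are hypothetical and only illustrate how the likelihood of a training set becomes a function of the parameters.

```python
import numpy as np

def gaussian_predictive_density(y, x, w0, w1, sigma=1.0):
    """Density of label y under a Gaussian with mean w0 + w1 * x (assumed model)."""
    mean = w0 + w1 * x
    return np.exp(-(y - mean) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

def likelihood(w0, w1, xs, ys, sigma=1.0):
    """Likelihood of the whole training set, assuming independent examples."""
    return np.prod(gaussian_predictive_density(ys, xs, w0, w1, sigma))

# Hypothetical toy data for illustration only.
xs = np.array([0.0, 1.0, 2.0])
ys = np.array([0.1, 0.9, 2.1])

# The parameter pair with the larger likelihood explains the data better.
print(likelihood(0.0, 1.0, xs, ys))   # close to the data, relatively large likelihood
print(likelihood(0.0, -1.0, xs, ys))  # a poor fit, much smaller likelihood
```

Comparing the two printed values shows the MLE idea in action: among candidate parameters, we prefer those under which the observed training data is most probable.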