# Naive Bayes

Learn about naive Bayes and building a generative model.

## Representing the problem

In the previous example, we used 2-dimensional feature vectors to illustrate classification problems with 2-dimensional plots. However, most machine learning applications work with high-dimensional feature vectors. We will now discuss an important generative-model method that is often used with high-dimensional data, known as naive Bayes. We will illustrate it with a text-processing example, following Andrew Ng: building a spam filter that classifies email messages as spam $(y = 1)$ or non-spam $(y = 0)$. To do this, we first need a suitable way to represent the problem. Here we choose to represent a text (an email in this case) as a **vocabulary** vector.

Note: A vocabulary vector is simply a list of all possible words that we’ll consider.

A text is then represented by a binary vector: the $j$-th entry is $1$ if the $j$-th vocabulary word appears in the text, and $0$ otherwise.
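As a minimal sketch of this representation, the function below maps a text onto a binary vector over a fixed vocabulary. The vocabulary and the sample email are illustrative assumptions, not part of the original example:

```python
import re

def text_to_vector(text, vocabulary):
    """Return a binary vector: entry j is 1 if vocabulary word j appears in text."""
    # Lowercase and extract words so matching is case-insensitive.
    words = set(re.findall(r"[a-z']+", text.lower()))
    return [1 if word in words else 0 for word in vocabulary]

# Hypothetical tiny vocabulary; a real spam filter would use tens of thousands of words.
vocabulary = ["buy", "cheap", "hello", "meeting", "now"]

email = "Buy cheap watches now!"
print(text_to_vector(email, vocabulary))  # -> [1, 1, 0, 0, 1]
```

Note that this vector only records whether each word occurs, not how many times: that is exactly the representation the naive Bayes event model described here assumes.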
