In machine learning, Naive Bayes classifiers are widely used for classification because they are easy to implement and, when the independence assumption holds reasonably well, can perform comparably to or better than more sophisticated predictors.
Naive Bayes classifiers are based on Bayes’ theorem and assume that, given the class, the presence or absence of a feature does not influence the presence or absence of any other feature.
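The most common types of Naive Bayes classifiers are:
Gaussian Naive Bayes classifier: used when features are continuous (not discrete) and assumed to follow a normal distribution.
Multinomial Naive Bayes classifier: used when features follow a multinomial distribution, such as word counts.
Bernoulli Naive Bayes classifier: used when features are of the boolean type.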
Let’s take a look at the mathematics behind Naive Bayes classifiers.
The equation for Bayes’ theorem is:
$P(class|X) = P(X|class)P(class)/P(X)$
A class variable is the quantity that the classifier is trying to predict. For instance, when trying to classify an email as spam or not, “is spam” is the class variable.
In the equation above, $class$ is the class variable and $X$ is the set of features: $X = (x_1, x_2, ..., x_n)$
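To make the terms concrete, here is a small worked example of Bayes’ theorem with a single feature. All numbers are made up for illustration: it assumes 20% of emails are spam, and that the word "offer" appears in 60% of spam emails and 5% of non-spam emails.

```python
# Hypothetical numbers, chosen only for illustration.
p_spam = 0.2                 # P(class = spam)
p_ham = 1 - p_spam           # P(class = not spam)
p_word_given_spam = 0.6      # P(X | spam), X = email contains "offer"
p_word_given_ham = 0.05      # P(X | not spam)

# P(X) via the law of total probability over both classes.
p_word = p_word_given_spam * p_spam + p_word_given_ham * p_ham

# Bayes' theorem: P(spam | X) = P(X | spam) * P(spam) / P(X)
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(round(p_spam_given_word, 3))  # 0.75
```

Under these assumed numbers, seeing the word raises the probability of spam from 0.2 to 0.75.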
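Under the naive assumption that the features are conditionally independent given the class, the numerator factorizes and the above formula can be rewritten as:
$P(class|x_1 ... x_n) = \frac{P(x_1|class)... P(x_n|class)P(class)}{P(x_1, ..., x_n)}$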
Notice that the denominator does not depend on the class: for a given input, it is the same for every possible value of the class variable. Hence, when comparing classes, the denominator can be ignored.
$P(class|x_1 ... x_n) \propto P(x_1|class)... P(x_n|class)P(class)$
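This proportionality can be checked numerically. The sketch below uses made-up priors and likelihoods for two classes and one input with two features; it computes the unnormalized scores and then normalizes them, and the ranking of the classes is the same either way.

```python
# Hypothetical priors and per-feature likelihoods for one input (x_1, x_2).
priors = {"spam": 0.2, "ham": 0.8}
likelihoods = {
    "spam": [0.6, 0.3],   # P(x_1 | spam), P(x_2 | spam)
    "ham": [0.05, 0.4],   # P(x_1 | ham),  P(x_2 | ham)
}

# Unnormalized score: P(x_1 | class) * P(x_2 | class) * P(class)
scores = {}
for label, prior in priors.items():
    score = prior
    for p in likelihoods[label]:
        score *= p
    scores[label] = score

# Dividing every score by the same constant (here, their sum) gives
# posteriors that sum to 1 but does not change which class is largest.
total = sum(scores.values())
posteriors = {label: s / total for label, s in scores.items()}

print(max(scores, key=scores.get), max(posteriors, key=posteriors.get))
```

Both lookups select the same class, which is why the denominator never needs to be computed when only the predicted class is required.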
Among all possible values of the class variable, the one with the maximum probability is chosen as the prediction:
$\hat{class} = \arg\max_{class} \left( P(x_1|class)... P(x_n|class)P(class) \right)$
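The decision rule above can be sketched in a few lines of Python for categorical features. This is a minimal illustration, not a reference implementation: the function names `train` and `predict`, the toy dataset, and the `alpha` smoothing parameter are all assumptions introduced here. Two choices go beyond the formula itself: probabilities are summed as logarithms to avoid numerical underflow when many features are multiplied, and add-one (Laplace) smoothing keeps unseen feature values from producing a zero probability.

```python
import math
from collections import Counter, defaultdict

def train(rows, labels):
    """Estimate the counts needed for P(class) and P(x_i | class)."""
    class_counts = Counter(labels)
    # value_counts[(i, label)][value] = rows of that class with x_i == value
    value_counts = defaultdict(Counter)
    values_seen = defaultdict(set)  # distinct values observed for feature i
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            values_seen[i].add(value)
    return class_counts, value_counts, values_seen

def predict(row, model, alpha=1.0):
    """Return the class maximizing log P(class) + sum_i log P(x_i | class)."""
    class_counts, value_counts, values_seen = model
    n = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / n)  # log prior
        for i, value in enumerate(row):
            # Add-alpha smoothing (an assumption of this sketch).
            numerator = value_counts[(i, label)][value] + alpha
            denominator = count + alpha * len(values_seen[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy, made-up dataset: (contains "offer", contains a link) -> label
rows = [("yes", "yes"), ("yes", "no"), ("no", "no"), ("no", "yes"), ("no", "no")]
labels = ["spam", "spam", "ham", "ham", "ham"]
model = train(rows, labels)
print(predict(("yes", "yes"), model))  # spam
```

Because the logarithm is monotonic, taking the argmax over log scores selects the same class as taking it over the product of probabilities.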
Note: Different Naive Bayes classifiers make different assumptions regarding the distribution of $P(x_i | class)$.
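As a rough illustration of this note, the functions below sketch how $P(x_i | class)$ might be computed under each of the three variants. The function names and parameters are hypothetical; in practice the parameters (`mean`, `var`, `p`, `probs`) would be estimated separately for each class from the training data.

```python
import math

def gaussian_likelihood(x, mean, var):
    """Gaussian Naive Bayes: P(x_i | class) is a normal density."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def bernoulli_likelihood(x, p):
    """Bernoulli Naive Bayes: x is 0 or 1, and p = P(x_i = 1 | class)."""
    return p if x == 1 else 1 - p

def multinomial_likelihood(counts, probs):
    """Multinomial Naive Bayes: counts[i] occurrences of feature i, each with
    per-class probability probs[i]. The multinomial coefficient is omitted
    because it does not depend on the class."""
    return math.prod(p ** c for p, c in zip(probs, counts))

print(gaussian_likelihood(0.0, 0.0, 1.0))
print(bernoulli_likelihood(1, 0.3))
print(multinomial_likelihood([2, 1], [0.5, 0.5]))
```

Whichever form is used, the resulting values are plugged into the same product and argmax shown in the formulas above.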
Some applications that use Naive Bayes classifiers are:
Spam Filtering
Text Analysis
Recommendation Systems