Trusted answers to developer questions
Related Tags

naive bayes
machine learning
communitycreator

# Why is "Naive" Bayes naive?

In machine learning, Naive Bayes classifiers are widely used for classification because they are easy to implement and, when the independence assumption holds reasonably well, can match or outperform more sophisticated predictors.

Naive Bayes classifiers are based on Bayes' theorem and assume that, given the class, the presence or absence of one feature does not influence the presence or absence of any other feature.

## Types

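Gaussian Naive Bayes classifier: used when features are continuous rather than discrete; each feature is assumed to be normally distributed within a class.

Multinomial Naive Bayes classifier: used when features are counts that follow a multinomial distribution, such as word counts in a document.

Bernoulli Naive Bayes classifier: used when features are boolean.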
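The three variants differ mainly in how they model the per-feature likelihood P(x_i|class). The following is a minimal sketch of each likelihood in plain Python; the function names and parameters are illustrative assumptions, not the API of any particular library.

```python
import math

def gaussian_likelihood(x, mean, var):
    """Density of a continuous feature value x under a normal
    distribution with the given per-class mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def bernoulli_likelihood(x, p_true):
    """Probability of a boolean feature x, where p_true is the
    per-class probability that the feature is True."""
    return p_true if x else 1 - p_true

def multinomial_likelihood(counts, probs):
    """Probability of a vector of counts given per-class event
    probabilities, up to the multinomial coefficient. That
    coefficient depends only on the counts, so it is the same
    for every class and can be dropped when comparing classes."""
    result = 1.0
    for count, p in zip(counts, probs):
        result *= p ** count
    return result
```

In each case the parameters (mean and variance, `p_true`, or `probs`) would be estimated separately for every class from the training data.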

## Mathematical derivation

Let’s take a look at the mathematics behind Naive Bayes classifiers.

The equation for Bayes' theorem is:

$P(class|X) = \dfrac{P(X|class)P(class)}{P(X)}$

A class variable is something that the classifier is trying to classify. For instance, when trying to classify an email as spam or not, the class variable “is spam” is used.
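As a worked illustration with a single feature, the sketch below applies Bayes' theorem to the spam example: it computes the probability that an email is spam given that it contains the word "offer". All of the probabilities are hypothetical values chosen for the example, and the evidence term P(X) is obtained by summing over both classes.

```python
def posterior(prior, likelihood, evidence):
    """Bayes' theorem: P(class|X) = P(X|class) * P(class) / P(X)."""
    return likelihood * prior / evidence

# Hypothetical values, chosen only for illustration.
p_spam = 0.4               # P(class = spam)
p_offer_given_spam = 0.5   # P("offer" appears | spam)
p_offer_given_ham = 0.1    # P("offer" appears | not spam)

# P(X): total probability of seeing "offer", summed over both classes.
p_offer = p_offer_given_spam * p_spam + p_offer_given_ham * (1 - p_spam)

p_spam_given_offer = posterior(p_spam, p_offer_given_spam, p_offer)
print(round(p_spam_given_offer, 4))  # 0.7692
```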

In the equation above, class is the class variable and X is the set of features:

$X = (x_1, x_2, \dots, x_n)$

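Assuming the features are conditionally independent given the class, the likelihood $P(X|class)$ factorizes into a product of per-feature terms, and we can rewrite the above formula as:

$P(class|x_1, \dots, x_n) = \dfrac{P(x_1|class) \cdots P(x_n|class)P(class)}{P(x_1, \dots, x_n)}$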


Notice that for a given input, the denominator is the same for every class because it does not depend on the class variable. Hence, we can ignore the denominator when comparing classes:

$P(class|x_1, \dots, x_n) \propto P(x_1|class) \cdots P(x_n|class)P(class)$
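To see why dropping the denominator is harmless, the sketch below computes the unnormalized score for two classes and then normalizes them. The priors and per-feature likelihoods are hypothetical numbers, and `unnormalized_score` is an illustrative helper name.

```python
def unnormalized_score(prior, likelihoods):
    """P(x_1|class) * ... * P(x_n|class) * P(class), without the denominator."""
    score = prior
    for likelihood in likelihoods:
        score *= likelihood
    return score

# Hypothetical priors and per-feature likelihoods for one input.
scores = {
    "spam": unnormalized_score(0.4, [0.5, 0.2]),
    "ham": unnormalized_score(0.6, [0.1, 0.3]),
}

# Dividing every score by the same total recovers proper posteriors,
# but it cannot change which class has the largest value.
total = sum(scores.values())
posteriors = {cls: score / total for cls, score in scores.items()}

best_by_score = max(scores, key=scores.get)
best_by_posterior = max(posteriors, key=posteriors.get)
```

Both `best_by_score` and `best_by_posterior` name the same class, which is why the comparison can be made on the unnormalized products alone.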


The predicted class is the outcome of the class variable that maximizes this product:

$class = \arg\max_{class} \, P(x_1|class) \cdots P(x_n|class)P(class)$
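The argmax rule can be sketched as follows for boolean features. This version sums logarithms instead of multiplying probabilities, a common way to avoid numerical underflow when there are many features; since the logarithm is monotonic, the winning class is unchanged. The `predict` function, its arguments, and the example numbers are assumptions made for illustration.

```python
import math

def predict(priors, likelihoods, features):
    """Return the class with the highest log-score.

    priors:      {class: P(class)}
    likelihoods: {class: [P(x_i = True | class) for each feature i]}
    features:    list of booleans, one per feature
    All probabilities are assumed to lie strictly between 0 and 1.
    """
    best_class, best_score = None, -math.inf
    for cls, prior in priors.items():
        score = math.log(prior)
        for p_true, x in zip(likelihoods[cls], features):
            score += math.log(p_true if x else 1 - p_true)
        if score > best_score:
            best_class, best_score = cls, score
    return best_class

# Hypothetical model parameters.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {"spam": [0.5, 0.2], "ham": [0.1, 0.3]}

both_present = predict(priors, likelihoods, [True, True])
both_absent = predict(priors, likelihoods, [False, False])
```

With these numbers, an email containing both features scores higher for "spam", while an email containing neither scores higher for "ham".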


## Why Naive?

The term “naive” refers to the assumption that, given the class, each feature of a measurement is independent of all the other features. Real data rarely satisfies this exactly, because features are often correlated, which is why the assumption is called naive. It does make training simple: the model does not need the training data to contain an exact duplicate of a measurement in order to classify it. For each feature separately, we take the fraction of previous measurements belonging to class A that contain the same value for that specific feature. We repeat this for all other features and take the product of the individual probabilities. Despite being simplistic, this assumption keeps the classifier easy to train and often works well in practice.
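The per-feature counting described above can be sketched in a few lines. The tiny weather dataset, the `train` and `score` helpers, and the class names are hypothetical. The sketch applies no smoothing, so a feature value never seen with a class gets probability zero.

```python
from collections import Counter, defaultdict

def train(rows, labels):
    """Estimate P(class) and P(x_i = value | class) by counting,
    treating each feature independently of the others."""
    class_counts = Counter(labels)
    # value_counts[cls][i][value] = rows of class cls whose feature i equals value
    value_counts = defaultdict(lambda: defaultdict(Counter))
    for row, cls in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[cls][i][value] += 1

    priors = {cls: n / len(labels) for cls, n in class_counts.items()}

    def likelihood(i, value, cls):
        # Fraction of training rows of class cls with this value for feature i.
        return value_counts[cls][i][value] / class_counts[cls]

    return priors, likelihood

def score(row, cls, priors, likelihood):
    """Product of the prior and the individual per-feature probabilities."""
    result = priors[cls]
    for i, value in enumerate(row):
        result *= likelihood(i, value, cls)
    return result

# Hypothetical training data: (outlook, temperature) -> decision.
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "hot")]
labels = ["play", "play", "stay", "play"]

priors, likelihood = train(rows, labels)
play_score = score(("sunny", "hot"), "play", priors, likelihood)
stay_score = score(("sunny", "hot"), "stay", priors, likelihood)
```

Here `play_score` is 3/4 × 2/3 × 2/3 = 1/3, built entirely from single-feature counts. No estimate of how outlook and temperature vary together is needed.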
