Probability theory is the branch of mathematics that describes how likely an event is to occur.

Probability is widely used in data science and machine learning, mostly to describe the uncertainty in predictions and other results. A probability is a number between 0 and 1, inclusive. The closer the number is to 1, the more likely the event is to occur.


An event is an outcome, or a set of outcomes, to which a probability is assigned.

Sample Space

It is the set of all possible outcomes of an experiment.

  • An event with probability one is called a certain event.
  • An event with probability zero is called an impossible event.
  • The sum of the probabilities of all outcomes in the sample space is always equal to 1.
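The three properties above can be checked on a small example. The sketch below assumes a fair six-sided die, which is not part of the original text, and the names `die_probabilities` and `event_probability` are made up for illustration.

```python
from fractions import Fraction

# Assumption: a fair six-sided die, so each face has probability 1/6.
die_probabilities = {face: Fraction(1, 6) for face in range(1, 7)}


def event_probability(event, probabilities):
    """Add up the probabilities of the outcomes that belong to the event."""
    return sum(
        (p for outcome, p in probabilities.items() if outcome in event),
        Fraction(0),
    )


certain = event_probability({1, 2, 3, 4, 5, 6}, die_probabilities)  # any face
impossible = event_probability({7}, die_probabilities)              # a 7 cannot be rolled
total = sum(die_probabilities.values())                             # all outcomes together

print(certain, impossible, total)
```

Exact fractions are used here so that the sum comes out as exactly 1 rather than a floating-point approximation.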

When all outcomes are equally likely, we can calculate the probability of an event by counting the outcomes in which the event occurs and dividing by the total number of possible outcomes.

Probability = $\frac{\text{occurrences}}{\text{non-occurrences} + \text{occurrences}}$
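As a rough sketch, the formula can be written as a small Python function. The function name `probability` and the ball-drawing example are hypothetical, and the calculation assumes every outcome is equally likely.

```python
def probability(occurrences, non_occurrences):
    """Return occurrences / (non_occurrences + occurrences)."""
    total = occurrences + non_occurrences
    if total == 0:
        raise ValueError("there must be at least one possible outcome")
    return occurrences / total


# Hypothetical example: drawing a red ball from a bag of 2 red and 6 blue balls.
p_red = probability(2, 6)
print(p_red)  # 0.25
```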

An example

A coin is tossed twice. What is the probability that at least 1 head occurs?


We can deduce the following things from the above statement.

H = Head outcome on coin toss
T = Tail outcome on coin toss

After flipping the coin twice, we can have the following possible outcomes, which together form our sample space.

Sample Space = {HH, HT, TH, TT}

Event = At least 1 Head(H) occurs.

We can see that there are three outcomes in which at least one head (H) occurs: HH, HT, and TH. Applying the formula above, we have:

Probability = 3/4 = 0.75

We can also calculate the probability of the complement of the above event, i.e., the probability that no head (H) occurs.

Complement probability = 1 - 0.75 = 0.25

  • 0.75 is the probability that at least one head appears.
  • 0.25 is the probability that no head appears.
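The coin example can also be reproduced by listing the sample space in code and counting. This is a minimal sketch that assumes a fair coin. The variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Sample space for two tosses: every ordered pair of H and T.
sample_space = ["".join(tosses) for tosses in product("HT", repeat=2)]

# Event: at least one head occurs.
event = [outcome for outcome in sample_space if "H" in outcome]

# Count favourable outcomes and divide by the size of the sample space.
p_at_least_one_head = Fraction(len(event), len(sample_space))
p_no_head = 1 - p_at_least_one_head

print(sample_space)                # ['HH', 'HT', 'TH', 'TT']
print(float(p_at_least_one_head))  # 0.75
print(float(p_no_head))            # 0.25
```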

Schools of probability

There are two main ways of thinking about probability.

Frequentist or Classical Probability

The frequentist approach to probability is objective. Events are observed and counted, and their frequencies provide the basis for directly calculating a probability, hence the name frequentist. We explored an example of classical probability above in the coin flipping example.
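One way to illustrate the idea of counting observed frequencies is to simulate the two-toss experiment many times and see how often at least one head appears. The sketch below assumes a fair coin. The function name, the number of trials, and the seed are arbitrary choices for illustration.

```python
import random


def estimate_at_least_one_head(trials, seed=0):
    """Estimate P(at least one head in two tosses) from simulated frequencies."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same estimate
    hits = 0
    for _ in range(trials):
        first = rng.choice("HT")
        second = rng.choice("HT")
        if first == "H" or second == "H":
            hits += 1
    return hits / trials


estimate = estimate_at_least_one_head(100_000)
print(estimate)  # expected to be close to 0.75
```

With more trials, the observed frequency tends to settle closer to the value of 0.75 computed by counting.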

Bayesian Probability

The Bayesian approach to probability is subjective. Probabilities are assigned to events based on evidence and personal belief and are centered around Bayes’ theorem, hence the name Bayesian. This allows us to assign probabilities to very infrequent events and events that have not been observed before, unlike frequentist probability.
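To make the role of Bayes' theorem concrete, here is a minimal sketch of a single belief update. The scenario is hypothetical and not from the original text: a coin is assumed to be either fair or two-headed, with equal prior belief in each, and one head is observed.

```python
from fractions import Fraction


def posterior(prior, likelihood, evidence):
    """Bayes' theorem: P(H | E) = P(E | H) * P(H) / P(E)."""
    return likelihood * prior / evidence


# Hypothetical prior beliefs about which coin we hold.
prior_fair = Fraction(1, 2)
prior_two_headed = Fraction(1, 2)

# Probability of observing a head under each hypothesis.
p_head_given_fair = Fraction(1, 2)
p_head_given_two_headed = Fraction(1)

# Total probability of observing a head, summed over both hypotheses.
p_head = p_head_given_fair * prior_fair + p_head_given_two_headed * prior_two_headed

posterior_fair = posterior(prior_fair, p_head_given_fair, p_head)
print(posterior_fair)  # 1/3
```

After seeing one head, the belief that the coin is fair drops from 1/2 to 1/3, which shows how evidence revises a prior belief.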

Both of the above definitions are taken from Jason Brownlee's books on probability.
