Probability Distributions (Binomial and Bernoulli Distributions)
Probability distributions are essential in data science and underpin many of its concepts. This lesson introduces them, with a focus on the Bernoulli and binomial distributions.
What are probability distributions?
A probability distribution is a summary of the probabilities associated with all the possible outcomes of a random variable X. A distribution has a characteristic shape, described by properties such as the mean (expected value), variance, skewness, and kurtosis.
Example
When we roll a fair die, all six outcomes are equally likely, so each has probability $\frac{1}{6}$. We can write this as the table below.
| Outcome of Rolling a Die | Probability |
|---|---|
| 1 | 1/6 |
| 2 | 1/6 |
| 3 | 1/6 |
| 4 | 1/6 |
| 5 | 1/6 |
| 6 | 1/6 |
Probability Histogram
If we plot the outcome of the die roll on the x-axis and its probability on the y-axis, we get a histogram with six bars of equal height, $\frac{1}{6}$.
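The table above can also be written as a small lookup structure. The following is a minimal Python sketch; the names `die_pmf`, `total`, and `mean` are illustrative and do not come from the lesson. It checks that the probabilities add up to 1 and computes the mean (expected value) mentioned earlier.

```python
from fractions import Fraction

# PMF of a fair six-sided die: every face has probability 1/6.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# The probabilities of all possible outcomes must add up to 1.
total = sum(die_pmf.values())

# Expected value: each outcome weighted by its probability.
mean = sum(face * prob for face, prob in die_pmf.items())

print(total)  # 1
print(mean)   # 7/2
```

Using `Fraction` keeps the arithmetic exact, so the sum is exactly 1 rather than a rounded floating-point value.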
Types of random variables
Discrete random variables
Discrete random variables are variables whose outcomes come from a countable set of distinct values, such as the faces of a die. The function that gives the probability of each outcome of a discrete random variable (its probability distribution) is called the Probability Mass Function (PMF).
Example
Let X be the random variable denoting the sum of two dice. When two dice are rolled, the possible pairs of faces are (1,1), (1,2), (1,3), (1,4), (1,5), (1,6), (2,1), (2,2), and so on, for a total of 36 equally likely outcomes. The possible sums of the two dice are listed below along with their probabilities.
| X (the sum of two dice) | Probability |
|---|---|
| 2 | 1/36 |
| 3 | 2/36 |
| 4 | 3/36 |
| 5 | 4/36 |
| 6 | 5/36 |
| 7 | 6/36 |
| 8 | 5/36 |
| 9 | 4/36 |
| 10 | 3/36 |
| 11 | 2/36 |
| 12 | 1/36 |
Probability Histogram
If we plot the sum of the two dice on the x-axis and its probability on the y-axis, we get a symmetric histogram that rises from 1/36 at X = 2 to a peak of 6/36 at X = 7 and falls back to 1/36 at X = 12.
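The probabilities in the table can be reproduced by enumerating all 36 pairs and counting how many give each sum. This is a rough sketch in Python; the variable names (`rolls`, `counts`, `sum_pmf`) are chosen here for illustration only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely (first die, second die) pairs.
rolls = list(product(range(1, 7), repeat=2))

# Count how many pairs produce each sum.
counts = Counter(a + b for a, b in rolls)

# Divide each count by the number of pairs to get the PMF of X.
sum_pmf = {s: Fraction(counts[s], len(rolls)) for s in sorted(counts)}

for s in sum_pmf:
    # Print the unreduced form so it matches the table (Fraction reduces 2/36 to 1/18).
    print(f"{s}: {counts[s]}/{len(rolls)}")
```

Note that `Fraction` stores values in lowest terms, which is why the loop prints the raw counts instead of the stored fractions.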
Continuous Random Variables
A continuous random variable is a variable whose outcomes can take any real value within a range. The Probability Density Function (PDF) defines the probability distribution of a continuous random variable. Notice that for a discrete random variable, the corresponding function is called the Probability Mass Function.
A continuous random variable takes any single exact value with probability zero. Consequently, its probability distribution cannot be given in tabular form.
The probability distribution of a continuous random variable is shown by a density curve. The probability that X lies in an interval is the area under the density curve between the interval endpoints. One of the most commonly used continuous probability distributions is the Gaussian, or Normal, distribution.
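To make the "area under the curve" idea concrete, here is a small Python sketch for the Normal distribution. It relies on a standard fact not stated in this lesson: the area to the left of a point (the cumulative distribution function) can be written with the error function, which Python provides as `math.erf`. The helper names `normal_cdf` and `interval_probability` are hypothetical.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Area under the normal density curve to the left of x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def interval_probability(a, b, mu=0.0, sigma=1.0):
    """Area under the density curve between a and b, i.e. P(a <= X <= b)."""
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)

# Probability that a standard normal variable lies within one
# standard deviation of its mean.
print(round(interval_probability(-1.0, 1.0), 4))  # 0.6827

# An interval of zero width has zero area, so a single exact
# value has probability zero.
print(interval_probability(0.5, 0.5))  # 0.0
```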
Bernoulli Distribution
The Bernoulli distribution is a discrete probability distribution with only two outcomes: 1 (success) and 0 (failure). The probability of success is denoted by “p”, and the probability of failure is denoted by “q” or “1 - p”.
The Probability Mass Function which calculates the probability of each outcome is given as
$P(x) = p^x q^{1-x} = p^x (1-p)^{1-x}$ where $x \in \{0,1\}$
The above Probability Mass Function can also be written as
$P(x) = \begin{cases} q = 1-p & x = 0 \\ p & x = 1 \end{cases}$
 If the outcome is 0, the probability of failure is $(1-p)$.
 If the outcome is 1, the probability of success is $p$.
 The two outcomes do not need to be equally likely.
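The two ways of writing the Probability Mass Function can be compared directly in code. The sketch below is illustrative only: the function names `bernoulli_pmf` and `bernoulli_pmf_piecewise` and the value p = 0.25 are assumptions made for this example.

```python
def bernoulli_pmf(x, p):
    """Closed form: P(x) = p^x * (1 - p)^(1 - x) for x in {0, 1}."""
    if x not in (0, 1):
        raise ValueError("x must be 0 or 1")
    return p ** x * (1 - p) ** (1 - x)

def bernoulli_pmf_piecewise(x, p):
    """Piecewise form: 1 - p when x is 0, p when x is 1."""
    if x not in (0, 1):
        raise ValueError("x must be 0 or 1")
    return p if x == 1 else 1 - p

p = 0.25  # assumed probability of success, chosen for illustration
for x in (0, 1):
    print(x, bernoulli_pmf(x, p), bernoulli_pmf_piecewise(x, p))
# 0 0.75 0.75
# 1 0.25 0.25
```

Both forms agree because the exponent that does not apply becomes 0, and any nonzero number raised to the power 0 is 1.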
Example

Weather forecasts say that the probability of rain today is 0.3, so p = 0.3.

Using this, we can calculate the probability that it will not rain today: q = 1 - p = 1 - 0.3 = 0.7.

The probability histogram for this example has two bars: one of height 0.7 at x = 0 (no rain) and one of height 0.3 at x = 1 (rain).
Binomial Distribution
The binomial distribution is a discrete probability distribution. In the Bernoulli distribution, we analyzed a single trial of an experiment whose outcome can be a success or a failure.
In the binomial distribution we have $n$ trials, where the outcome of each trial is either a success or a failure. The probability of success for one trial is denoted by “p” and the probability of failure for one trial is denoted by “1 - p” or “q”. The Bernoulli distribution is a binomial distribution with n = 1, i.e., the number of trials in the Bernoulli distribution is 1.
In the binomial distribution, the number of trials is fixed, each trial is independent of the others, and the probability of success “p” remains the same in every trial.
The mathematical expression for binomial distribution is
$P(x) = \frac{n!}{(n-x)!\,x!} p^x q^{n-x}$
 n is the total number of trials.
 p is the probability of success in a single trial.
 q = 1 - p is the probability of failure in a single trial.
 x is the total number of successes in the n trials.
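The formula translates almost directly into code. Below is a minimal sketch, assuming Python 3.8 or later (for `math.comb`, which computes n! / ((n - x)! x!)). The function name `binomial_pmf` and the values n = 3, p = 0.5 are chosen here purely for illustration.

```python
import math

def binomial_pmf(x, n, p):
    """P(x) = n! / ((n - x)! x!) * p^x * q^(n - x), with q = 1 - p."""
    if not 0 <= x <= n:
        raise ValueError("x must be between 0 and n")
    q = 1 - p
    return math.comb(n, x) * p ** x * q ** (n - x)

# Three independent trials, each with success probability 0.5.
n, p = 3, 0.5
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

print(pmf)       # [0.125, 0.375, 0.375, 0.125]
print(sum(pmf))  # 1.0
```

The list covers every possible number of successes from 0 to n, so its entries add up to 1.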
We can derive the expression for Bernoulli Distribution, as seen below.
n = 1 (as we have one trial in Bernoulli Distribution)
For x = 0
$P(0) = \frac{n!}{(n-x)!\,x!} p^x q^{n-x} = \frac{1!}{(1-0)!\,0!} p^0 q^{1-0} = q$
For x = 1
$P(1) = \frac{n!}{(n-x)!\,x!} p^x q^{n-x} = \frac{1!}{(1-1)!\,1!} p^1 q^{1-1} = p$
 4! = 4 × 3 × 2 × 1 = 24
 0! = 1
 ${n \choose k} = \frac{n!}{(n-k)!\,k!}$
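These factorial facts are easy to check numerically. The snippet below is a small sketch using Python's `math.factorial`; the helper `n_choose_k` is a hypothetical name that spells out the formula above term by term.

```python
import math

print(math.factorial(4))  # 24
print(math.factorial(0))  # 1

def n_choose_k(n, k):
    """Binomial coefficient: n! / ((n - k)! k!)."""
    return math.factorial(n) // (math.factorial(n - k) * math.factorial(k))

# With n = 1 the coefficient is 1 for both x = 0 and x = 1,
# which is why the binomial formula reduces to q and p.
print(n_choose_k(1, 0), n_choose_k(1, 1))  # 1 1

# A larger case: the number of ways to choose 6 items out of 10.
print(n_choose_k(10, 6))  # 210
```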
Example
A fair coin is tossed ten times. What is the probability of getting exactly six heads?

Here n = 10

Because the coin is fair, heads and tails are equally likely in each toss, so the probability of heads is 0.5. Treating heads as success, p = 0.5 (probability of success) and q = 0.5 (probability of failure).

Here x = 6. Note that x is the total number of successes.

Applying the above formula we have
$P(6) = \frac{n!}{(n-x)!\,x!} p^x q^{n-x} = \frac{10!}{(10-6)!\,6!} (0.5)^6 (0.5)^{10-6} = 210 \times 0.015625 \times 0.0625 \approx 0.2051$
 The probability of getting exactly six heads is approximately 0.2051.
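The arithmetic in this example can be verified with a few lines of Python. This is only a sketch, assuming Python 3.8 or later for `math.comb`; the variable names mirror the symbols used in the example.

```python
import math

n, x, p = 10, 6, 0.5  # ten tosses, six heads, fair coin
q = 1 - p             # probability of tails in one toss

coefficient = math.comb(n, x)  # 10! / ((10 - 6)! 6!)
probability = coefficient * p ** x * q ** (n - x)

print(coefficient)  # 210
print(p ** x)       # 0.015625
print(q ** (n - x)) # 0.0625
print(probability)  # 0.205078125
```

The exact value is 210/1024 = 0.205078125, or about 0.205.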
This is a useful site for graphical illustration of Binomial Distribution.