Visualization with Distributions
Explore different types of data distribution plots using Seaborn in Python. Understand how to create and interpret histograms, box plots, violin plots, and joint plots to reveal patterns and statistical insights in data. This lesson helps you visually analyze the spread, modality, and relationships within your datasets.
Introduction to distributions
A probability distribution is a mathematical function that gives the probability of each possible outcome of a random process.
For example, you might have a program that returns 1 with a 50% probability and 0 with a 50% probability. Thus, 50% of your probability distribution would be assigned to 1 and 50% to 0.
If you were to plot this expected distribution, you would have two bars of equal height for 1 and 0.
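To make this concrete, here is a minimal sketch that simulates such a program and compares the observed share of each outcome with the expected 50/50 split. The function name `coin_program`, the seed, and the number of draws are assumptions chosen for illustration.

```python
import random
from collections import Counter


def coin_program(rng: random.Random) -> int:
    """Hypothetical program: return 1 with probability 0.5, otherwise 0."""
    return 1 if rng.random() < 0.5 else 0


# Expected distribution: half of the probability on each outcome.
expected = {0: 0.5, 1: 0.5}

# Run the program many times with a fixed seed so the run is repeatable.
rng = random.Random(42)
n_draws = 10_000
counts = Counter(coin_program(rng) for _ in range(n_draws))

# Share of draws that landed on each outcome.
observed = {outcome: counts[outcome] / n_draws for outcome in expected}

for outcome in sorted(expected):
    print(
        f"outcome {outcome}: expected {expected[outcome]:.2f}, "
        f"observed {observed[outcome]:.3f}"
    )
```

With this many draws, the observed shares should land close to 0.5 each, which corresponds to the two bars of equal height.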
With real data, you often don't know the mathematical function that generated it, so you work with the empirical distribution instead: the frequencies you actually observe. For example, you might sample 10 colored balls from a bag and get 2 red, 3 yellow, and 5 green. Those counts are your empirical distribution, and you could represent them graphically with three bars: one of height 2 for red, one of height 3 for yellow, and one of height 5 for green.
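The ball example can be tallied in a few lines. This is a rough sketch using only the standard library. The `sample` list is constructed by hand to match the counts in the example, and the text bars stand in for a real chart.

```python
from collections import Counter

# The 10 sampled balls from the example, written out by hand.
sample = ["red"] * 2 + ["yellow"] * 3 + ["green"] * 5

# Empirical distribution as raw counts per color.
counts = Counter(sample)

# The same distribution as relative frequencies (shares of the sample).
total = sum(counts.values())
relative = {color: n / total for color, n in counts.items()}

# Print one text bar per color; the bar length equals the count.
for color, n in counts.items():
    print(f"{color:<7}{'#' * n} ({n}, {relative[color]:.0%})")
```

Dividing each count by the sample size turns the bar heights into proportions that sum to 1, which makes samples of different sizes easier to compare.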
Seaborn has a few ways to plot distributions:
- Histograms
- Box plots
- Violin plots
- Joint plots