Two-Dimensional Histograms
Learn how to use two-dimensional histograms for bivariate analysis of two continuous variables.
Two-dimensional (2D) histograms
A 2D histogram represents the joint distribution of two variables. It is created by dividing the 2D space into a grid of bins and counting the number of data points that fall in each bin. This becomes especially useful as the dataset grows, because the points in a scatter plot start to overlap and clump together until individual markers can no longer be told apart. Summarizing the points as counts per bin shows where most of the data lies and gives a simplified representation of it.
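To make the binning step concrete, here is a minimal pure-Python sketch of the grid-and-count idea described above. The function name `histogram_2d`, its parameters, and the sample points are made up for illustration. They are not part of any library or of the lesson's dataset.

```python
def histogram_2d(xs, ys, bins_x, bins_y):
    """Count points falling into each cell of a bins_x-by-bins_y grid."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Bin widths; fall back to 1.0 if all values along an axis are identical.
    x_width = (x_max - x_min) / bins_x or 1.0
    y_width = (y_max - y_min) / bins_y or 1.0

    # counts[row][col]: rows follow the y axis, columns follow the x axis.
    counts = [[0] * bins_x for _ in range(bins_y)]
    for x, y in zip(xs, ys):
        # min(...) places points on the upper edge into the last bin.
        col = min(int((x - x_min) / x_width), bins_x - 1)
        row = min(int((y - y_min) / y_width), bins_y - 1)
        counts[row][col] += 1
    return counts


# Hypothetical sample points, for illustration only.
xs = [1.0, 1.2, 1.1, 3.9, 4.0, 2.5]
ys = [10.0, 10.5, 10.2, 19.5, 20.0, 15.0]
counts = histogram_2d(xs, ys, bins_x=2, bins_y=2)
print(counts)
```

Each inner list is one row of the grid, and a plotting library would map these counts to colors. In practice, a library routine such as NumPy's `np.histogram2d` is a more robust way to do this binning.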
Advantages | Disadvantages |
Shows the joint distribution of the data and highlights areas of high and low density | Can be more difficult to interpret than simpler visualization methods, such as scatter plots |
Useful for visualizing large datasets, as the plot gives a compact representation of the regions where the data lies | Depends on bin size and shape, which can lead to inaccurate conclusions if not chosen carefully |
Can help you detect univariate and bivariate outliers | -- |
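The dependence on bin size noted in the table can be seen in a small sketch. The helper `bin_counts`, the `cell_size` parameter, and the points below are hypothetical and chosen only to show the effect. The grid is assumed to use square cells anchored at the origin.

```python
from collections import Counter


def bin_counts(points, cell_size):
    """Count points per square cell of side `cell_size`, grid anchored at the origin."""
    return Counter((int(x // cell_size), int(y // cell_size)) for x, y in points)


# Hypothetical points: two tight clusters plus one isolated point.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8), (5, 5)]

coarse = bin_counts(points, cell_size=10)  # a single cell holds every point
medium = bin_counts(points, cell_size=4)   # both clusters and the lone point separate
fine = bin_counts(points, cell_size=1)     # every point sits alone in its own cell

for label, result in [("coarse", coarse), ("medium", medium), ("fine", fine)]:
    print(label, dict(result))
```

With bins that are too coarse, the two clusters merge into one block. With bins that are too fine, every cell holds a single point and the density structure disappears. Only the intermediate size shows both clusters and the isolated point.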
We will again plot life expectancy against infant mortality to see where most values are concentrated.
2D histogram: Plotly Express
For this, we use px.density_heatmap, passing in a data_frame
...