Handling Overplotting and Outlier Values
Explore methods for handling overplotting and outlier values in scatterplots using Plotly and Dash. Learn to adjust marker opacity, size, symbols, and apply logarithmic scales to improve chart readability and data interpretation.
Let’s say we now want to see the relationship between our variable, perc_pov_19, and population for the same year that we have been working on. We want to have Population, total on the x-axis and perc_pov_19 on the y-axis.
We first create a subset of poverty in which year is equal to 2010 and is_country is True, and sort the values using Population, total:
# Keep only country rows for 2010, sorted by population (ascending)
df = (
    poverty[poverty['year'].eq(2010) & poverty['is_country']]
    .sort_values('Population, total')
)
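To see what this filter-and-sort step does in isolation, here is a minimal sketch on a tiny stand-in DataFrame. The rows, country names, and population figures are hypothetical and only illustrate the mechanics; the real poverty DataFrame has many more columns and rows.

```python
import pandas as pd

# Hypothetical stand-in for the poverty DataFrame; values are illustrative only.
poverty = pd.DataFrame({
    'Country Name': ['A', 'B', 'C', 'World', 'D'],
    'year': [2010, 2010, 2009, 2010, 2010],
    'is_country': [True, True, True, False, True],
    'Population, total': [5_000_000, 1_000_000, 2_000_000,
                          6_900_000_000, 300_000],
})

# Combine two boolean conditions: the row is for 2010 AND it is a country
# (not an aggregate such as 'World').
mask = poverty['year'].eq(2010) & poverty['is_country']

# Apply the mask, then sort the remaining rows by population (ascending).
df = poverty[mask].sort_values('Population, total')

print(df['Country Name'].tolist())  # ['D', 'B', 'A']
```

Row C is dropped because its year is 2009, and the World row is dropped because is_country is False, so only three rows remain, ordered from smallest to largest population.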
Now let’s see how to plot those two variables. Here is the code:
px.scatter(df,
           y=perc_pov_19,
           x='Population, total',
           title=' - '.join([perc_pov_19, '2010']),
           height=500)
Running this in a Jupyter Notebook produces a chart in which we can observe the following:
- A single outlier, China, with a population close to 1.4 billion, forces all the other markers to be squeezed into a very narrow part of our chart.
- We also have a small cluster of