Plotting the Data
Explore how to plot data by merging mental illness and GDP datasets using Python. Understand the visual relationships between wealth and the prevalence of anxiety, schizophrenia, and pancreatic cancer through scatter plots and regression trends.
After loading the libraries, we’ll need to generate a scatter
plot; we’ll merge our anxiety disorder data frame (anx) with the
GDP data using pd.merge. We would do the same to merge our
schizophrenia data (sch).
Plot prevalence of anxiety disorder
Plotting the graph won’t involve anything we haven’t seen before in these projects. The labels argument is used to make the plot more readable by assigning descriptive axis labels.
And here’s what came out the other end:
While some may find this surprising, the regression line is nearly flat, indicating that there’s minimal variance in prevalence between rich and developing nations. There’s quite a cluster of high-prevalence countries at the right edge of the x-axis along the very bottom (i.e., the least wealthy regions). This plot therefore suggests that anxiety disorders are more prevalent in countries with lower GDP.
Plot prevalence of schizophrenia
Here’s how we plot the prevalence of schizophrenia:
Here we can see another trend that some may find surprising; we’re seeing a significant increase in the prevalence of schizophrenia as we move up the y-axis towards higher GDP rates. The lowest rate among “developed” nations was Denmark, which reported many cases ( 286 per 100,000).
How many populations of the U.S. are affected by schizophrenia?
Plot prevalence pancreatic cancer
To confirm our findings, let’s take a medical disease that should have — at best — minimal cultural connections. Perhaps this condition will show minimal variation between countries. Well, using all the same code, here’s how that turned out for pancreatic cancer:
Another result that some may find surprising. That clear upward slope of the regression line (or trendline) shows us an unmistakable correlation between higher wealth and higher incidence of pancreatic cancer. What’s going on here? We’ll see this in the next lesson.
Write a code to plot a scatter graph with a regression line.
Jupyter notebook in action
To see the above Python scripts in a notebook, click to launch the application.
What is the use of trendline="ols"?