Motivation for Statistical Analysis

Let’s see an example of why it's important to learn statistical analysis.

The Space Shuttle Challenger

The Space Shuttle Challenger (shown in the figure below) was one of the most advanced spacecraft ever built. However, its first version lacked ejection seats for its crew. This was particularly relevant in January 1986, when the expected temperature at launch was below freezing (around 30°F). This was much colder than on any previous mission, raising safety concerns.

The shuttle (Challenger was one of only five put into service) was designed to be largely reusable. The orbiter—the main shuttle craft—was propelled out of the atmosphere by its own engines (supplied by a large external liquid-fuel tank that could be jettisoned when empty) and with the help of two booster rockets that disengaged when spent and fell back, to be recovered from the sea for reuse.

The booster rockets were constructed in cylindrical sections, and the joints were sealed with huge circular washers called O-rings. These O-rings were of particular concern because their ability to prevent fuel leaks (also known as ‘blow by’ in NASA jargon) depended on their flexibility and plasticity, which decreased as temperatures fell.

O-rings data

Because the boosters were recovered and refurbished for reuse, it was possible to check whether there had been any fuel leak damage during each launch based on whether or not burn marks were present. The resulting data (called orings) is available as part of the faraway R package. (Think of packages as add-on apps that you can download to extend the ‘base’ version of R.) The snippet of R code below loads the faraway package and displays the orings data. The head() function shows only the first several rows in order to save space:

library(faraway)
head(orings)

Challenger’s 1986 mission

The figure below plots the number of leaks against launch temperature for the launches prior to Challenger’s 1986 mission.

Let’s look at the code below used to generate the plot above. The library() function loads the ggplot2 package so that its quick-plot function qplot can use the O-ring data to draw a scatterplot. Here we have temp (temperature) on the x-axis and damage on the y-axis.

library(ggplot2)
Fig <- qplot(data = orings, x = temp, y = damage)
Fig
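As far as we are aware, recent releases of ggplot2 mark qplot() as deprecated, so it may print a warning. The sketch below is one possible way to draw the same scatterplot with the more general ggplot() function. The object name fig2 and the axis label wording are our own choices for illustration; the labels assume that temp is recorded in degrees Fahrenheit and that damage counts damage incidents per launch, as described in the faraway documentation.

```r
library(faraway)   # provides the orings data
library(ggplot2)

# Same scatterplot as above, built with ggplot() instead of qplot().
# aes() maps temp to the x-axis and damage to the y-axis;
# geom_point() draws one point per launch.
fig2 <- ggplot(orings, aes(x = temp, y = damage)) +
  geom_point() +
  labs(x = "Launch temperature (degrees F)",
       y = "Number of damage incidents")
fig2
```

Either version should show the same points. The ggplot() form simply makes the steps explicit: the data, the mapping of variables to axes, and the type of mark to draw.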

Do you think the number of fuel leaks is related to temperature?

A teleconference was held between NASA and the booster rocket manufacturer on the eve of the launch. After prolonged discussion, the decision was made to proceed. Tragically, shortly after lift-off, a fuel leak on one of the booster rockets ignited the fuel in the external tank, causing an explosion that destroyed Challenger and killed all seven crew members.

As the well-known physicist and author Richard Feynman famously demonstrated at the subsequent inquiry using only a cup of ice water (Tufte 2005), it was the unusually low temperature that caused the O-rings to become brittle, allowing fuel to leak and consequently causing the explosion.

Reference: Tufte, Edward R., et al. “Visual explanations.” Graphics Press (2005).

One revelation was that the discussion between NASA and the engineering company did not involve any statistical analysis of the relationship between fuel leaks and temperature on previous launches—they did not even look at simple graphs of the data like those shown above. You might think that to analyse the available data, to at least draw graphs, is not exactly rocket science—and even if it was, these were rocket scientists!

Moral

The moral of this example is that we should never underestimate the importance of statistical analysis, including informal graphical analysis. Statistics is widely used in medicine, health and safety, and other fields where the stakes are high. Even when the outcomes are less critical, graphical and statistical analyses are a key part of research.

Unfortunately, we appear to be in the middle of a reproducibility crisis, where many published scientific results cannot be reproduced later because of problems with the research process, including the analysis itself. While many of us are scientists, not statisticians, it’s important to use statistics well as part of our research and to improve on the current state of affairs.