Summary: Statistics and Probability Theory

Get a recap of some of the topics that we've learned in this chapter.

In this chapter, we discussed descriptive statistics as well as different types of inferential statistics, including ANOVA and t-tests. We noted that ANOVA is based on GLMs, which later chapters cover in more detail. We also introduced probabilistic measures and distributions, which future chapters will build on. Let's go through some of the important terms discussed in the chapter.

Descriptive statistics

Descriptive statistics are often used to understand the data itself and how it is distributed or spread. This is an important first step, because the shape and structure of the data determine which statistical analysis techniques are appropriate to use.

Measure of centrality

A measure of centrality describes the center, or typical value, of a distribution. There are multiple ways to calculate it, including the mean, the median, and the mode.
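The three measures can give different answers on the same data, which a small example makes visible. The following is a minimal sketch using Python's standard `statistics` module; the `sessions` values are made up for illustration and are not taken from the chapter.

```python
import statistics

# Hypothetical sample: session lengths in minutes (made-up values).
sessions = [12, 15, 15, 18, 20, 22, 90]

mean_value = statistics.mean(sessions)      # arithmetic average, pulled upward by the outlier 90
median_value = statistics.median(sessions)  # middle value of the sorted data
mode_value = statistics.mode(sessions)      # most frequently occurring value

print("mean:", mean_value)
print("median:", median_value)
print("mode:", mode_value)
```

Because of the single large value, the mean lands well above the median here, which is one reason to look at more than one measure.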

Measure of spread

A measure of spread, or dispersion, describes how much the data varies within the sample. This measure is important because it tells us how the data points for a particular variable are laid out around the center of the distribution. We can use various methods to compute it, including the range, the variance, and the standard deviation.
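As a rough illustration, the three measures of spread named above can be computed with Python's standard library. The `scores` list is a hypothetical sample, and the sketch assumes the sample (n - 1) versions of the variance and standard deviation.

```python
import statistics

# Hypothetical sample of scores (made-up values).
scores = [4, 8, 6, 5, 3, 7]

value_range = max(scores) - min(scores)        # distance between the largest and smallest value
sample_variance = statistics.variance(scores)  # average squared deviation, with n - 1 in the denominator
sample_std = statistics.stdev(scores)          # square root of the sample variance

print("range:", value_range)
print("variance:", sample_variance)
print("standard deviation:", sample_std)
```

If the data represents a whole population rather than a sample, `statistics.pvariance` and `statistics.pstdev` divide by n instead.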

Inferential statistics

In inferential statistics, we look for conclusions that extend beyond the immediate data, such as inferring how version A of a game we're working on affects players compared to version B. In practice, this can mean running a simple A/B test and comparing the effect of a small game design change on engagement, retention, or performance.


The t-test

The t-test is a method used mostly to compare two groups. It compares the means of the two groups and determines whether the difference between them is statistically significant. The analysis takes into account the spread of the data in each group, judging how far apart the two means are relative to that spread.
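To make the role of the mean and the spread concrete, here is a minimal sketch of the test statistic. It uses Welch's variant of the t-test, which does not assume equal variances; that choice, the helper name `welch_t`, and the two samples are assumptions for illustration, not something the chapter prescribes.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # Squared standard error of each group mean: sample variance / group size.
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    # Difference in means, scaled by the combined standard error.
    t_stat = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t_stat, df

# Hypothetical engagement scores for two game versions (made-up values).
version_a = [10, 12, 11, 13, 14]
version_b = [14, 15, 16, 15, 15]

t_stat, df = welch_t(version_a, version_b)
print("t statistic:", t_stat)
print("degrees of freedom:", df)
```

A large absolute t statistic means the means are far apart relative to the spread. Turning it into a p-value requires the t distribution, which a statistics library would normally provide.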


ANOVA

ANOVA (analysis of variance) is used to understand the sources of variation in the dependent variable, that is, how much of the variance observed in the dependent variable can be attributed to each independent variable.
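The idea of splitting variation into sources can be sketched for the simplest case, a one-way ANOVA with a single independent variable. The helper name `one_way_f` and the three groups below are hypothetical; the sketch only computes the F statistic, not a p-value.

```python
import statistics

def one_way_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups."""
    all_values = [value for group in groups for value in group]
    grand_mean = statistics.mean(all_values)
    # Variation explained by group membership (between-group sum of squares).
    ss_between = sum(
        len(group) * (statistics.mean(group) - grand_mean) ** 2 for group in groups
    )
    # Variation left over inside each group (within-group sum of squares).
    ss_within = sum(
        (value - statistics.mean(group)) ** 2 for group in groups for value in group
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    # Ratio of the two mean squares.
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical scores for three versions of a game (made-up values).
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]

f_stat = one_way_f(groups)
print("F statistic:", f_stat)
```

An F statistic well above 1 suggests that the group means differ by more than the within-group spread alone would explain.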


Probability

Probability is used to capture, in numerical terms, how likely it is that a certain event happens.
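For equally likely outcomes, this number is the count of favorable outcomes divided by the count of all possible outcomes. The sketch below uses a made-up example, the chance that two fair six-sided dice sum to 7, and keeps the result as an exact fraction.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes of rolling two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

# Outcomes belonging to the event "the two dice sum to 7".
favorable = [roll for roll in outcomes if sum(roll) == 7]

# Probability = favorable outcomes / all outcomes.
p_seven = Fraction(len(favorable), len(outcomes))

print("P(sum is 7) =", p_seven, "which is about", float(p_seven))
```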
