Significance of Behavioral Telemetry

Learn the importance of behavioral telemetry in game data science.

When we start working with behavioral telemetry data from games, we’ll see that the raw data collected is often on the order of 50+ features, or independent/dependent variables, measured continuously throughout a play session. On top of this, there will usually be many play sessions associated with each player. Often, each data point also relies on other data points; for example, quest completion relies on being in the right location, having the right items, and so on.

Analyzing such high-dimensional data can be challenging, especially if we also need to take into account the temporal dimension, for example, in a time-series analysis. There are also many interactions and interdependencies between these variables that affect the statistical analysis we’ll be performing.

How is behavioral telemetry collected?

Furthermore, it’s important to remember that behavioral telemetry is collected from players as they play. The more information we collect, the sparser the data will likely be across those dimensions because not all participants actually go through all the game spaces, especially in open-world games. This sparsity can make it hard to develop accurate models for specific problems. For example, it would be hard to analyze player movement patterns in an area that few players have visited.

For all these reasons, we have to be strategic about the information we’re collecting. Instead of attempting to deal with such high-dimensional data, a very common strategy is to develop an abstraction from the raw variables to a higher level with fewer variables, which reduces the dimensionality of the data and provides useful information about player behavior. We refer to these variables as features or metrics. We use these two terms interchangeably to mean a variable of interest (or an independent variable, as we called it in the previous chapter) abstracted from the raw data/measures.

Uses of abstraction

Abstractions can serve several purposes. For example, using abstraction methods, we can condense time while keeping the sequential nature of the measures, aggregate over the temporal dimension (that is, remove time as a dimension), or develop new abstract variables that are functions of the variables in the raw data, thereby condensing the number of variables into a more manageable set. For example, the kill/death ratio is a common feature/metric developed for shooter games. Other good examples of abstractions over raw data points are provided in the list of game metrics discussed previously. Please review this list; it is extensive and gives a good idea of what we are trying to achieve in this chapter.
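To make this concrete, here is a minimal sketch of such an abstraction in base R. The raw event log `raw_events` and its columns (`player`, `match`, `event`, `area`, `duration`) are hypothetical stand-ins for game telemetry, not a dataset used in this chapter; the point is simply how raw events collapse into a few engineered features such as the kill/death ratio and time spent per area.

```r
# Hypothetical raw event log: one row per logged event (assumed columns).
raw_events <- data.frame(
  player   = c("p1", "p1", "p1", "p2", "p2", "p2"),
  match    = c(1, 1, 2, 1, 1, 1),
  event    = c("kill", "death", "kill", "kill", "kill", "death"),
  area     = c("forest", "forest", "castle", "castle", "castle", "forest"),
  duration = c(30, 12, 45, 20, 25, 10)   # seconds spent in the area
)

# Aggregate over the temporal dimension: count kills and deaths per player.
kills  <- aggregate(event ~ player, data = subset(raw_events, event == "kill"),  FUN = length)
deaths <- aggregate(event ~ player, data = subset(raw_events, event == "death"), FUN = length)
names(kills)[2]  <- "kills"
names(deaths)[2] <- "deaths"

# Kill/death ratio: a single engineered feature per player.
features <- merge(kills, deaths, by = "player", all = TRUE)
features$kills[is.na(features$kills)]   <- 0
features$deaths[is.na(features$deaths)] <- 0
features$kd_ratio <- features$kills / pmax(features$deaths, 1)  # avoid division by zero

# Time spent per area: another engineered feature family.
time_per_area <- aggregate(duration ~ player + area, data = raw_events, FUN = sum)

features
time_per_area
```

The raw log has one row per event; after aggregation, each player is described by a handful of interpretable features, which is exactly the kind of dimensionality reduction this chapter is about.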

In this chapter, we’ll introduce the process of creating such features and metrics from raw data. There are many different strategies to accomplish this. These strategies can be summarized into three processes:

  • Feature engineering refers to the process of using domain or expert knowledge to aggregate data and develop new features. Examples of this process are the metrics discussed in the introductory chapter. Other examples include averaging kills per match per player and computing the time spent in each location, where a location is defined as an area of the game map.

  • Feature extraction refers to the process of developing new features from the raw measures using statistical techniques, reducing the number of variables by obtaining a set of principal variables. Feature extraction therefore derives new features $F_1, \cdots, F_m$, which are new variables obtained statistically from the raw variables $X_1, \cdots, X_n$. A method that allows us to perform such extraction is Principal Component Analysis (PCA), which we’ll discuss in detail in this chapter (see the sketch after this list). It should be noted that these types of techniques produce features that may not be interpretable by humans.

  • Feature selection refers to the process of filtering the raw measures and selecting a few that are of interest, thus reducing the number of variables used for further analysis. This is usually done through statistical methods that allow us to rank or score the importance of features given a prediction or outcome variable, such as whether the player won or not. As opposed to feature extraction, feature selection selects specific variables from the raw variables, $X_1, \cdots, X_n$, owing to their importance for modeling a particular relationship with a target variable $Y$. Therefore, the new variables are a subset of the raw variables, while feature extraction develops new variables from the raw variables (both techniques are sketched after this list).
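The following sketch contrasts the last two processes using base R. The simulated `player_stats` data frame and its columns are hypothetical, and the chapter’s labs use their own datasets and possibly dedicated packages; here, `prcomp()` stands in for feature extraction with PCA and `step()` for forward/backward feature selection.

```r
set.seed(42)
n <- 200
# Hypothetical per-player summary data (simulated for illustration only).
player_stats <- data.frame(
  kills       = rpois(n, 8),
  deaths      = rpois(n, 5),
  quests_done = rpois(n, 3),
  time_played = rnorm(n, 120, 30),
  items_found = rpois(n, 10)
)
# A simulated outcome variable Y, e.g., a player skill score.
player_stats$score <- with(player_stats,
  2 * kills - 1.5 * deaths + 0.5 * quests_done + rnorm(n, 0, 3))

# --- Feature extraction: derive F1, ..., Fm from the raw variables ---
raw_vars <- player_stats[, c("kills", "deaths", "quests_done",
                             "time_played", "items_found")]
pca <- prcomp(raw_vars, scale. = TRUE)   # center and scale before PCA
summary(pca)                             # variance explained per component
new_features <- pca$x[, 1:2]             # F1, F2: scores on the first two components

# --- Feature selection: keep a subset of the raw variables -----------
full_model <- lm(score ~ kills + deaths + quests_done +
                   time_played + items_found, data = player_stats)
null_model <- lm(score ~ 1, data = player_stats)

# Forward selection starts from the empty model and adds variables.
forward_fit <- step(null_model, direction = "forward",
                    scope = formula(full_model), trace = 0)

# Backward selection starts from the full model and removes variables.
backward_fit <- step(full_model, direction = "backward", trace = 0)

names(coef(forward_fit))   # the selected subset of the raw variables
```

Note how the two outputs differ: PCA returns new, possibly uninterpretable variables (component scores), while stepwise selection returns a named subset of the original variables.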

Chapter overview

In this chapter, we’ll discuss some of these techniques in detail. We’ll present some of the algorithms used and explain how they can be applied through labs in R. We’ll focus on the latter two techniques, feature extraction and feature selection, rather than feature engineering, because feature engineering is often game dependent and requires expert knowledge. Moreover, for feature engineering, we mostly use scripting to develop aggregate measures with functions similar to those discussed in the previous chapters. Therefore, we’ll leave it as an exercise to use the Virtual Personality Assessment Lab (VPAL) data to engineer some features that may be useful for our analysis goal. This chapter includes the following labs:

  • PCA lab: Focuses on feature extraction with PCA.
  • PCA mix lab: Extends the techniques used in the previous lab to include mixed data: qualitative and quantitative.
  • Feature selection lab: Focuses on feature selection, demonstrating forward and backward selection methods with example game data.

It should be noted that some of these algorithms are based on machine learning techniques, which we’ll introduce in more detail later in the course. For such cases, we will not delve deeply into the techniques but just introduce them and show how to use them, referring to the relevant chapters for more details. When such algorithms are discussed in later chapters, we recommend coming back to this chapter and considering how this added knowledge impacts our understanding of data abstraction. Before we delve into the subject of this chapter, we’ll first discuss the dataset we’ll be using throughout this chapter for examples and labs.
