An Example of K-Means From the Wild: TERA Online

Learn about a dataset taken from TERA, its normalization, scaling, and k-means results.

To contextualize the use of the k-means method within game data science, we use an example in which k-means clustering is applied to behavioral telemetry from TERA Online (TERA stands for The Exiled Realm of Arborea) to develop player profiles. It’s a straightforward example of how cluster analysis can be used to build behavioral profiles in games. TERA is an MMORPG developed by Bluehole Studio and released in South Korea in January 2011; it reached North America (published by En Masse Entertainment) and Europe in 2012. The game is currently free-to-play and, at the time of writing, still active. It has typical MMORPG features, such as a questing system, crafting, player-versus-player (PvP) combat, and an integrated economy. Players create one or more characters, each belonging to one of seven races (e.g., Aman, Baraka, or Castanic). Players also choose a class (e.g., Warrior, Lancer, or Berserker), each tuned to a specific role in the game (e.g., dealing high damage or absorbing large amounts of damage).

Dataset

The TERA dataset comes from the game’s open beta (character levels 1–32 only) and contains the following behavioral variables (features, in data mining terminology):

  • Quests completed: This is the number of quests completed.

  • Friends: This is the number of friends in the game.

  • Achievements: This is the number of achievements earned.

  • Skill levels: These are the character’s levels in the mining skill and in the plants skill, one value for each.

  • Monster kills: This is the number of AI-controlled enemies killed by the character (combining small, medium, and large monsters in one feature).

  • Deaths by monsters: This is the number of times AI-controlled enemies have killed the character.

  • Total items looted: This is the total number of items the character has picked up during the game.

  • Auction house use: This is the combined number of times the character has either created an auction or bought something at auction.

  • Character level: This ranges from 1 to 32. In this example, we focus on level 32 characters. If we used all characters, the cluster analysis would simply return clusters that track character level, because the values of the other features grow with level: a level 32 character might have completed, say, 1000 quests, whereas a level 1 character might have completed 2.
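As a minimal sketch, assuming each character arrives as one record, the feature list above and the level 32 filter might look like this in Python. The class name `CharacterRecord`, its field names, and the helper `select_level` are hypothetical; the source does not describe the dataset’s actual schema or storage format, and the sample values are made up.

```python
from dataclasses import astuple, dataclass
from typing import Iterable


# Hypothetical schema: field names are illustrative, not the dataset's real columns.
@dataclass(frozen=True)
class CharacterRecord:
    quests_completed: int
    friends: int
    achievements: int
    mining_level: int
    plants_level: int
    monster_kills: int        # small, medium, and large monsters combined
    deaths_by_monsters: int
    items_looted: int
    auction_house_uses: int   # auctions created + auction purchases
    character_level: int      # 1..32 in the open-beta data

    def features(self) -> tuple:
        """Behavioral features for clustering; character_level (last field) is dropped."""
        return astuple(self)[:-1]


def select_level(records: Iterable[CharacterRecord], level: int = 32) -> list[CharacterRecord]:
    """Keep only characters at a single level so clusters don't just track level."""
    return [r for r in records if r.character_level == level]


# Usage with made-up values (not taken from the TERA dataset):
rows = [
    CharacterRecord(40, 3, 12, 5, 4, 900, 10, 300, 7, 32),
    CharacterRecord(2, 0, 1, 0, 0, 15, 1, 8, 0, 1),
]
print(len(select_level(rows)))  # prints 1
```

Character level is left out of `features()` because, once the data are restricted to a single level, it is constant and contributes nothing to the clustering.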

Data preparation and analysis

Behavioral telemetry can suffer from quality problems, so incomplete records were removed, and the data were analyzed to detect outliers and to check the distribution of each feature.
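The source does not say which checks were used, so the following is only a sketch of one common approach using Python’s standard `statistics` module: drop records with any missing feature, summarize each feature’s distribution, and flag outliers with Tukey’s IQR fences (the 1.5 multiplier is a conventional default, assumed here). The helpers `drop_incomplete`, `summarize`, and `iqr_outliers` are hypothetical, and records are assumed to be dictionaries in which a missing value is `None` or absent.

```python
import statistics
from typing import Optional

Row = dict[str, Optional[float]]  # hypothetical record shape: feature name -> value


def drop_incomplete(rows: list[Row], features: list[str]) -> list[Row]:
    """Remove records where any listed feature is missing or None."""
    return [r for r in rows if all(r.get(f) is not None for f in features)]


def summarize(values: list[float]) -> dict[str, float]:
    """Quick look at one feature's distribution."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
    }


def iqr_outliers(values: list[float], k: float = 1.5) -> list[int]:
    """Indices of values outside [Q1 - k*IQR, Q3 + k*IQR].

    Assumption: Tukey's fences; the source does not name its outlier test.
    Needs at least two values.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


# Usage with made-up monster-kill counts (not from the TERA dataset):
rows = [{"kills": 900}, {"kills": 950}, {"kills": None},
        {"kills": 870}, {"kills": 20000}, {"kills": 910}]
kills = [r["kills"] for r in drop_incomplete(rows, ["kills"])]
print(iqr_outliers(kills))  # prints [3]
```

The function only flags indices; whether to drop, cap, or keep a flagged value is a separate decision that the source leaves open.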
