Dota 2 Data

Learn about the characteristics of the Dota 2 dataset used in the rest of the course.

To look at sequence analysis in more detail, this chapter uses data from Dota 2, which is similar to the Dotalicious game data we worked with in previous chapters. In Dota 2, each game pits two teams of five players, referred to as Radiant and Dire, against each other in combat. Similar to capture the flag, the game is won when one team destroys the other team’s ancient, which is located in that team’s base on the opposite end of the map. The game map consists of three lanes with forested areas between them, a river, and several NPC entities ranging from monsters, such as Roshan, to structures, such as barracks. Each lane is populated by a set of towers that fire beams at enemy entities that get too close. To reach the enemy’s base, players must destroy these towers.

Data used

The data used in the labs and examples in this course comes from approximately 200 players, split into 550 sequences based on game segments (described below). The prepared Dota 2 dataset contains the following fields:

  • ProfileId: This is the unique ID per user.

  • PlayersessionId: Unique ID for each session played in the game.

  • Segment: A Dota 2 game can be divided into three segments of gameplay: the early, mid, and late game. We divided each player’s total sequence into three sequences (one for each segment). There is some ambiguity about where a segment begins and ends within a game. For our purposes, we mark the end of the early game as the moment when the first tower falls, and the end of the mid game as the moment when the first tier 3 tower falls. Note that not all matches make it to the late game.

  • The sequence of actions for each player: We defined 10 different types of actions based on what the player is doing and whether other team members are in the vicinity. These are explained below.

    • solo: A player has no allied players in their vicinity.

    • fight: A player has encountered at least one enemy player.

    • kill_hero: A player secures a kill against an opponent.

    • teaming: A player has allied players (at least one) in their vicinity.

    • death: A player dies.

    • harrassed_by_opponents: A player encounters at least two enemies.

    • fight_diminishes: The number of opponents in a player’s vicinity decreases.

    • fight_intensifies: The number of opponents in a player’s vicinity increases.

    • team_fight: A player has more than one ally and more than one opponent in their vicinity.

    • full_team_assembly: A player’s entire team is in their vicinity.
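The vicinity-based definitions above can be read as simple rules over the number of allies and opponents near a player. The following is a minimal sketch of those rules, not the script used to build the dataset. The function names, the `allies_nearby` and `enemies_nearby` inputs, and the choice to return every matching label (rather than a single action) are assumptions made for illustration. The `kill_hero` and `death` actions are omitted because they depend on game events rather than on vicinity counts.

```python
def vicinity_labels(allies_nearby, enemies_nearby, team_size=5):
    """Return every vicinity-based action label whose definition matches.

    Hypothetical helper: the counts are assumed to come from some
    upstream proximity check on the raw telemetry.
    """
    labels = set()
    if allies_nearby == 0:
        labels.add("solo")                    # no allied players nearby
    else:
        labels.add("teaming")                 # at least one ally nearby
    if enemies_nearby >= 1:
        labels.add("fight")                   # at least one enemy encountered
    if enemies_nearby >= 2:
        labels.add("harrassed_by_opponents")  # at least two enemies (dataset spelling kept)
    if allies_nearby > 1 and enemies_nearby > 1:
        labels.add("team_fight")              # more than one ally and more than one opponent
    if allies_nearby == team_size - 1:
        labels.add("full_team_assembly")      # all four teammates nearby
    return labels


def opponent_change_label(previous_enemies, current_enemies):
    """Label a change in the number of nearby opponents, or return None."""
    if current_enemies > previous_enemies:
        return "fight_intensifies"
    if current_enemies < previous_enemies:
        return "fight_diminishes"
    return None
```

Because several definitions overlap (for example, two nearby enemies satisfy both fight and harrassed_by_opponents), a real pipeline would need a rule for choosing which action to record. That rule is not specified here.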

These events were abstracted from the game’s low-level telemetry using a script that we developed. We chose these features because they are important for our analysis. However, depending on the question being asked, we may want to define different features from the raw data.
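As one example of deriving a feature from raw data, the rule for the Segment field (the early game ends when the first tower falls, and the mid game ends when the first tier 3 tower falls) can be sketched as follows. This is an illustrative sketch only. The function name, the `(timestamp, action)` tuple layout, and the decision to place an event that occurs exactly at a boundary into the later segment are assumptions.

```python
def split_into_segments(events, first_tower_fall=None, first_tier3_fall=None):
    """Split one player's events into early, mid, and late game.

    events: list of (timestamp, action) tuples, assumed sorted by timestamp.
    first_tower_fall / first_tier3_fall: timestamps of the two boundary
    events, or None if that tower never fell in the match.
    """
    segments = {"early": [], "mid": [], "late": []}
    for timestamp, action in events:
        if first_tower_fall is None or timestamp < first_tower_fall:
            segments["early"].append(action)
        elif first_tier3_fall is None or timestamp < first_tier3_fall:
            segments["mid"].append(action)
        else:
            segments["late"].append(action)
    return segments


# Made-up timestamps (in seconds) for illustration only.
events = [(10, "solo"), (50, "teaming"), (120, "fight"),
          (300, "team_fight"), (900, "death")]
print(split_into_segments(events, first_tower_fall=100, first_tier3_fall=600))
# {'early': ['solo', 'teaming'], 'mid': ['fight', 'team_fight'], 'late': ['death']}
```

When no tier 3 tower falls, the late-game list stays empty, which matches the note that not all matches make it to the late game.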

Note that the VPAL data, which we used in earlier chapters, can also be converted into sequence data for use with the methods discussed in this chapter. We leave this conversion as an exercise.

Representation of sequence data

Sequences can be represented in the form of a state sequence (STS), in which each state is an action, so a sequence is written as $A_1, A_2, \dots, A_n$, with commas or other delimiters separating the states. Some implementations of algorithms, such as sequence mining algorithms, require a $-1$ after every state to denote the end of a state and a $-2$ to denote the end of a sequence. For example, $S_1\ -1, S_2\ -1, S_3\ -1, \dots, S_n\ -2$.
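The delimiter convention above can be sketched in a few lines of Python. This is a minimal illustration, not the input format of any particular library. The function name, the space-separated output, and the `close_last_state` option are assumptions. By default, the output follows the example above, in which the final state is followed only by the end-of-sequence marker. Some tools may expect a `-1` after the final state as well, so check the input specification of the tool in use.

```python
def encode_sequence(states, close_last_state=False):
    """Encode a list of state names using -1 / -2 delimiters.

    -1 marks the end of a state and -2 marks the end of the sequence.
    close_last_state (hypothetical option): also emit -1 after the
    final state, before the closing -2.
    """
    tokens = []
    for index, state in enumerate(states):
        tokens.append(state)
        is_last = index == len(states) - 1
        if not is_last or close_last_state:
            tokens.append("-1")
    tokens.append("-2")
    return " ".join(tokens)


print(encode_sequence(["solo", "teaming", "fight"]))
# solo -1 teaming -1 fight -2
print(encode_sequence(["solo", "teaming", "fight"], close_last_state=True))
# solo -1 teaming -1 fight -1 -2
```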

Other formats, such as state-permanence-sequence (SPS), aren’t used in this chapter. However, they are used by algorithms and libraries we may encounter in our analysis, so we include them in the representations discussed in the table below.
