The Node2Vec Algorithm

Learn the Node2Vec algorithm, which is used to generate vector embedding for the nodes of a graph.

We'll cover the following...

What is Node2Vec?

The Node2Vec is an embedding generation algorithm for nodes. It is one of the most famous algorithms of this class and was proposed in 2016.

The Node2Vec algorithm tries to improve over the DeepWalk algorithm by introducing biasA parameter influencing how random walks explore nearby versus distant nodes, affecting the resulting node embeddings. in the random walks. This allows the researcher to choose between prioritizing local relationships between nodes or global characteristics of the graph.

The biased random walk

A biased random walk is governed by two parameters pp and qq, and both can be considered probabilities. A biased random walk differs from a pure random walk because the probabilities of changing the states are not equal.

The parameter pp is used to prioritize breadth-first search (BFS). In this type of search, we look at all elements in the same neighborhood before going deeper. In other words, we try and explore all neighbors of a node before going further in the graph. The image below presents a BFS in a tree graph.

BFS—nodes visited following the same hierarchy level
BFS—nodes visited following the same hierarchy level

The parameter qq is used to prioritize a depth-first search (DFS). In this search, our focus is to go as deep inside the graph as possible, or in other words, as far away from the starting node as possible. This will favor moving to nodes not visited before, therefore favoring a global exploration of the network. The image below presents a DFS in a tree graph.

DFS—nodes visited trying to get to the leaves first
DFS—nodes visited trying to get to the leaves first

Let’s say we just move from node tt to node vv in a random walk. Then, the weight multiplier of the edge weight for every other node in the next step is given by:

where ...