Similarity is a central concept in data science. Being able to compare instances and measure how similar or dissimilar they are opens up many ways to improve our analyses and prescriptions.

Defining similarity in graphs

Similarity in graphs can take several forms. We can define it on at least three levels:

  • Node similarity

  • Edge similarity

  • Graph/subgraph similarity

Let’s explore how each one of them can be defined.

Node similarity

Node similarity tries to answer whether two nodes are alike in some way. This is tricky to pin down, because saying that one node is similar to another can mean many different things.

One node can be similar to another one if they have the same centrality measures. Another definition can be that nodes are similar if they’re linked to the same set of other nodes. Yet another way to say that nodes are similar is if the amount of information that passes through them is similar. There is no single definition, but each definition can be useful depending on our objectives.
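To make the "linked to the same set of other nodes" definition concrete, here is a minimal sketch that scores two nodes by the Jaccard similarity of their neighbor sets. The graph, the node names, and the `jaccard_similarity` helper are all hypothetical, and Jaccard is only one of many possible measures, not the one this lesson prescribes.

```python
# Hypothetical undirected graph stored as an adjacency dictionary:
# each node maps to the set of its neighbors.
example_graph = {
    "A": {"C", "D", "E"},
    "B": {"C", "D", "F"},
    "C": {"A", "B"},
    "D": {"A", "B"},
    "E": {"A"},
    "F": {"B"},
}


def jaccard_similarity(graph, u, v):
    """Shared neighbors of u and v divided by all their neighbors (0.0 to 1.0)."""
    neighbors_u, neighbors_v = graph[u], graph[v]
    union = neighbors_u | neighbors_v
    if not union:
        return 0.0  # two isolated nodes: nothing to compare
    return len(neighbors_u & neighbors_v) / len(union)


print(jaccard_similarity(example_graph, "A", "B"))  # 0.5
print(jaccard_similarity(example_graph, "C", "D"))  # 1.0
print(jaccard_similarity(example_graph, "E", "F"))  # 0.0
```

A score of 1.0 means the two nodes have exactly the same neighbors, and 0.0 means they share none. A definition based on centrality or information flow would need a different function.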

By having some measure of similarity between nodes, we can try to make some inferences:

  • If node A is similar to node B and node A likes action movies, maybe node B will also like them.

  • If node X is similar to node Y, maybe an edge will form between them in the future.

  • If node Z has a lot of friends in common with node L, maybe they know each other and we should recommend that they send each other a friendship request.

  • If node A commits fraudulent activity and node B is very similar to it, maybe we should also keep an eye on node B for fraudulent activity.

Answering these types of questions can generate a lot of value in our analysis, justifying why we should try to look for similarities between nodes.
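The common-friends inference in the list above can be sketched as a simple ranking of unconnected pairs by how many neighbors they share. This is an illustrative sketch only: the `friends` graph, the `recommend_links` helper, and the `min_common` threshold are assumptions made for the example, not part of any particular library.

```python
from itertools import combinations

# Hypothetical friendship graph as an adjacency dictionary.
friends = {
    "Z": {"P", "Q", "R"},
    "L": {"P", "Q", "R", "S"},
    "P": {"Z", "L"},
    "Q": {"Z", "L"},
    "R": {"Z", "L"},
    "S": {"L", "T"},
    "T": {"S"},
}


def recommend_links(graph, min_common=2):
    """Return unconnected pairs with at least min_common shared neighbors,
    ordered from most to fewest shared neighbors."""
    candidates = []
    for u, v in combinations(sorted(graph), 2):
        if v in graph[u]:
            continue  # already connected, nothing to recommend
        common = len(graph[u] & graph[v])
        if common >= min_common:
            candidates.append((u, v, common))
    return sorted(candidates, key=lambda item: item[2], reverse=True)


for u, v, common in recommend_links(friends):
    print(f"Suggest {u} - {v} ({common} friends in common)")
```

Here nodes Z and L are not connected but share three friends, so they come out at the top of the list. Comparing every pair is quadratic in the number of nodes, which is fine for a toy graph but would need a smarter candidate search on a large network.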

Edge similarity

Edge similarity is less common in complex network analysis. Usually, one looks for similarities between nodes and then estimates whether an edge should be created between those nodes.

However, for some specific types of modeling, edge similarities let us draw conclusions about the phenomenon being modeled.

We can say that two edges are similar if they represent the same phenomenon and also share a similar set of characteristics of that phenomenon.

For example, in a graph of companies and workers, the similarity between two edges can reflect how similar the type of work is that those workers do at those companies.
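One way to picture the worker and company example is to attach attributes to each edge and count how many of them agree. The sketch below assumes hypothetical attribute names (`role`, `contract`, `remote`) and a hypothetical `edge_similarity` helper; real data would likely call for a measure suited to its attribute types.

```python
# Hypothetical worker-company edges. Each edge carries attributes
# describing the job that links the worker to the company.
job_edges = {
    ("Ana", "Acme"): {"role": "engineer", "contract": "full-time", "remote": True},
    ("Bo", "Globex"): {"role": "engineer", "contract": "full-time", "remote": False},
    ("Cy", "Acme"): {"role": "designer", "contract": "part-time", "remote": False},
}


def edge_similarity(attrs_a, attrs_b):
    """Fraction of attributes (over the union of keys) on which two edges agree."""
    keys = set(attrs_a) | set(attrs_b)
    if not keys:
        return 0.0
    matches = sum(
        1
        for key in keys
        if key in attrs_a and key in attrs_b and attrs_a[key] == attrs_b[key]
    )
    return matches / len(keys)


ana, bo, cy = (job_edges[e] for e in [("Ana", "Acme"), ("Bo", "Globex"), ("Cy", "Acme")])
print(edge_similarity(ana, bo))  # agree on role and contract, differ on remote
print(edge_similarity(ana, cy))  # no attribute in common
```

Note that the Ana and Cy edges point to the same company yet score as dissimilar, because the score looks only at the characteristics of the work, not at the endpoints.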

Graph similarity

Another type of similarity that can be defined is graph similarity.

How do we say a graph is similar to another graph? We can say that two graphs are similar if:

  • They have the same number of nodes and edges.

  • They have the same distribution of centrality measures.

  • They are isomorphic, meaning they’re the same graph, with their nodes named differently.
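The three criteria above can be sketched in plain Python for very small graphs. In this sketch the degree sequence stands in for "distribution of centrality measures" (it is the unnormalized degree centrality), and the isomorphism test is a brute-force search over relabelings that is only practical for a handful of nodes. All function names and the example graphs are hypothetical.

```python
from itertools import permutations


def edge_set(graph):
    """Undirected edges as a set of frozensets, so (u, v) and (v, u) coincide."""
    return {frozenset((u, v)) for u in graph for v in graph[u]}


def degree_sequence(graph):
    """Sorted list of node degrees."""
    return sorted(len(neighbors) for neighbors in graph.values())


def same_size(g1, g2):
    """True if both graphs have the same number of nodes and of edges."""
    return len(g1) == len(g2) and len(edge_set(g1)) == len(edge_set(g2))


def are_isomorphic(g1, g2):
    """Brute force: try every relabeling of g1's nodes onto g2's nodes."""
    if not same_size(g1, g2) or degree_sequence(g1) != degree_sequence(g2):
        return False  # cheap checks rule out most non-matching pairs
    nodes1, target = sorted(g1), edge_set(g2)
    for perm in permutations(sorted(g2)):
        mapping = dict(zip(nodes1, perm))
        relabeled = {frozenset((mapping[u], mapping[v])) for u in g1 for v in g1[u]}
        if relabeled == target:
            return True
    return False


# Hypothetical graphs: two 4-node paths with different labels, and a 4-node star.
path_a = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
path_b = {"w": {"y"}, "y": {"w", "x"}, "x": {"y", "z"}, "z": {"x"}}
star = {"h": {"1", "2", "3"}, "1": {"h"}, "2": {"h"}, "3": {"h"}}

print(same_size(path_a, star))         # True
print(are_isomorphic(path_a, star))    # False
print(are_isomorphic(path_a, path_b))  # True
```

The path and the star have the same number of nodes and edges but different degree sequences, which shows that the first criterion alone is a weak notion of similarity.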

Some insights can be generated when we think in terms of graph similarity for some types of complex networks:

  • If a given protein can be used to make some medicine, similar proteins can also be tested for new medicines.

  • If a molecule modeled as a graph has some characteristics, such as an electric or magnetic response, molecules with similar graphs might also have those characteristics.

Defining similarity between graphs is very important but is less common than finding node similarities.

Let’s now explore the concept of equivalence and how it can be used to assess the similarity of nodes.

Structural equivalence

Structural equivalence is the idea that two nodes are similar if they share many of the same neighbors. So, if nodes E and F are connected to exactly the same set of nodes, and likewise nodes H and I, such as in the image below, each pair is as equivalent as it can be.
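As a rough sketch of the strictest case, nodes can be grouped by their exact neighbor sets, so that every group holds nodes that are fully structurally equivalent. The graph below is a hypothetical example, not a reconstruction of the lesson's image, and `structural_equivalence_classes` is a name chosen only for this illustration.

```python
from collections import defaultdict

# Hypothetical graph: E and F share the neighbors {A, B},
# while H and I share the neighbors {B, C}.
equivalence_graph = {
    "A": {"E", "F"},
    "B": {"E", "F", "H", "I"},
    "C": {"H", "I"},
    "E": {"A", "B"},
    "F": {"A", "B"},
    "H": {"B", "C"},
    "I": {"B", "C"},
}


def structural_equivalence_classes(graph):
    """Group nodes whose neighbor sets are identical; drop singleton groups."""
    groups = defaultdict(list)
    for node, neighbors in graph.items():
        groups[frozenset(neighbors)].append(node)
    return [sorted(nodes) for nodes in groups.values() if len(nodes) > 1]


print(structural_equivalence_classes(equivalence_graph))
# [['E', 'F'], ['H', 'I']]
```

This simple version treats two nodes that are connected to each other as non-equivalent, because each one appears in the other's neighbor set. Exact matches are also rare in real networks, so in practice a graded score over neighbor overlap is often more useful than this all-or-nothing grouping.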
