
Approach to the Problem

Explore the process of link prediction on social network graphs by training knowledge graph embeddings with PyKEEN. Understand dataset preparation, TransR embedding training, and rank-based evaluation metrics including mean rank, MRR, and Hits@k to assess model performance.

Our approach

After the dataset is built, we can use the PyKEEN Python library (the pykeen package) to train a knowledge graph embedding model and evaluate its performance. Let's take a look at the code below:

Python 3.10.4
from pykeen.triples import TriplesFactory
from pykeen.pipeline import pipeline

# create triples factory from the dataset
tf = TriplesFactory.from_labeled_triples(df.values)
# split into training and testing
training, testing = tf.split([0.8, 0.2], random_state=42)

# train using pipeline method
result = pipeline(
    training=training,
    testing=testing,
    model="TransR",
    model_kwargs=dict(embedding_dim=128),
    optimizer="adam",
    training_kwargs=dict(num_epochs=20, use_tqdm_batch=False),
    random_seed=42,
    device="cpu",
    negative_sampler="bernoulli",
    negative_sampler_kwargs=dict(num_negs_per_pos=3))

# retrieve results
result_df = result.metric_results.to_df()

print(result_df)

Note: The lines printed at the end of the output are not errors; they report how long the evaluation took.

Let’s look at the code explanation below:

  • Lines 5–7: Create a TriplesFactory from the dataset and split it into training and testing sets in a ratio of 80:20.

  • Lines 10–20: Train knowledge graph embeddings using the TransR algorithm and PyKEEN's pipeline method. The pipeline also evaluates the trained embeddings on the testing set and returns the results.

  • Line 23: Saves the results to a DataFrame.

  • Line 25: Prints the DataFrame.

The resulting DataFrame shows different metrics and their values.

Evaluation

PyKEEN's pipeline method performs the evaluation and provides the results in an easy-to-read format. The results in the DataFrame can look as follows (your values may differ, since the model shown here was trained with different hyperparameters):

PyKEEN pipeline result metrics

Let's find out what these metrics mean.
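As a rough illustration of the rank-based metrics named in this lesson (mean rank, MRR, and Hits@k), the sketch below computes them in plain Python. It is not PyKEEN's implementation: the helper names `rank_of_true` and `rank_metrics`, the example scores, and the list of ranks are all hypothetical. It also assumes that a higher score means a more plausible triple and breaks ties in favor of the true entity. PyKEEN's own evaluator additionally reports results per side (head, tail, both) and per rank type (optimistic, pessimistic, realistic).

```python
def rank_of_true(scores, true_index):
    """Return the 1-based rank of the true candidate among all candidates.

    Assumes a higher score means a more plausible triple. Ties are
    resolved in favor of the true candidate (an "optimistic" rank).
    """
    true_score = scores[true_index]
    return 1 + sum(1 for s in scores if s > true_score)


def rank_metrics(ranks, ks=(1, 3, 10)):
    """Compute mean rank, MRR, and Hits@k from a list of 1-based ranks."""
    n = len(ranks)
    metrics = {
        # average position of the true entity (lower is better)
        "mean_rank": sum(ranks) / n,
        # mean reciprocal rank: average of 1/rank (higher is better)
        "mrr": sum(1.0 / r for r in ranks) / n,
    }
    for k in ks:
        # fraction of test triples whose true entity is ranked in the top k
        metrics[f"hits@{k}"] = sum(1 for r in ranks if r <= k) / n
    return metrics


# Hypothetical scores for three candidate entities; the true one is at index 2.
example_rank = rank_of_true([0.1, 0.9, 0.5], true_index=2)

# Hypothetical ranks of the true entity for five test triples.
ranks = [1, 2, 4, 10, 20]
metrics = rank_metrics(ranks)

print(example_rank)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Mean rank is sensitive to a few badly ranked triples, while MRR and Hits@k put more weight on how often the true entity lands near the top, which is why the three are usually read together.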

Process

By default, PyKEEN uses a rank-based evaluation, which is standard for link prediction tasks. After generating the knowledge graph embeddings using the TransR algorithm, the pipeline method performs rank-based evaluation on the ...