Approach to the Problem

Explore how to address node classification on a biological graph by creating custom data handlers and implementing graph neural networks using PyTorch Geometric. Understand the process of building edge indices, preparing node features, splitting datasets, and training a Chebyshev convolutional network. Learn to evaluate model performance with accuracy metrics to improve prediction reliability.

We'll cover the following...

Our approach
Custom data handler
Graph neural network
Evaluation

Python 3.10.4

import torch
from torch_geometric.data import Data
import pandas as pd
from sklearn.model_selection import train_test_split

# create edge index of the graph (assumes G is a NetworkX graph with node IDs 0..n-1)
edge_index = torch.tensor(list(G.edges()), dtype=torch.long).t().contiguous()

# create the edge weight tensor from the 'contact' attribute of each edge
edge_weights = [G[u][v]['contact'] for u, v in G.edges()]
edge_weight = torch.tensor(edge_weights, dtype=torch.float)

# create node features
# create a dataframe from the graph nodes (one row per node)
df = pd.DataFrame(dict(G.nodes(data=True))).T

# convert selected features to tensor
node_features = torch.tensor(df[['tested', 'symptoms',
                                 'vaccinated', 'mobility']].astype(float).values,
                             dtype=torch.float)

# labels
node_labels = df.label.map({'infected': 1, 'not infected': 0})
y = torch.from_numpy(node_labels.values).type(torch.long)

# create train and test masks
X_train, X_test, y_train, y_test = train_test_split(pd.Series(G.nodes()),
                                                    node_labels,
                                                    stratify=node_labels,
                                                    test_size=0.20,
                                                    random_state=56)
n_nodes = G.number_of_nodes()
train_mask = torch.zeros(n_nodes, dtype=torch.bool)
test_mask = torch.zeros(n_nodes, dtype=torch.bool)
# the Series index is positional, so it matches the row order of node_features
train_mask[X_train.index] = True
test_mask[X_test.index] = True

# create torch_geometric Data object
# num_features is the number of feature columns, not the number of nodes
data = Data(x=node_features, edge_index=edge_index, edge_weight=edge_weight,
            y=y, train_mask=train_mask, test_mask=test_mask,
            num_classes=2, num_features=node_features.shape[1])
print(data)

Let’s look at the code explanation below:

Line 7: Creates the edge index of the graph, a tensor of shape [2, number of edges] that PyTorch Geometric uses to describe graph connectivity.
Lines 10–11: Create an edge weight tensor from the contact attribute of each edge.
Line 15: Creates a DataFrame with one row per node and one column per node attribute.
Lines 18–20: Select the relevant features (tested, symptoms, vaccinated, and mobility) and convert them into a PyTorch tensor.
Lines 23–24: Map the categorical labels to numerical ones (infected to 1, not infected to 0) and create a tensor of node labels.
Lines 27–31: Split the nodes and labels into training and testing sets in a ratio of 80:20 using a stratified split. This keeps the proportion of infected and not infected cases the same in both sets. ...
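
To make the layout of the edge index and edge weight tensors concrete, here is a minimal, dependency-free sketch in plain Python. It assumes a hypothetical three-edge graph with integer node IDs and made-up contact values; the names edges_with_contact, to_edge_index, and add_reverse_edges are illustrative only and are not part of the lesson or of PyTorch Geometric. The first helper mirrors what transposing the edge list produces: one row of source nodes, one row of target nodes, and one column per edge, with the weights kept in the same column order. The second helper is optional: PyTorch Geometric treats each column as a directed edge, so an undirected graph is usually stored with every edge in both directions. Whether the lesson's graph needs this depends on how it was built.

```python
# Hypothetical toy edges: (source, target, contact weight). Values are made up.
edges_with_contact = [(0, 1, 0.5), (1, 2, 2.0), (0, 2, 1.5)]


def to_edge_index(edges):
    """Turn a list of (u, v, w) triples into a 2 x E nested list plus aligned weights."""
    sources = [u for u, _, _ in edges]
    targets = [v for _, v, _ in edges]
    weights = [w for _, _, w in edges]
    return [sources, targets], weights


def add_reverse_edges(edge_index, weights):
    """Append the reverse of every edge so each connection appears in both directions."""
    sources, targets = edge_index
    return [sources + targets, targets + sources], weights + weights


edge_index, edge_weight = to_edge_index(edges_with_contact)
print(edge_index)   # row 0 holds sources, row 1 holds targets
print(edge_weight)  # weight i belongs to column i of edge_index

both_index, both_weight = add_reverse_edges(edge_index, edge_weight)
print(both_index)
```

Keeping the weights in the same order as the columns matters because the two tensors are only linked by position.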

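The effect of a stratified split and of the boolean masks can also be checked by hand. The sketch below is a hand-rolled stand-in for a stratified 80:20 split, not scikit-learn's implementation, and it runs on hypothetical labels (30 infected, 70 not infected). The helper names stratified_split and infected_share are assumptions made for illustration. It splits each class separately, builds masks over all nodes, and compares the infected share in both sets.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.2, seed=56):
    """Return (train_idx, test_idx), sampling the test share from each class separately."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical labels: 30 infected (1) and 70 not infected (0).
labels = [1] * 30 + [0] * 70
train_idx, test_idx = stratified_split(labels)

# Boolean masks over all nodes, analogous to train_mask and test_mask.
train_mask = [False] * len(labels)
test_mask = [False] * len(labels)
for i in train_idx:
    train_mask[i] = True
for i in test_idx:
    test_mask[i] = True


def infected_share(indices):
    """Fraction of the given nodes whose label is 1 (infected)."""
    return sum(labels[i] for i in indices) / len(indices)


print(len(train_idx), len(test_idx))
print(infected_share(train_idx), infected_share(test_idx))
```

Because every node is marked in exactly one mask, the same feature and label tensors can be reused for training and testing by indexing with the appropriate mask.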