What is a knowledge graph?
A knowledge graph is a structured collection of information. It is built by extracting facts from a knowledge base and converting them into entities and relationships, so the graph embodies the information held within that knowledge base. Each extracted fact is represented in the Subject-Predicate-Object format.
The following diagram illustrates a very simple knowledge graph. Real-world entities are the nodes of this graph (for convenience, small icons are shown in place of circles), and the nodes are connected by edges that represent the relationships between them. For example, Maria “watched” the movie “Jojo Rabbit.” Although the graph below is very simple, we can use knowledge graphs to model complex information in a format that is both human-readable and machine-readable.
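As a minimal sketch of the Subject-Predicate-Object format, the facts in such a graph can be written as plain Python tuples. Only the "Maria watched Jojo Rabbit" fact comes from the example above; the Triple type and the person "Alex" are hypothetical and used purely for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One fact: subject --predicate--> obj (hypothetical helper type)."""
    subject: str
    predicate: str
    obj: str


triples = [
    Triple("Maria", "watched", "Jojo Rabbit"),   # fact from the example above
    Triple("Jojo Rabbit", "is a", "movie"),      # implied by the example
    Triple("Alex", "watched", "Jojo Rabbit"),    # hypothetical extra fact
]

# Every subject and object becomes a node; every triple becomes a labeled edge.
nodes = {t.subject for t in triples} | {t.obj for t in triples}
edges = [(t.subject, t.obj, t.predicate) for t in triples]

print(sorted(nodes))
print(edges)
```

Storing the predicate alongside each edge is what distinguishes a knowledge graph from a plain graph of connections: the edge says not only that two entities are related but how.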
How do knowledge graphs work?
The following diagram shows the steps involved in constructing a knowledge graph.
To create a knowledge graph, we follow these steps:
Data acquisition: We must acquire data from databases, files, and websites.
Identify entities: After collecting data, we need to identify the entities in the data.
Extract relationships: Next, we need to determine the relationships between the identified entities.
Develop an ontology: Then, we create a formal structure, called an ontology, that defines the types of entities and the properties and relations allowed between them.
Store data: We store the knowledge graph in a database that can handle graph data.
Querying and inference: We use a graph query language to search and explore relationships in the graph data. We can also carry out more advanced tasks, such as inferring new connections and detecting inconsistencies within our knowledge graph.
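The steps above can be sketched end to end in a few lines of plain Python. This is an illustration under stated assumptions, not a production pipeline: the entities, the ontology, and the "may_like_genre" inference rule are all made up, and an in-memory dictionary stands in for a graph database. A real system would typically store the graph in a graph database and query it with a language such as SPARQL or Cypher.

```python
from collections import defaultdict

# Steps 1-2: acquired data with identified entities and their types (hypothetical).
entity_types = {
    "Maria": "Person",
    "Alex": "Person",
    "Jojo Rabbit": "Movie",
    "Comedy": "Genre",
}

# Step 3: extracted relationships as (subject, predicate, object) triples.
triples = [
    ("Maria", "watched", "Jojo Rabbit"),
    ("Alex", "watched", "Jojo Rabbit"),
    ("Jojo Rabbit", "has_genre", "Comedy"),
    ("Comedy", "watched", "Maria"),  # deliberately wrong, to exercise the check
]

# Step 4: ontology mapping each predicate to (subject type, object type).
ontology = {
    "watched": ("Person", "Movie"),
    "has_genre": ("Movie", "Genre"),
}


def conforms(triple):
    """Return True if the triple respects the ontology's type constraints."""
    subject, predicate, obj = triple
    if predicate not in ontology:
        return False
    subject_type, object_type = ontology[predicate]
    return (entity_types.get(subject) == subject_type
            and entity_types.get(obj) == object_type)


valid = [t for t in triples if conforms(t)]
inconsistent = [t for t in triples if not conforms(t)]

# Step 5: store the valid triples as subject -> predicate -> set of objects.
graph = defaultdict(lambda: defaultdict(set))
for subject, predicate, obj in valid:
    graph[subject][predicate].add(obj)


# Step 6a: query. Who watched a given movie?
def subjects_of(predicate, obj):
    return sorted(s for s, preds in graph.items()
                  if obj in preds.get(predicate, ()))


viewers = subjects_of("watched", "Jojo Rabbit")

# Step 6b: inference with a made-up rule:
# person watched movie AND movie has_genre G  =>  person may_like_genre G.
inferred = sorted(
    (person, "may_like_genre", genre)
    for person, preds in graph.items()
    for movie in preds.get("watched", ())
    for genre in graph.get(movie, {}).get("has_genre", ())
)

print(viewers)
print(inconsistent)
print(inferred)
```

The ontology does double duty here: it documents which relations are allowed, and it lets the code flag the malformed triple as an inconsistency before it reaches the stored graph.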
Advantages of knowledge graphs
Knowledge graphs facilitate data integration by linking information from diverse sources, enabling structured data sharing across organizations. For example, a knowledge graph could connect customer data from a CRM system with product data from an inventory database in an e-commerce company.
They improve the comprehension of a knowledge base by presenting entities and their relationships in a format that is easily understandable by both humans and machines. For instance, a knowledge graph can represent the connections between symptoms, diseases, and treatments in the healthcare domain.
Knowledge graphs enhance search functionality by providing more relevant and accurate results based on the relationships between entities. For instance, a search for “healthy recipes” could yield better results by considering ingredients, nutritional values, and user preferences within the knowledge graph.
They offer flexibility as they can be tailored to suit the requirements of various applications. For example, a knowledge graph in the financial sector can be customized to handle diverse data types such as market trends, customer profiles, and regulatory information.
Knowledge graphs support inference tasks, enabling the discovery of new relationships and the identification of data inconsistencies. For example, analyzing the connections between weather patterns, crop yields, and agricultural practices could reveal insights for optimizing farming techniques.
Knowledge graphs are scalable, making them suitable for handling large-scale applications and massive datasets. For instance, a knowledge graph powering a smart city infrastructure can efficiently manage diverse data streams from sensors, transportation systems, and public services.
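To make the data integration point concrete, here is a small sketch that links records from two separate systems through a shared product identifier. The record layouts, identifiers, and relation names are all hypothetical; the point is only that once both sources are expressed as triples, a single traversal can cross the boundary between them.

```python
# Hypothetical records from a CRM system and an inventory database.
crm_orders = [
    {"customer": "customer_17", "product_id": "sku_42"},
    {"customer": "customer_23", "product_id": "sku_42"},
    {"customer": "customer_23", "product_id": "sku_99"},
]
inventory = [
    {"product_id": "sku_42", "warehouse": "warehouse_north"},
    {"product_id": "sku_99", "warehouse": "warehouse_south"},
]

# Convert both sources into one set of (subject, predicate, object) triples.
triples = set()
for order in crm_orders:
    triples.add((order["customer"], "purchased", order["product_id"]))
for item in inventory:
    triples.add((item["product_id"], "stored_in", item["warehouse"]))


def objects(subject, predicate):
    """All objects reachable from `subject` over edges labeled `predicate`."""
    return {o for s, p, o in triples if s == subject and p == predicate}


# Two-hop question spanning both systems:
# which warehouses hold products that customer_23 purchased?
warehouses = sorted(
    warehouse
    for product in objects("customer_23", "purchased")
    for warehouse in objects(product, "stored_in")
)
print(warehouses)
```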
Limitations of knowledge graphs
Knowledge graphs have limitations related to the generalization of entities, particularly when they cross boundaries.
Distinguishing between similar entities, such as "Washington," the state, and "Washington," the person, can be challenging within knowledge graphs.
Setting and maintaining boundaries in different scenarios can also be difficult.
Knowledge graphs can become cluttered due to long relations between entities, often consisting of multi-word phrases.
Keeping track of relations becomes increasingly challenging as the complexity of the graph grows, especially with larger datasets.
Complex knowledge graphs containing multiple relations can confuse both humans and machines. In such cases, relational databases might offer a better alternative to knowledge graphs.
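One common way to reduce the ambiguity between similar entities is to give every entity a unique identifier and a type instead of relying on its label alone. The sketch below illustrates this idea for the two "Washington" entities; the Entity class, the identifier scheme, and the resolve helper are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    entity_id: str    # hypothetical identifier scheme
    label: str        # human-readable name, may be shared by many entities
    entity_type: str


entities = [
    Entity("place:washington", "Washington", "State"),
    Entity("person:washington", "Washington", "Person"),
]


def candidates(label):
    """All entities that share the given label."""
    return [e for e in entities if e.label == label]


def resolve(label, expected_type):
    """Pick the single entity of the expected type, or None if unclear."""
    matches = [e for e in candidates(label) if e.entity_type == expected_type]
    return matches[0] if len(matches) == 1 else None


ambiguous = candidates("Washington")     # two entities share this label
state = resolve("Washington", "State")   # context says we expect a state
unknown = resolve("Washington", "City")  # no entity of this type exists

print(len(ambiguous), state.entity_id, unknown)
```

The expected type would normally come from context, for example from the ontology's constraints on the relation being extracted. When the context is missing, the ambiguity remains, which is why this limitation is hard to remove entirely.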
Real-world applications of knowledge graphs
Now, let’s look at some real-world applications of knowledge graphs.
Semantic search
Knowledge graphs capture the context of entities on the web and the relations between them. Search engines can use this information to return more relevant results to users.
Chatbots
Because knowledge graphs store entities and the relationships between them explicitly, a chatbot can look up the facts relevant to a user’s question. This makes knowledge graphs well suited to question answering.
Fraud detection
Knowledge graphs can also surface unusual behavior and relationships within large datasets, which helps pinpoint fraud and security threats. Examples include suspicious transactions, fake or hacked accounts, and abnormal activity.
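As a rough illustration of how relationships can expose suspicious patterns, the sketch below flags any device or card that is shared by an unusually large number of accounts. The data, the relation names, and the threshold are all invented for this example; real fraud detection relies on far richer signals.

```python
from collections import defaultdict

# Hypothetical (account, relationship, attribute) triples.
triples = [
    ("account_1", "uses_device", "device_A"),
    ("account_2", "uses_device", "device_A"),
    ("account_3", "uses_device", "device_A"),
    ("account_3", "uses_card", "card_X"),
    ("account_4", "uses_card", "card_Y"),
]

# Index the graph by attribute: which accounts are connected to each one?
accounts_by_attribute = defaultdict(set)
for account, _relationship, attribute in triples:
    accounts_by_attribute[attribute].add(account)

# Assumption: three or more accounts sharing one attribute is worth reviewing.
SHARING_THRESHOLD = 3

suspicious = {
    attribute: sorted(accounts)
    for attribute, accounts in accounts_by_attribute.items()
    if len(accounts) >= SHARING_THRESHOLD
}
print(suspicious)
```

None of the individual accounts looks unusual on its own; the pattern only appears once the accounts are connected through the attribute they share.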
Biomedical research
In biomedical research, knowledge graphs can model complex relationships between proteins, genes, and drugs. Therefore, they help in drug development and provide new insights to researchers.
Implementation
The following Python code shows how to create a knowledge graph from drug testing data. It is written to run in a Jupyter Notebook.
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt

head_node_values = ['exp_drug_A', 'exp_drug_B', 'exp_drug_C',
                    'exp_drug_D', 'exp_drug_A', 'exp_drug_C',
                    'exp_drug_D', 'exp_drug_E', 'exp_gene_13',
                    'exp_gene_2', 'exp_gene_666', 'exp_gene_4',
                    'exp_gene_6', 'exp_gene_2', 'exp_gene_666',
                    'exp_gene_4']

relationship_values = ['will treat', 'will treat', 'will treat',
                       'will treat', 'will inhibit', 'will inhibit',
                       'will inhibit', 'will inhibit', 'is associated with',
                       'is associated with', 'is associated with', 'is associated with',
                       'is associated with', 'interacts with', 'interacts with',
                       'interacts with']

tail_nodes_values = ['covid', 'back pain', 'lung cancer',
                     'headache', 'exp_gene_13', 'exp_gene_2', 'exp_gene_4',
                     'exp_gene_20', 'weight gain', 'cardiac arrest',
                     'sore throat', 'bleeding', 'brain tumor',
                     'exp_gene_13', 'exp_gene_20', 'exp_gene_6']

educatives_knowledge_graph = nx.Graph()  # empty, undirected graph
data_dictionary = {'head_node': head_node_values, 'relationship': relationship_values, 'tail_node': tail_nodes_values}
educatives_dataframe = pd.DataFrame(data_dictionary)  # one row per triple
educatives_dataframe  # displays the data frame when run in a notebook cell

for _, my_row in educatives_dataframe.iterrows():
    educatives_knowledge_graph.add_edge(my_row['head_node'], my_row['tail_node'], label=my_row['relationship'])

educative_position = nx.spring_layout(educatives_knowledge_graph, seed=47, k=3.6)  # fixed seed for a reproducible layout
plt.figure(figsize=(11, 10))
nx.draw(educatives_knowledge_graph, educative_position, with_labels=True, node_size=666, node_color='green', edge_color='black', alpha=0.9)
nx.draw_networkx_edge_labels(educatives_knowledge_graph, educative_position, edge_labels=nx.get_edge_attributes(educatives_knowledge_graph, 'label'), label_pos=0.5, verticalalignment='baseline')
plt.show()
Code explanation
Lines 1–3: First, we import the required packages: pandas to create a data frame from the knowledge graph’s data, networkx to create the graph, and matplotlib to display it.
Lines 5–24: We define three lists of equal length. The first, head_node_values, contains the head nodes. The second, relationship_values, contains the relationship from each head node to its tail node. The third, tail_nodes_values, contains the tail nodes.
Lines 25–31: We create an empty, undirected graph with the Graph class. Then, we build a dictionary whose keys are column names and whose values are the lists declared on lines 5, 12, and 19. On line 27, we create a data frame from this dictionary, and on line 28, we display our drug testing data. Note that each row in our data frame represents a triple in our knowledge graph: the head connected to the tail via a relationship. Lastly, we iterate over this data frame and use the add_edge function to add each row as an edge, storing the relationship in the edge’s label attribute.
Lines 33–37: We use the spring_layout function to set the positions of the nodes in our graph. The seed argument fixes the random state so that the node layout is deterministic, while k sets the optimal distance between nodes. We create a new figure with the plt.figure() function and then draw the graph with the nx.draw() function, passing it the educatives_knowledge_graph graph and educative_position, which stores the node positions. The rest of the arguments are self-explanatory except for alpha, which sets the opacity of the graph. Using the nx.draw_networkx_edge_labels function, we add the edge labels to the graph, extracting them with the get_edge_attributes function and positioning them with the label_pos and verticalalignment arguments. Finally, we display the graph with plt.show().
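The graph above is undirected, so the direction of a relationship such as "will treat" is not stored. As a possible variation, not part of the original notebook, the sketch below loads a few of the same triples into a networkx DiGraph and runs two simple queries. The variable names are chosen for this example only.

```python
import networkx as nx

# A few of the triples from the lists above, as (head, relationship, tail).
triples = [
    ("exp_drug_A", "will treat", "covid"),
    ("exp_drug_A", "will inhibit", "exp_gene_13"),
    ("exp_gene_13", "is associated with", "weight gain"),
    ("exp_gene_2", "interacts with", "exp_gene_13"),
]

# A directed graph keeps track of which node is the head and which is the tail.
directed_graph = nx.DiGraph()
for head, relationship, tail in triples:
    directed_graph.add_edge(head, tail, label=relationship)

# Query 1: every fact whose head is exp_drug_A.
drug_a_facts = sorted(
    (relationship, tail)
    for _head, tail, relationship in directed_graph.out_edges("exp_drug_A", data="label")
)

# Query 2: a multi-hop chain from the drug to a condition, following edge direction.
path = nx.shortest_path(directed_graph, "exp_drug_A", "weight gain")

# The reverse direction is not reachable, unlike in the undirected graph.
reverse_reachable = nx.has_path(directed_graph, "weight gain", "exp_drug_A")

print(drug_a_facts)
print(path)
print(reverse_reachable)
```

Whether a directed graph is the better choice depends on the relations involved: "will treat" and "will inhibit" clearly have a direction, whereas "interacts with" is arguably symmetric and could be stored in both directions.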