Introduction
Real-world problems
The following are a few real-world problems that can be solved using the topological sort pattern.
Process scheduling
When a system is booted, the operating system needs to run a number of processes. Some processes have dependencies, which are specified using ordered pairs of the form (a, b)
; this means that process b
must be run before process a
. Some processes don’t have any dependencies, meaning they don’t have to wait for any processes to finish. Additionally, there cannot be any circular dependencies between the processes like (a, b)(b, a)
. In order to successfully start the system, the operating system needs to select an ordering to run the processes. The processes should be ordered in such a way that whenever a process is scheduled all of its dependencies are already met.
The processes can be represented in the form of a graph where the vertices are the processes and the edges represent their dependency relationships.
For example, in the following graph, there are six processes, numbered 1
through 6
. Here, arrows are used to represent dependencies. For example, process 3
depends on both processes 2
and 5
.
From the above example, we get the order: P6 ➔ P4 ➔ P1 ➔ P5 ➔ P2 ➔ P3
. Another possible ordering of the above processes can be P6 ➔ P4 ➔ P5 ➔ P1 ➔ P2 ➔ P3
. This order of graph vertices can be obtained through topological sort.
Find dictionary
We use a network protocol that encrypts all application messages using a proprietary scheme. The encryption scheme has a unique property that the sequence of encrypted messages in a session appears to be in sorted order according to a secret dictionary. However, the dictionary is not transmitted for security purposes.
Before the sender starts transmitting actual messages, it sends several encrypted training messages to the receiver. The sender guarantees that the training messages will follow the lexicographic order according to the unknown dictionary.
The receiver must reverse engineer the training messages and reconstruct the dictionary for future communication with the sender. If the order of the messages is invalid, the receiver generates an empty dictionary and asks the sender to retransmit the training messages.
For simplicity’s sake, we can assume that the encrypted contents of the messages only consist of English lowercase letters.
We can essentially map this problem to a graph problem, where we:
- Extract the necessary information to identify the dependency rules from the messages.
- With the gathered information, we can put these dependency rules into a directed graph with the letters as nodes and the dependencies (order) as the edges.
- Lastly, we can sort the graph nodes topologically to generate the letter ordering (dictionary).
Consider the following illustration for first letter ordering:
We can assume the first letters in the messages are in alphabetical order.
Looking at the letters above, we know the relative order of these letters but do not know how these letters fit in with the rest of the letters. To get more information, we’ll need to look further into our English dictionary analogy. The word dirt
comes before dorm
. This is because we look at the second letter when the first letter is the same. In this case, i
comes before o
in the alphabet.
We can apply the same logic to our encrypted messages and look at the first two messages, mzsor
and mqov
. As the first letter is the same in both the messages, we look at the second letter. The first message has z
, and the other second one has q
. Therefore, we can safely say that z
comes before q
. We now have two fragments of the letter-order:
["m", "x", "s", "r"], ["z", "q"]
Let’s have a look at all the relations we can extract by comparing adjacent messages:
The above rules can be converted into a graph as follows:
Dry-run templates
The following is the implementation of the solution provided for the Topological Sort (medium) problem:
from collections import dequedef topological_sort(vertices, edges):sortedOrder = []if vertices <= 0:return sortedOrder# a. Initialize the graphinDegree = {i: 0 for i in range(vertices)} # count of incoming edgesgraph = {i: [] for i in range(vertices)} # adjacency list graph# b. Build the graphfor edge in edges:parent, child = edge[0], edge[1]graph[parent].append(child) # put the child into it's parent's listinDegree[child] += 1 # increment child's inDegree# c. Find all sources i.e., all vertices with 0 in-degreessources = deque()for key in inDegree:if inDegree[key] == 0:sources.append(key)# d. For each source, add it to the sortedOrder and subtract one from all of its children's in-degrees# if a child's in-degree becomes zero, add it to the sources queuewhile sources:vertex = sources.popleft()sortedOrder.append(vertex)for child in graph[vertex]: # get the node's children to decrement their in-degreesinDegree[child] -= 1if inDegree[child] == 0:sources.append(child)# topological sort is not possible as the graph has a cycleif len(sortedOrder) != vertices:return []return sortedOrderdef main():print("Topological sort: " +str(topological_sort(4, [[3, 2], [3, 0], [2, 0], [2, 1]])))print("Topological sort: " +str(topological_sort(5, [[4, 2], [4, 3], [2, 0], [2, 1], [3, 1]])))print("Topological sort: " +str(topological_sort(7, [[6, 4], [6, 2], [5, 3], [5, 4], [3, 0], [3, 1], [3, 2], [4, 1]])))main()
Sample input #1
Input: Vertices=4, Edges=[3, 2], [3, 0], [2, 0], [2, 1]
Note: The column Outer loops represents the loops in our algorithm and their iteration number. For example, column entry
while: 2
represents the second iteration of thewhile
loop.Since, there are 3 loops, the column Line number specifies the loop we’re considering. The column Inner loop iteration represents the nested
for
loop (line 30 - line 33).
Line number | Outer loops | Inner loop iteration | graph | parent | child | inDegree[child] | inDegree | sources | vertex | sortedOrder |
5-11 | {0: [], 1: [], 2: [], 3: []} | {0: 0, 1: 0, 2: 0, 3: 0} | [ ] | |||||||
14-17 | for: 1 | {0: [], 1: [], 2: [], 3: [2]} | 3 | 2 | {0: 0, 1: 0, 2: 1, 3: 0} | [ ] | ||||
14-17 | for: 2 | {0: [], 1: [], 2: [], 3: [2, 0]} | 3 | 0 | {0: 1, 1: 0, 2: 1, 3: 0} | [ ] | ||||
14-17 | for: 3 | {0: [], 1: [], 2: [0], 3: [2, 0]} | 2 | 0 | {0: 2, 1: 0, 2: 1, 3: 0} | [ ] | ||||
14-17 | for: 4 | {0: [], 1: [], 2: [0, 1], 3: [2, 0]} | 2 | 1 | {0: 2, 1: 1, 2: 1, 3: 0} | [ ] | ||||
21-23 | for: 1 | {0: [], 1: [], 2: [0, 1], 3: [2, 0]} | {0: 2, 1: 1, 2: 1, 3: 0} | [ ] | [ ] | |||||
21-23 | for: 2 | {0: [], 1: [], 2: [0, 1], 3: [2, 0]} | {0: 2, 1: 1, 2: 1, 3: 0} | [ ] | [ ] | |||||
21-23 | for: 3 | {0: [], 1: [], 2: [0, 1], 3: [2, 0]} | {0: 2, 1: 1, 2: 1, 3: 0} | [ ] | [ ] | |||||
21-23 | for: 4 | {0: [], 1: [], 2: [0, 1], 3: [2, 0]} | {0: 2, 1: 1, 2: 1, 3: 0} | [3] | [ ] | |||||
27-33 | while: 1 | 1 | {0: [], 1: [], 2: [0, 1], 3: [2, 0]} | 2 | 0 | {0: 2, 1: 1, 2: 0, 3: 0} | [2] | 3 | [3] | |
27-33 | while: 1 | 2 | {0: [], 1: [], 2: [0, 1], 3: [2, 0]} | 0 | 1 | {0: 1, 1: 1, 2: 0, 3: 0} | [2] | 3 | [3] | |
27-33 | while: 2 | 1 | {0: [], 1: [], 2: [0, 1], 3: [2, 0]} | 0 | 0 | {0: 0, 1: 1, 2: 0, 3: 0} | [0] | 2 | [3, 2] | |
27-33 | while: 2 | 2 | {0: [], 1: [], 2: [0, 1], 3: [2, 0]} | 1 | 0 | {0: 0, 1: 0, 2: 0, 3: 0} | [0, 1] | 2 | [3, 2] | |
27-33 | while: 3 | {0: [], 1: [], 2: [0, 1], 3: [2, 0]} | {0: 0, 1: 0, 2: 0, 3: 0} | [1] | 0 | [3, 2, 0] | ||||
27-33 | while: 4 | {0: [], 1: [], 2: [0, 1], 3: [2, 0]} | {0: 0, 1: 0, 2: 0, 3: 0} | [ ] | 1 | [3, 2, 0, 1] |
Sample input #2
Students may be encouraged to fill the dry-run table with the following input:
Input: Vertices=5, Edges=[4, 2], [4, 3], [2, 0], [2, 1], [3, 1]
Line number | Outer loops | Inner loop iteration | graph | parent | child | inDegree[child] | inDegree | sources | vertex | sortedOrder |
5-11 | {0: [], 1: [], 2: [], 3: [], 4: []} | {0: 0, 1: 0, 2: 0, 3: 0, 4: 0} | [ ] | |||||||
14-17 | for: 1 | {0: [], 1: [], 2: [], 3: [], 4: [2]} | 4 | 2 | {0: 0, 1: 0, 2: 1, 3: 0, 4: 0} | [ ] | ||||
14-17 | for: 2 | {0: [], 1: [], 2: [], 3: [], 4: [2, 3]} | 4 | 3 | {0: 0, 1: 0, 2: 1, 3: 1, 4: 0} | [ ] | ||||
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
Step-by-step solution construction
The first step is to initialize the graph with vertices and empty adjacency lists. We also maintain a dictionary inDegree
that will store the number of incoming edges for each vertex. However, in this step, we’ll initialize it to 0.
from collections import dequedef topological_sort(vertices, edges):print("Given edges: ", edges, ", Vertices: ", vertices, sep = "")print("")sortedOrder = []if vertices <= 0:return sortedOrder# a. Initialize the graphprint("Initializing inDegree")inDegree = {}for i in range(vertices):print("\tLoop iteration: ", i+1, sep = "")print("\t\tAdding vertex ", i, " to inDegree and initialising it to 0", sep = "")inDegree[i] = 0print("\t\tinDegree: ", inDegree, sep = "")print("")print("")print("Initializing graph")graph = {}for i in range(vertices):print("\tLoop iteration: ", i+1, sep = "")print("\t\tAdding vertex ", i, " to the graph and initialising it with an empty adjacency list", sep = "")graph[i] = []print("\t\tgraph: ", graph, sep = "")print("")print("")return sortedOrderdef main():inputgraphs = [(4, [[3, 2], [3, 0], [2, 0], [2, 1]]),(5, [[4, 2], [4, 3], [2, 0], [2, 1], [3, 1]]),(7, [[6, 4], [6, 2], [5, 3], [5, 4], [3, 0], [3, 1], [3, 2], [4, 1]]) ]for i in inputgraphs:print("Topological sort: ", topological_sort(i[0], i[1]))print("-"*100+"\n")main()
Next, we build the graph. We put the child into its parent’s adjacency list and increment the child’s indegree by 1. This implies that there is an edge going from the parent to the child, that is, parent -> child
.
from collections import dequedef topological_sort(vertices, edges):print("Given edges: ", edges, ", Vertices: ", vertices, sep = "")print("")sortedOrder = []if vertices <= 0:return sortedOrder# a. Initialize the graphprint("Initializing inDegree")inDegree = {}for i in range(vertices):inDegree[i] = 0print("\tinDegree: ", inDegree, sep = "")print("")print("Initializing graph")graph = {}for i in range(vertices):graph[i] = []print("\tgraph: ", graph, sep = "")print("")print("Building the graph")k = 1for edge in edges:print("\tLoop iteration: ", k, sep ="")k+=1parent, child = edge[0], edge[1]print("\t\tparent: ", parent, ", child: ", child, sep = "")graph[parent].append(child) # put the child into it's parent's listprint("\t\tAdding child to the parent's list: ", graph, sep = "")temp = inDegree[child]inDegree[child] += 1 # increment child's inDegreeprint("\t\tIncrementing child's inDegree: ", temp, " + 1 --> ", inDegree[child], sep = "")print("\t\tinDegree: ", inDegree)print("")print("")return sortedOrderdef main():inputgraphs = [(4, [[3, 2], [3, 0], [2, 0], [2, 1]]),(5, [[4, 2], [4, 3], [2, 0], [2, 1], [3, 1]]),(7, [[6, 4], [6, 2], [5, 3], [5, 4], [3, 0], [3, 1], [3, 2], [4, 1]]) ]for i in inputgraphs:print("Topological sort: ", topological_sort(i[0], i[1]))print("-"*100+"\n")main()
The next step is to find all the sources in the graph, that is, vertices with inDegree = 0
. Since there are no edges going in these vertices, we can assume they occupy the first level of our graph.
We’ll store these sources in a dequeue()
variable, source
.
from collections import dequedef topological_sort(vertices, edges):print("Given edges: ", edges, ", Vertices: ", vertices, sep = "")print("")sortedOrder = []if vertices <= 0:return sortedOrder# a. Initialize the graphprint("Initializing inDegree")inDegree = {}for i in range(vertices):inDegree[i] = 0print("\tinDegree: ", inDegree, sep = "")print("")print("Initializing graph")graph = {}for i in range(vertices):graph[i] = []print("\tgraph: ", graph, sep = "")print("")print("Building the graph")k = 1for edge in edges:print("\tLoop iteration: ", k, sep ="")k+=1parent, child = edge[0], edge[1]print("\t\tparent: ", parent, ", child: ", child, sep = "")graph[parent].append(child) # put the child into it's parent's listprint("\t\tgraph: ", graph, sep = "")temp = inDegree[child]inDegree[child] += 1 # increment child's inDegreeprint("\t\tinDegree: ", inDegree)print("")print("")print("Finding the sources")sources = deque()l = 1for key in inDegree:print("\tLoop iteration: ", l, sep = "")l+=1if inDegree[key] == 0:print("\t\tinDegree of vertex ", key, " = ", inDegree[key], " --> vertex is a source", sep = "")sources.append(key)print("\t\tsources: ", sources, sep = "")else:print("\t\tinDegree of vertex ", key, " = ", inDegree[key], " --> not 0, hence, not a source", sep = "")print("")print("")return sortedOrderdef main():inputgraphs = [(4, [[3, 2], [3, 0], [2, 0], [2, 1]]),(5, [[4, 2], [4, 3], [2, 0], [2, 1], [3, 1]]),(7, [[6, 4], [6, 2], [5, 3], [5, 4], [3, 0], [3, 1], [3, 2], [4, 1]]) ]for i in inputgraphs:print("Topological sort: ", topological_sort(i[0], i[1]))print("-"*100+"\n")main()
Now that our setup is complete, we’ll get into the actual algorithm to get the topological sorting. The key idea is to re-use the process of identifying sources. If we look at this graph again:
Supposing we were to remove the first set of sources {6} from the graph, the next set of sources would be {4}. If we then removed these from the graph, the next set of sources would be {1, 5}, and so on. Each of these sets is like a layer in the graph that comes into view when the previous layer is removed.
For each source, we’ll add it to an array sortedOrder
and, to simulate the idea of removing it from the graph, subtract one from all of its children’s in-degrees. If a child’s in-degree becomes zero, we add it to the sources
queue.
The in-degrees are decremented since now the child has one less edge coming into it.
If the length of sortedOrder
is not equal to the number of vertices, the graph has a cycle and the program returns an empty array.
from collections import dequeimport copydef topological_sort(vertices, edges):print("Given edges: ", edges, ", Vertices: ", vertices, sep = "")print("")sortedOrder = []if vertices <= 0:return sortedOrder# a. Initialize the graphprint("Initializing inDegree")inDegree = {}for i in range(vertices):inDegree[i] = 0print("\tinDegree: ", inDegree, sep = "")print("")print("Initializing graph")graph = {}for i in range(vertices):graph[i] = []print("\tgraph: ", graph, sep = "")print("")print("Building the graph")k = 1for edge in edges:print("\tLoop iteration: ", k, sep ="")k+=1parent, child = edge[0], edge[1]print("\t\tparent: ", parent, ", child: ", child, sep = "")graph[parent].append(child) # put the child into it's parent's listprint("\t\tgraph: ", graph, sep = "")temp = inDegree[child]inDegree[child] += 1 # increment child's inDegreeprint("\t\tinDegree: ", inDegree)print("")print("")print("Finding the sources")sources = deque()l = 1for key in inDegree:print("\tLoop iteration: ", l, sep = "")l+=1if inDegree[key] == 0:print("\t\tinDegree of vertex ", key, " = ", inDegree[key], " --> vertex is a source", sep = "")sources.append(key)print("\t\tsources: ", sources, sep = "")else:print("\t\tinDegree of vertex ", key, " = ", inDegree[key], " --> not 0, hence, not a source", sep = "")print("")print("")print("Getting the topological ordering")m = 1while sources:print("\tOuter loop iteration: ", m)m+=1tempsource = deque()tempsource = copy.copy(sources)vertex = sources.popleft()print("\t\tPopping from sources: ", tempsource, " --> ", sources, sep = "")print("\t\tvertex: ", vertex, sep = "")temporder = []temporder = copy.copy(sortedOrder)sortedOrder.append(vertex)print("\t\tAppending to sortedOrder: ", temporder, " --> ", sortedOrder, sep = "")n = 1for child in graph[vertex]: # get the node's children to decrement their in-degreesprint("\t\t\tInner loop iteration: ", n)n+=1print("\t\t\t\tchild: ", child, sep="")print("\t\t\t\tDecrementing the indegree of vertex ", child, ": ", inDegree[child], " --> ", inDegree[child] - 1, sep = "")inDegree[child] -= 1if inDegree[child] == 0:print("\t\t\t\tindegree of ", child, " = 0", sep = "")temps = []temps = copy.copy(sources)sources.append(child)print("\t\t\t\tAppending to sources: ", temps, " --> ", sources, sep = "")print("")print("")# topological sort is not possible as the graph has a cycleif len(sortedOrder) != vertices:print("Topological sort is not possible as the graph has a cycle.")return []return sortedOrderdef main():inputgraphs = [(4, [[3, 2], [3, 0], [2, 0], [2, 1]]),(5, [[4, 2], [4, 3], [2, 0], [2, 1], [3, 1]]),(7, [[6, 4], [6, 2], [5, 3], [5, 4], [3, 0], [3, 1], [3, 2], [4, 1]]) ]for i in inputgraphs:print("Topological sort: ", topological_sort(i[0], i[1]))print("-"*100+"\n")main()