Introduction
Real-world problems
The following are a few real-world problems that can be solved using the modified binary search tree pattern.
Order processing milestones
Assume we want to collect stats of the number of orders processed for an e-commerce business. These stats will be presented in a quarterly report using a histogram. To collect data for this presentation, the devs have stored the number of orders for each day rounded down to the nearest significant milestone. For example, when we meet or exceed one million orders, we display “1M+ orders processed.” We continue to display the same message until another milestone is reached. For instance, when nine million orders or more are processed, we display “9M+ orders processed.” Every day, the rounded cumulative number of orders processed is appended to an array.
A modified binary search tree can be used to find the number of days spent jumping from one milestone to the next. We will have an array containing the daily milestones, that resets each quarter, so we start from 0 million.
Let’s say the array is: {0, 1, 1, 2, 2, 2, 3, 4, 4, 4, 5, 5, 6, 7}
. This array shows the stats for fourteen days. Now, we will also have a value representing a certain milestone; suppose it is 4. This means that we need to find when the “4M+ orders processed” milestone occurred. The modified binary approach would return {7, 9}
which means that we remained at the 4M+ milestone from the 7th day to the 9th day. The first day of the quarter is considered the 0th day.
Kth missing gene
A planet has n
genes, numbered 1
to n
. Every species on the planet has a subset of these genes in its DNA sequence. For a given species’ DNA, we want to find the missing gene in a sorted list of genes. The given DNA sequence will be sorted in strictly increasing order.
Since our input is in ascending order, we can use a binary search tree to solve this problem.
Dry-run tables
The following is the implementation of the solution provided for the Ceiling of a Number (medium) problem:
def search_ceiling_of_a_number(arr, key):n = len(arr)if key > arr[n - 1]: # if the 'key' is bigger than the biggest elementreturn -1start, end = 0, n - 1while start <= end:mid = start + (end - start) // 2if key < arr[mid]:end = mid - 1elif key > arr[mid]:start = mid + 1else: # found the keyreturn mid# since the loop is running until 'start <= end', so at the end of the while loop, 'start == end+1'# we are not able to find the element in the given array, so the next big number will be arr[start]return startdef main():print(search_ceiling_of_a_number([4, 6, 10], 6))print(search_ceiling_of_a_number([1, 3, 8, 10, 15], 12))print(search_ceiling_of_a_number([4, 6, 10], 17))print(search_ceiling_of_a_number([4, 6, 10], -1))main()
Students may be encouraged to run through the provided solution with the following sample inputs:
Sample input #1
Input: [1, 3, 8, 10, 15], key = 12
start
is updated when ourkey
is greater thanarr[mid]
.end
is updated when ourkey
is less thanarr[mid]
.
Line number | Loop iteration | start | end | mid | arr[mid] |
6 | - | 0 | 4 | - | - |
7-14 | 1 | 3 | 4 | 2 | 8 |
7-14 | 2 | 4 | 4 | 3 | 10 |
7-14 | 3 | 4 | 3 | 4 | 15 |
18 | - | 4 | - | - | - |
Sample input #2
Students may be encouraged to complete the dry-run with this input:
Input: [4, 6, 10], key = -1
Line number | Loop iteration | start | end | mid | arr[mid] |
6 | - | 0 | 2 | - | - |
7-14 | ... | ... | ... | ... | ... |
Step-by-step solution generation
The first step in our algorithm is to check if our key
is bigger than the last (largest) element in our input array. If yes, we return -1
.
Since our array is sorted in ascending order, the last element is the largest element in the array.
def search_ceiling_of_a_number(arr, key):n = len(arr)print("Input array: ", arr, ", key: ", key, sep = "")print("")print("Length of the input array: ", n, sep = "")print("Last element of the input array: ", arr[n-1], sep = "")if key > arr[n - 1]: # if the 'key' is bigger than the biggest elementprint("\t\t", key, " is greater than ", arr[n-1], " ---> returning -1", sep = "")print("")return -1else:print("\t\t", key, " is not greater than ", arr[n-1], " ---> continue", sep = "")print("")def main():inputArrays = [([4, 6, 10], 6),([1, 3, 8, 10, 15], 12), ([4, 6, 10], 17), ([4, 6, 10], -1)]for i in inputArrays:index = search_ceiling_of_a_number(i[0], i[1])if index != -1 and index is not None:print("\nThe smallest number in ", i[0], " greater than or equal to ", i[1], " is ", i[0][index],", at index ", index, ".", sep = "")else:print("\nThere is no number greater than or equal to", i[1],"in the given array", i[0])print("-"*100+"\n")main()
The next step is to find the ceiling of the given key
. We’ll iterate over our input array and maintain start
, end
, and mid
indices. Here’s a rundown of what we’ll be doing:
- Set
start
as0
,end
as the length of the array - 1, andmid
as the mid-point between thestart
andend
indices. - If
key
is less than the middle element, updateend
withmid - 1
, sincekey
will be in the first half of the array. - If
key
is greater than the middle element, it implies that thekey
will be in the second half of the array. Hence, we updatestart
withmid + 1
. - Repeat the above steps until
start <= end
. - Return
mid
if thekey
is found, else returnstart
, which will be pointing to the next number bigger thankey
.
def search_ceiling_of_a_number(arr, key):n = len(arr)print("Input array: ", arr, ", key: ", key, sep = "")print("")print("Length of the input array: ", n, sep = "")print("Last element of the input array: ", arr[n-1], sep = "")if key > arr[n - 1]: # if the 'key' is bigger than the biggest elementprint("\t\t", key, " is greater than ", arr[n-1], " ---> returning -1", sep = "")print("")return -1else:print("\t\t", key, " is not greater than ", arr[n-1], " ---> continue", sep = "")print("")start, end = 0, n - 1print("Start: ", start, ", end: ", end, sep = "")j = 1while start <= end:print("\tLoop iteration: ", j, sep = "")j+=1mid = start + (end - start) // 2print("\t\tmid: (start+end)/2 = ", mid, ", arr[mid]: ", arr[mid], sep = "")if key < arr[mid]:print("\t\tkey < arr[mid]: ", key, " < ", arr[mid], " ---> update end to search in first half of range", sep = "")end = mid - 1print("\t\tend: ", end, sep = "")elif key > arr[mid]:print("\t\tkey > arr[mid]: ", key, " > ", arr[mid], " ---> update start to search in second half of range", sep = "")start = mid + 1print("\t\tstart: ", start, sep = "")else: # found the keyprint("\t\tKey found ---> returning mid: ", mid, sep = "")return mid# since the loop is running until 'start <= end', so at the end of the while loop, 'start == end+1'# we are not able to find the element in the given array, so the next big number will be arr[start]print("key not present in the array ---> the next big element: ", arr[start], sep = "")print("Returning the index of ", arr[start], ": ", start, sep = "")return startdef main():inputArrays = [([4, 6, 10], 6),([1, 3, 8, 10, 15], 12), ([4, 6, 10], 17), ([4, 6, 10], -1)]for i in inputArrays:index = search_ceiling_of_a_number(i[0], i[1])if index != -1:print("\nThe smallest number in ", i[0], " greater than or equal to ", i[1], " is ", i[0][index],", at index ", index, ".", sep = "")else:print("\nThere is no number greater than or equal to", i[1],"in the given array", i[0])print("-"*100+"\n")main()