Introduction
Real-world problems
Following are the real-world problems that can be solved via Two Heaps pattern:
Find Median Age
Netflix wants to add a feature to fetch relevant content based on the age of the users for a specific country or region. For this, we use the median age of users and the preferred content for users that fall into that specified category.
Because the number of users is constantly increasing on the Netflix platform, we need to figure out a structure or design that updates the median age of users in real-time. We will have an array that constantly receives age values, and we will output the median value after each new data point.
Let’s understand this better with an illustration:
Calculate Median of Buffering Events
Thousands of clients are expected to be associated with a Netflix server at any given time. User session statistics such as packet drops and buffering events in the last one-minute interval are relayed back to the Netflix servers to monitor the session quality and adjust the streaming rate. Due to a large number of users, this amounts to an immense volume of data. Limited memory per user session is available due to the problem scale.
At any time, we may store the last k
values for a particular measure (such as the number of buffering events) of a specific user session. As the next value arrives, the oldest value is removed from the memory. In other words, all the values for a session are not available but are accessible in a sliding window fashion.
We want to maintain some statistics of the user session. The nature of the network is such that there may be short-lived extreme cases where there are lots of buffering events. Therefore, the average value of the number of buffering events is of limited use. The median number of buffering events, on the other hand, is significant.
Given that the data values are revealed in a sliding window fashion, calculate the median number of buffering events in a one-minute interval.
Let’s consider the following example with k = 4
:
Dry-run templates
The following is the implementation of the solution provided for the Find the Median of a Number Stream (medium) problem:
from heapq import *class MedianOfAStream:maxHeap = [] # containing first half of numbersminHeap = [] # containing second half of numbersdef insert_num(self, num):if not self.maxHeap or -self.maxHeap[0] >= num:heappush(self.maxHeap, -num)else:heappush(self.minHeap, num)# either both the heaps will have equal number of elements or max-heap will have one# more element than the min-heapif len(self.maxHeap) > len(self.minHeap) + 1:heappush(self.minHeap, -heappop(self.maxHeap))elif len(self.maxHeap) < len(self.minHeap):heappush(self.maxHeap, -heappop(self.minHeap))def find_median(self):if len(self.maxHeap) == len(self.minHeap):# we have even number of elements, take the average of middle two elementsreturn -self.maxHeap[0] / 2.0 + self.minHeap[0] / 2.0# because max-heap will have one more element than the min-heapreturn -self.maxHeap[0] / 1.0def main():medianOfAStream = MedianOfAStream()medianOfAStream.insert_num(3)medianOfAStream.insert_num(1)print("The median is: " + str(medianOfAStream.find_median()))medianOfAStream.insert_num(5)print("The median is: " + str(medianOfAStream.find_median()))medianOfAStream.insert_num(4)print("The median is: " + str(medianOfAStream.find_median()))main()
Sample Input #1
insert 3
insert 1
find median
insert 5
find median
insert 4
find median
Method | Line No. | median | maxHeap | maxHeap |
- | 6-7 | - | [] | [] |
insert_num(3) | 10-13 | - | [] | [-3] |
17-20 | - | [] | [-3] | |
insert_num(1) | 10-13 | - | [] | [-3, -1] |
17-20 | - | [3] | [-1] | |
find_median() | 25 | 2.0 | [3] | [-1] |
insert_num(5) | 10-13 | - | [3,5] | [-1] |
17-20 | - | [5] | [-3,-1] | |
find_median() | 28 | 3.0 | [5] | [-3,-1] |
insert_num(4) | 10-13 | - | [4,5] | [-3,-1] |
17-20 | - | [4,5] | [-3,-1] | |
find_median() | 25 | 3.5 | [4,5] | [-3,-1] |
Sample Input 2
Students can be asked to complete the table below for the following input:
insert 2
insert 3
insert 4
insert 5
find median
insert 5
find median
insert 5
find median
Method | Line No. | median | maxHeap | maxHeap |
- | 6-7 | - | [] | [] |
insert_num(2) | 10-13 | - | [] | [-2] |
17-20 | - | [] | [-2] | |
insert_num(3) | 10-13 | - | [3] | [-2] |
17-20 | - | [3] | [-2] | |
... | ... | ... | ... | ... |
Step-by-step solution construction
Let’s start implementing insert_num
function as the first step.
As discussed in the solution, we need two heaps maxHeap
and minHeap
. We will insert the negative of the original number in the maxHeap
because we will need the maximum number from the maxHeap
later in the solution.
In the code below, we push the num
on to the minHeap
if the largest number in the maxHeap
is greater than the num
. Otherwise, it will be inserted on to the minHeap
from heapq import *class MedianOfAStream:maxHeap = [] # containing first half of numbersminHeap = [] # containing second half of numbersdef insert_num(self, num):if not self.maxHeap or -self.maxHeap[0] >= num:heappush(self.maxHeap, -num)else:heappush(self.minHeap, num)print("\theap statuses after inserting " + str(num))self.print_status()def print_status(self):print("\tminHeap: " + str(self.minHeap),end=" ")print("\tmaxHeap: " + str(self.maxHeap))def main():medianOfAStream = MedianOfAStream()stream_of_numbers = [3, 1, 5, 4, 1, 1, 2, 2]for num in stream_of_numbers:print("insert", num)medianOfAStream.insert_num(num)print(("-"*100)+"\n")main()
As the output of the above code shows, although we inserted 8 numbers on to the heaps, however, we do not have an equal number of elements in both the heaps.
We need to adjust the elements such that maxHeap
can only have one more element than the minHeap
(in case of an odd number of inserted elements).
In the code below, we adjust the elements in the heap according to the following rules:
-
Pop the highest number from the
maxHeap
and push it on to theminHeap
if the length of themaxHeap
exceeds that of theminHeap
by 2 or more. -
Pop the lowest number from the
minHeap
and insert on to themaxHeap
if the length of themaxHeap
is less than that of theminHeap
from heapq import *class MedianOfAStream:maxHeap = [] # containing first half of numbersminHeap = [] # containing second half of numbersdef insert_num(self, num):if not self.maxHeap or -self.maxHeap[0] >= num:heappush(self.maxHeap, -num)else:heappush(self.minHeap, num)print("\theap status before adjusting the number of elements")self.print_status()# either both the heaps will have equal number of elements or max-heap will have one# more element than the min-heapif len(self.maxHeap) > len(self.minHeap) + 1:heappush(self.minHeap, -heappop(self.maxHeap))elif len(self.maxHeap) < len(self.minHeap):heappush(self.maxHeap, -heappop(self.minHeap))print("\theap status after adjusting the number of elements")self.print_status()def print_status(self):print("\tminHeap: " + str(self.minHeap),end=" ")print("\tmaxHeap: " + str(self.maxHeap))def main():medianOfAStream = MedianOfAStream()stream_of_numbers = [3, 1, 5, 4, 1, 1, 2, 2]for num in stream_of_numbers:print("insert", num)medianOfAStream.insert_num(num)print(("-"*100)+"\n")main()
insert_num
method ensures that the elements are adjusted in the heaps according to the rules defined in the previous step. Now, we can calculate the median of the elements as follows:
-
If the number of elements is even, both the heaps will have an equal number of elements. We can take the first(the maximum) number from
maxHeap
and the first(the minimum) number fromminHeap
and take the average to calculate the median. -
If the number of elements is odd, we can take the maximum number from
maxHeap
as the median, asmaxHeap
will have one more element thanminHeap
.
In the code below, the find_median
method computes the median according to the above rules.
from heapq import *class MedianOfAStream:maxHeap = [] # containing first half of numbersminHeap = [] # containing second half of numbersdef insert_num(self, num):if not self.maxHeap or -self.maxHeap[0] >= num:heappush(self.maxHeap, -num)else:heappush(self.minHeap, num)print("\theap status before adjusting the number of elements")self.print_status()# either both the heaps will have equal number of elements or max-heap will have one# more element than the min-heapif len(self.maxHeap) > len(self.minHeap) + 1:heappush(self.minHeap, -heappop(self.maxHeap))elif len(self.maxHeap) < len(self.minHeap):heappush(self.maxHeap, -heappop(self.minHeap))print("\theap status after adjusting the number of elements")self.print_status()def find_median(self):print("\tNumber of elements in minHeap: " + str(len(self.minHeap)))print("\tNumber of elements in maxHeap: " + str(len(self.maxHeap)))if len(self.maxHeap) == len(self.minHeap):# we have even number of elements, take the average of middle two elementsprint("\tNumber of elements is equal in both heaps. Taking average of middle two elements as median.")return -self.maxHeap[0] / 2.0 + self.minHeap[0] / 2.0# because max-heap will have one more element than the min-heapprint("\tmaxHeap has one more element than minHeap. Taking maximum from maxHeap as median.")return -self.maxHeap[0] / 1.0def print_status(self):print("\tminHeap: " + str(self.minHeap),end=" ")print("\tmaxHeap: " + str(self.maxHeap))def main():medianOfAStream = MedianOfAStream()stream_of_numbers = [3, 1, 5, 4, 1, 1, 2, 2]for num in stream_of_numbers:print("insert", num)medianOfAStream.insert_num(num)print("Finding the median...")print("The median of the numbers received so far is: " + str(medianOfAStream.find_median()))print(("-"*100)+"\n")main()