...

/

Introduction

Introduction

Real-world problems

Following are the real-world problems that can be solved via Two Heaps pattern:

Find Median Age

Netflix wants to add a feature to fetch relevant content based on the age of the users for a specific country or region. For this, we use the median age of users and the preferred content for users that fall into that specified category.

Because the number of users is constantly increasing on the Netflix platform, we need to figure out a structure or design that updates the median age of users in real-time. We will have an array that constantly receives age values, and we will output the median value after each new data point.

Let’s understand this better with an illustration:

Calculate Median of Buffering Events

Thousands of clients are expected to be associated with a Netflix server at any given time. User session statistics such as packet drops and buffering events in the last one-minute interval are relayed back to the Netflix servers to monitor the session quality and adjust the streaming rate. Due to a large number of users, this amounts to an immense volume of data. Limited memory per user session is available due to the problem scale.

At any time, we may store the last k values for a particular measure (such as the number of buffering events) of a specific user session. As the next value arrives, the oldest value is removed from the memory. In other words, all the values for a session are not available but are accessible in a sliding window fashion.

We want to maintain some statistics of the user session. The nature of the network is such that there may be short-lived extreme cases where there are lots of buffering events. Therefore, the average value of the number of buffering events is of limited use. The median number of buffering events, on the other hand, is significant.

Given that the data values are revealed in a sliding window fashion, calculate the median number of buffering events in a one-minute interval.

Let’s consider the following example with k = 4:

Dry-run templates

The following is the implementation of the solution provided for the Find the Median of a Number Stream (medium) problem:

Press + to interact
Python 3.5
from heapq import *
class MedianOfAStream:
maxHeap = [] # containing first half of numbers
minHeap = [] # containing second half of numbers
def insert_num(self, num):
if not self.maxHeap or -self.maxHeap[0] >= num:
heappush(self.maxHeap, -num)
else:
heappush(self.minHeap, num)
# either both the heaps will have equal number of elements or max-heap will have one
# more element than the min-heap
if len(self.maxHeap) > len(self.minHeap) + 1:
heappush(self.minHeap, -heappop(self.maxHeap))
elif len(self.maxHeap) < len(self.minHeap):
heappush(self.maxHeap, -heappop(self.minHeap))
def find_median(self):
if len(self.maxHeap) == len(self.minHeap):
# we have even number of elements, take the average of middle two elements
return -self.maxHeap[0] / 2.0 + self.minHeap[0] / 2.0
# because max-heap will have one more element than the min-heap
return -self.maxHeap[0] / 1.0
def main():
medianOfAStream = MedianOfAStream()
medianOfAStream.insert_num(3)
medianOfAStream.insert_num(1)
print("The median is: " + str(medianOfAStream.find_median()))
medianOfAStream.insert_num(5)
print("The median is: " + str(medianOfAStream.find_median()))
medianOfAStream.insert_num(4)
print("The median is: " + str(medianOfAStream.find_median()))
main()

Sample Input #1

insert 3
insert 1
find median
insert 5
find median 
insert 4
find median

Method

Line No.

median

maxHeap

maxHeap

-

6-7

-

[]

[]

insert_num(3)

10-13

-

[]

[-3]


17-20

-

[]

[-3]

insert_num(1)

10-13

-

[]

[-3, -1]


17-20

-

[3]

[-1]

find_median()

25

2.0

[3]

[-1]

insert_num(5)

10-13

-

[3,5]

[-1]


17-20

-

[5]

[-3,-1]

find_median()

28

3.0

[5]

[-3,-1]

insert_num(4)

10-13

-

[4,5]

[-3,-1]


17-20

-

[4,5]

[-3,-1]

find_median()

25

3.5

[4,5]

[-3,-1]

Sample Input 2

Students can be asked to complete the table below for the following input:

insert 2
insert 3
insert 4
insert 5
find median
insert 5
find median 
insert 5
find median

Method

Line No.

median

maxHeap

maxHeap

-

6-7

-

[]

[]

insert_num(2)

10-13

-

[]

[-2]


17-20

-

[]

[-2]

insert_num(3)

10-13

-

[3]

[-2]


17-20

-

[3]

[-2]

...

...

...

...

...

Step-by-step solution construction

Let’s start implementing insert_num function as the first step.

As discussed in the solution, we need two heaps maxHeap and minHeap. We will insert the negative of the original number in the maxHeap because we will need the maximum number from the maxHeap later in the solution.

In the code below, we push the num on to the minHeap if the largest number in the maxHeap is greater than the num. Otherwise, it will be inserted on to the minHeap

Press + to interact
Python 3.5
from heapq import *
class MedianOfAStream:
maxHeap = [] # containing first half of numbers
minHeap = [] # containing second half of numbers
def insert_num(self, num):
if not self.maxHeap or -self.maxHeap[0] >= num:
heappush(self.maxHeap, -num)
else:
heappush(self.minHeap, num)
print("\theap statuses after inserting " + str(num))
self.print_status()
def print_status(self):
print("\tminHeap: " + str(self.minHeap),end=" ")
print("\tmaxHeap: " + str(self.maxHeap))
def main():
medianOfAStream = MedianOfAStream()
stream_of_numbers = [3, 1, 5, 4, 1, 1, 2, 2]
for num in stream_of_numbers:
print("insert", num)
medianOfAStream.insert_num(num)
print(("-"*100)+"\n")
main()

As the output of the above code shows, although we inserted 8 numbers on to the heaps, however, we do not have an equal number of elements in both the heaps.

We need to adjust the elements such that maxHeap can only have one more element than the minHeap(in case of an odd number of inserted elements).

In the code below, we adjust the elements in the heap according to the following rules:

  1. Pop the highest number from the maxHeap and push it on to the minHeap if the length of the maxHeap exceeds that of the minHeap by 2 or more.

  2. Pop the lowest number from the minHeap and insert on to the maxHeap if the length of the maxHeap is less than that of the minHeap

Press + to interact
Python 3.5
from heapq import *
class MedianOfAStream:
maxHeap = [] # containing first half of numbers
minHeap = [] # containing second half of numbers
def insert_num(self, num):
if not self.maxHeap or -self.maxHeap[0] >= num:
heappush(self.maxHeap, -num)
else:
heappush(self.minHeap, num)
print("\theap status before adjusting the number of elements")
self.print_status()
# either both the heaps will have equal number of elements or max-heap will have one
# more element than the min-heap
if len(self.maxHeap) > len(self.minHeap) + 1:
heappush(self.minHeap, -heappop(self.maxHeap))
elif len(self.maxHeap) < len(self.minHeap):
heappush(self.maxHeap, -heappop(self.minHeap))
print("\theap status after adjusting the number of elements")
self.print_status()
def print_status(self):
print("\tminHeap: " + str(self.minHeap),end=" ")
print("\tmaxHeap: " + str(self.maxHeap))
def main():
medianOfAStream = MedianOfAStream()
stream_of_numbers = [3, 1, 5, 4, 1, 1, 2, 2]
for num in stream_of_numbers:
print("insert", num)
medianOfAStream.insert_num(num)
print(("-"*100)+"\n")
main()

insert_num method ensures that the elements are adjusted in the heaps according to the rules defined in the previous step. Now, we can calculate the median of the elements as follows:

  1. If the number of elements is even, both the heaps will have an equal number of elements. We can take the first(the maximum) number from maxHeap and the first(the minimum) number from minHeap and take the average to calculate the median.

  2. If the number of elements is odd, we can take the maximum number from maxHeap as the median, as maxHeap will have one more element than minHeap.

In the code below, the find_median method computes the median according to the above rules.

Press + to interact
Python 3.5
from heapq import *
class MedianOfAStream:
maxHeap = [] # containing first half of numbers
minHeap = [] # containing second half of numbers
def insert_num(self, num):
if not self.maxHeap or -self.maxHeap[0] >= num:
heappush(self.maxHeap, -num)
else:
heappush(self.minHeap, num)
print("\theap status before adjusting the number of elements")
self.print_status()
# either both the heaps will have equal number of elements or max-heap will have one
# more element than the min-heap
if len(self.maxHeap) > len(self.minHeap) + 1:
heappush(self.minHeap, -heappop(self.maxHeap))
elif len(self.maxHeap) < len(self.minHeap):
heappush(self.maxHeap, -heappop(self.minHeap))
print("\theap status after adjusting the number of elements")
self.print_status()
def find_median(self):
print("\tNumber of elements in minHeap: " + str(len(self.minHeap)))
print("\tNumber of elements in maxHeap: " + str(len(self.maxHeap)))
if len(self.maxHeap) == len(self.minHeap):
# we have even number of elements, take the average of middle two elements
print("\tNumber of elements is equal in both heaps. Taking average of middle two elements as median.")
return -self.maxHeap[0] / 2.0 + self.minHeap[0] / 2.0
# because max-heap will have one more element than the min-heap
print("\tmaxHeap has one more element than minHeap. Taking maximum from maxHeap as median.")
return -self.maxHeap[0] / 1.0
def print_status(self):
print("\tminHeap: " + str(self.minHeap),end=" ")
print("\tmaxHeap: " + str(self.maxHeap))
def main():
medianOfAStream = MedianOfAStream()
stream_of_numbers = [3, 1, 5, 4, 1, 1, 2, 2]
for num in stream_of_numbers:
print("insert", num)
medianOfAStream.insert_num(num)
print("Finding the median...")
print("The median of the numbers received so far is: " + str(medianOfAStream.find_median()))
print(("-"*100)+"\n")
main()