...

/

Introduction

Introduction

Some learning resources for top K elements.

Any problem that asks us to find the top/smallest/most frequent k elements among a given set is likely to be a good match for this pattern. We often use the heap data structure for implementing solutions to these kinds of problems.

You can read more about heaps here.

Dry-run templates

Consider the following algorithm for finding the k largest elements in an unsorted array:

Press + to interact
Python 3.5
from heapq import *
def find_k_largest_numbers(nums, k):
minHeap = []
# put first 'K' numbers in the min heap
for i in range(k):
heappush(minHeap, nums[i])
# go through the remaining numbers of the array, if the number from the array is bigger than the
# top(smallest) number of the min-heap, remove the top number from heap and add the number from array
for i in range(k, len(nums)):
if nums[i] > minHeap[0]:
heappop(minHeap)
heappush(minHeap, nums[i])
# the heap has the top 'K' numbers
return minHeap
def main():
print("Here are the top K numbers: " +
str(find_k_largest_numbers([3, 1, 5, 12, 2, 11], 3)))
print("Here are the top K numbers: " +
str(find_k_largest_numbers([5, 12, 11, -1, 12], 3)))
main()

The first step in our algorithm is to populate our minHeap with the first k characters of the input array (lines 7-8). This is not shown in the dry run table. The “iteration” column refers to the iteration number of the second for loop.

Sample input #1

Input: [3, 1, 5, 12, 2, 11], K = 3

Iteration

Line number

minHeap[0]

nums[i]

minHeap

-

7-8

-

-

[1, 3, 5]

1

13

1

12

[1, 3, 5]

1

14

-

12

[3, 5]

1

15

-

12

[3, 5, 12]

2

13

3

2

[3, 5, 12]

3

13

3

11

[3, 5, 12]

3

14

-

11

[5, 12]

3

15

-

11

[5, 12, 11]

Sample input #2

[5, 12, 11, -1, 12], K = 3

Iteration

Line number

minHeap[0]

nums[i]

minHeap

-

7-8

-

-

[5, 12, 11]

1

13

5

-1

[5, 12, 11]

2

13

5

12

[5, 12, 11]

2

14

-

12

[11, 12]

2

15

-

12

[11, 12, 12]

Step-by-step solution construction

The first step in our program is to populate our minHeap with the first k characters of our input array. We’ll use a for loop for iterating over the input elements and pushing them into our heap.

For example, if our nums array is [1,2,3,5,8][1,2,3,5,8] and our k is 33, we’ll push only the first 3 elements in the heap. Our minHeap will then be [1,2,3][1,2,3].

Press + to interact
Python 3.5
from heapq import *
def find_k_largest_numbers(nums, k):
minHeap = []
# put first 'K' numbers in the min heap
for i in range(k):
print("\tLoop index: ", i + 1)
heappush(minHeap, nums[i])
print("\tInitial state of min heap: ", minHeap, "\n")
def main():
input = [([3, 1, 5, 12, 2, 11], 3), ([5, 12, 11, -1, 12], 3)]
for i in input:
print("Input array: ", i[0])
find_k_largest_numbers(i[0], i[1])
print("-"*100 + "\n")
main()

Please note that the smallest element in the min heap data structure will always be stored at the top, that is, the 0th0^{th} index.

After populating our heap, we’ll go through the remaining numbers of the array. If the current number is larger than the smallest element in the heap, we pop it from the heap and push the current number on to the heap. Since we’re interested in the k largest elements, this approach removes the smallest element if a larger number is encountered.

Consider an example where our heap is [1,2,3][1,2,3] and the next element in our input array is 44. The top element in our heap is 11. Since 4>14 > 1, we’ll pop 11 from the heap and push 44. Our resulting heap will then be [2,3,4][2,3,4].

Let’s see this below:

Press + to interact
Python 3.5
from heapq import *
def find_k_largest_numbers(nums, k):
minHeap = []
# put first 'K' numbers in the min heap
for i in range(k):
heappush(minHeap, nums[i])
print("Initial state of min heap: ", minHeap, "\n")
# go through the remaining numbers of the array, if the number from the array is bigger than the
# top(smallest) number of the min-heap, remove the top number from heap and add the number from array
j = 1
for i in range(k, len(nums)):
print("\tLoop iteration: ", j)
j+=1
print("\tNext input element: ", nums[i], "\n")
if nums[i] > minHeap[0]:
print("\tSince our input element is larger than our smallest element, we'll pop from the heap.")
heappop(minHeap)
print("\tHeap after popping the smallest element: ", minHeap, "\n")
print("\tNow, we push the input element on to our heap.")
heappush(minHeap, nums[i])
print("\tHeap after pushing the new element: ", minHeap, "\n")
else:
print("\tThe input element is smaller than the smallest heap element, so we'll do nothing.")
print("\tOur heap will remain the same: ", minHeap, "\n")
# the heap has the top 'K' numbers
return minHeap
def main():
input = [([3, 1, 5, 12, 2, 11], 3), ([5, 12, 11, -1, 12], 3)]
for i in input:
print("Input array: ", i[0])
find_k_largest_numbers(i[0], i[1])
print("-"*100 + "\n")
main()

By following the above approach, after traversing through the entire input list, we’ll be left with the k largest elements in our minHeap.