Introduction
Some learning resources for top K elements.
Any problem that asks us to find the top/smallest/most frequent k elements among a given set is likely to be a good match for this pattern. We often use the heap data structure for implementing solutions to these kinds of problems.
You can read more about heaps here.
Dry-run templates
Consider the following algorithm for finding the k
largest elements in an unsorted array:
from heapq import *def find_k_largest_numbers(nums, k):minHeap = []# put first 'K' numbers in the min heapfor i in range(k):heappush(minHeap, nums[i])# go through the remaining numbers of the array, if the number from the array is bigger than the# top(smallest) number of the min-heap, remove the top number from heap and add the number from arrayfor i in range(k, len(nums)):if nums[i] > minHeap[0]:heappop(minHeap)heappush(minHeap, nums[i])# the heap has the top 'K' numbersreturn minHeapdef main():print("Here are the top K numbers: " +str(find_k_largest_numbers([3, 1, 5, 12, 2, 11], 3)))print("Here are the top K numbers: " +str(find_k_largest_numbers([5, 12, 11, -1, 12], 3)))main()
The first step in our algorithm is to populate our minHeap
with the first k
characters of the input array (lines 7-8). This is not shown in the dry run table. The “iteration” column refers to the iteration number of the second for
loop.
Sample input #1
Input: [3, 1, 5, 12, 2, 11], K = 3
Iteration | Line number | minHeap[0] | nums[i] | minHeap |
- | 7-8 | - | - | [1, 3, 5] |
1 | 13 | 1 | 12 | [1, 3, 5] |
1 | 14 | - | 12 | [3, 5] |
1 | 15 | - | 12 | [3, 5, 12] |
2 | 13 | 3 | 2 | [3, 5, 12] |
3 | 13 | 3 | 11 | [3, 5, 12] |
3 | 14 | - | 11 | [5, 12] |
3 | 15 | - | 11 | [5, 12, 11] |
Sample input #2
[5, 12, 11, -1, 12], K = 3
Iteration | Line number | minHeap[0] | nums[i] | minHeap |
- | 7-8 | - | - | [5, 12, 11] |
1 | 13 | 5 | -1 | [5, 12, 11] |
2 | 13 | 5 | 12 | [5, 12, 11] |
2 | 14 | - | 12 | [11, 12] |
2 | 15 | - | 12 | [11, 12, 12] |
Step-by-step solution construction
The first step in our program is to populate our minHeap with the first k
characters of our input array. We’ll use a for
loop for iterating over the input elements and pushing them into our heap.
For example, if our nums
array is and our k
is , we’ll push only the first 3 elements in the heap. Our minHeap
will then be .
from heapq import *def find_k_largest_numbers(nums, k):minHeap = []# put first 'K' numbers in the min heapfor i in range(k):print("\tLoop index: ", i + 1)heappush(minHeap, nums[i])print("\tInitial state of min heap: ", minHeap, "\n")def main():input = [([3, 1, 5, 12, 2, 11], 3), ([5, 12, 11, -1, 12], 3)]for i in input:print("Input array: ", i[0])find_k_largest_numbers(i[0], i[1])print("-"*100 + "\n")main()
Please note that the smallest element in the min heap data structure will always be stored at the top, that is, the index.
After populating our heap, we’ll go through the remaining numbers of the array. If the current number is larger than the smallest element in the heap, we pop it from the heap and push the current number on to the heap. Since we’re interested in the k
largest elements, this approach removes the smallest element if a larger number is encountered.
Consider an example where our heap is and the next element in our input array is . The top element in our heap is . Since , we’ll pop from the heap and push . Our resulting heap will then be .
Let’s see this below:
from heapq import *def find_k_largest_numbers(nums, k):minHeap = []# put first 'K' numbers in the min heapfor i in range(k):heappush(minHeap, nums[i])print("Initial state of min heap: ", minHeap, "\n")# go through the remaining numbers of the array, if the number from the array is bigger than the# top(smallest) number of the min-heap, remove the top number from heap and add the number from arrayj = 1for i in range(k, len(nums)):print("\tLoop iteration: ", j)j+=1print("\tNext input element: ", nums[i], "\n")if nums[i] > minHeap[0]:print("\tSince our input element is larger than our smallest element, we'll pop from the heap.")heappop(minHeap)print("\tHeap after popping the smallest element: ", minHeap, "\n")print("\tNow, we push the input element on to our heap.")heappush(minHeap, nums[i])print("\tHeap after pushing the new element: ", minHeap, "\n")else:print("\tThe input element is smaller than the smallest heap element, so we'll do nothing.")print("\tOur heap will remain the same: ", minHeap, "\n")# the heap has the top 'K' numbersreturn minHeapdef main():input = [([3, 1, 5, 12, 2, 11], 3), ([5, 12, 11, -1, 12], 3)]for i in input:print("Input array: ", i[0])find_k_largest_numbers(i[0], i[1])print("-"*100 + "\n")main()
By following the above approach, after traversing through the entire input list, we’ll be left with the k
largest elements in our minHeap
.