Heaps are powerful because they let us maintain access to the minimum or maximum element without fully sorting the data, making these operations much faster than with unsorted arrays or linked lists. They are especially useful when we need to repeatedly access the smallest or largest element in a dataset. For example, let’s look at the Last Stone Weight problem. In this problem, we are given an array of stone weights and tasked with repeatedly smashing the two heaviest stones together. If the two heaviest stones have the same weight, both are destroyed. If they have different weights, the smaller stone is destroyed, and the larger stone’s weight is reduced by the smaller stone’s weight. This process continues until at most one stone remains.
To solve this problem, we need to repeatedly select the two heaviest stones, smash them together, and update the list of stones based on the result of each smash. Since we need to efficiently retrieve the two heaviest stones at each step, a max heap is a natural fit: it lets us access and remove the heaviest stones quickly while always keeping the largest remaining element at the top.
To begin, we first build a max heap using the stone weights. Then, at each turn, we:
Pop the two largest stones from the heap.
If the two stones have the same weight, both are destroyed, and no stone is added back to the heap.
If the two stones have different weights, the stone with the smaller weight is destroyed, and the stone with the larger weight is reduced by the smaller stone’s weight and pushed back into the heap for further processing.
This process continues until either one stone remains or none remains. The weight of the last remaining stone (or 0 if none remains) is returned as the result.
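To make this concrete, here is a minimal Python sketch of the approach (the function name last_stone_weight is our own choice, not mandated by the problem). Because Python’s heapq module implements a min heap, the stone weights are stored negated to simulate a max heap:

```python
import heapq

def last_stone_weight(stones):
    # Illustrative sketch: heapq is a min heap, so store negated
    # weights to simulate a max heap.
    heap = [-s for s in stones]
    heapq.heapify(heap)

    while len(heap) > 1:
        # Pop the two heaviest stones (values are negated).
        first = -heapq.heappop(heap)
        second = -heapq.heappop(heap)
        if first != second:
            # The heavier stone survives with its weight reduced
            # by the lighter stone's weight.
            heapq.heappush(heap, -(first - second))

    # Weight of the last remaining stone, or 0 if none remains.
    return -heap[0] if heap else 0


print(last_stone_weight([2, 7, 4, 1, 8, 1]))  # 1
```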
The time complexity of this heap-based solution is O(n log n), where n is the number of stones. Building the heap takes O(n) time, and each of the at most n − 1 smash operations involves heap insertions and deletions, each taking O(log n) time. In contrast, the naive solution, which repeatedly scans the list to find the two largest stones, has a time complexity of O(n²), since it performs up to n scans of O(n) each.
In addition to using a single heap, there are several scenarios where two heaps can be employed in the heaps pattern to optimize the solution. One common use case is when we need to efficiently track both the smaller and larger portions of a dataset, such as finding the median or balancing data streams. By maintaining a max heap for the smaller half of the data and a min heap for the larger half, we can quickly access the median from the tops of the two heaps or adjust the distribution of elements, as the sketch below illustrates. Another use case is for problems involving intervals or ranges, where one heap can store one set of values (e.g., start times), and the other tracks the complementary set (e.g., end times) to efficiently identify valid intervals or ranges.
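As a rough illustration of the median use case, the sketch below maintains a running median with two heaps (the MedianTracker class and its method names are our own, not part of any library). As before, heapq is a min heap, so the smaller half is stored with negated values:

```python
import heapq

class MedianTracker:
    """Illustrative two-heap median: a max heap (negated values) holds
    the smaller half, a min heap holds the larger half."""

    def __init__(self):
        self.small = []  # max heap via negation (smaller half)
        self.large = []  # min heap (larger half)

    def add(self, num):
        # Push onto the smaller half, then move its largest element
        # to the larger half so every element in small <= every element in large.
        heapq.heappush(self.small, -num)
        heapq.heappush(self.large, -heapq.heappop(self.small))
        # Rebalance so the smaller half never has fewer elements.
        if len(self.large) > len(self.small):
            heapq.heappush(self.small, -heapq.heappop(self.large))

    def median(self):
        # Assumes at least one element has been added.
        if len(self.small) > len(self.large):
            return -self.small[0]
        return (-self.small[0] + self.large[0]) / 2


tracker = MedianTracker()
for x in [5, 2, 8, 1]:
    tracker.add(x)
print(tracker.median())  # 3.5
```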
For the second use case, let’s look at the Find Right Interval problem. In this problem, we are given an array of intervals and must find the right interval for each one. A right interval is one whose start time is greater than or equal to the current interval’s end time; if no such interval exists, we return -1. The goal is to efficiently find, for each interval, the smallest start time that is greater than or equal to its end time.
This problem can be solved using two min heaps: one storing the start times and the other storing the end times. The key idea is to process intervals in increasing order of end time and, for each interval, find the smallest start time in the start heap that is greater than or equal to its end time.
We begin by populating two heaps with the start and end times.
For each interval, we pop the smallest end time from the end heap and remove any start times smaller than the current end time from the start heap.
If a valid start time remains in the start heap, it represents the right interval for the current interval.
If no valid right interval exists, we return -1 for that interval.
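A Python sketch of this two-heap approach might look as follows (the function name find_right_interval and the use of (time, index) pairs in the heaps are our own implementation choices for illustration):

```python
import heapq

def find_right_interval(intervals):
    # Illustrative sketch: min heaps of (start time, index) and (end time, index).
    start_heap = [(start, i) for i, (start, _) in enumerate(intervals)]
    end_heap = [(end, i) for i, (_, end) in enumerate(intervals)]
    heapq.heapify(start_heap)
    heapq.heapify(end_heap)

    result = [-1] * len(intervals)

    # Process intervals in increasing order of end time. A start time
    # smaller than the current end time can never be a right interval
    # for this or any later-ending interval, so it is safe to discard.
    while end_heap:
        end, i = heapq.heappop(end_heap)
        while start_heap and start_heap[0][0] < end:
            heapq.heappop(start_heap)
        if start_heap:
            # The smallest remaining start time is this interval's right interval.
            result[i] = start_heap[0][1]

    return result


print(find_right_interval([[3, 4], [2, 3], [1, 2]]))  # [-1, 0, 1]
```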
The time complexity of this heap-based solution is O(n log n), where n is the number of intervals. We perform O(n) insertions and deletions on each of the two heaps (one for start times and the other for end times), with each heap operation taking O(log n) time. In contrast, the naive solution, which scans all the intervals for each interval to find its right interval, has a time complexity of O(n²), as it requires n comparisons for each of the n intervals.
The following examples illustrate some problems that can be solved with this approach:
Sliding window median: Given an array of integers and a window size k, find the median of each sliding window of size k as it moves from left to right through the array.