Median of two sorted array in Python O(log n)

The median of two sorted arrays is a fundamental problem that requires finding the middle value of the combined elements in the arrays. In this Answer, we will discuss an efficient algorithm to solve this problem with the time complexity of $O(log\ n)$ . We will use a binary search-based approach to divide the search space at each step, making it an optimal solution.

Key takeaways:
The median of two sorted arrays can be found in $O(log⁡~n)$ time using binary search, rather than the typical $O(n)$ time as in case of merging the arrays approach.
If the array’s combined length is odd, the median is the middle element, whereas in the case of even length, the median is the average of the two middle elements.
The solution involves finding partition points in both arrays where elements on the left are less than or equal to those on the right, ensuring that median calculations are accurate.

Understanding the problem

Given two sorted arrays, array1 and array2, with lengths m and n respectively, our task is to find the median of the combined array that contains all elements from both array1 and array2. The median is defined as follows:

If the total number of elements (m + n) is odd, the median is the middle element.

Binary search approach

To achieve $O(log\ n)$ time complexity, we will use a binary search-based algorithm. The idea is to find a partition point in both arrays where the elements on the left side of the partition are smaller than the elements on the right side. The median of the combined array can be derived from these partition points.

To learn more about the binary search pattern, consider checking out our extensive course Grokking Coding Interview Patterns. It covers more than 20 algorithmic patterns and has 230+ problems.

Finding the partition points

For an array with n elements, the partition point can be represented by an index i, which is the midpoint of the smaller array initially. To find the partition points, we will apply binary search on the smaller of the two arrays, array1. Once we have the partition point i for array1 using the formula: $\frac{left+right}{2}$ we can calculate the corresponding partition point j for array2 using the formula: $j = \frac{(m + n + 1)}{2} - i$ .

In the context of the binary search algorithm, left and right represent the search bounds within the smaller array. left starts at $0$ , and right starts at the length of the smaller array. The binary search adjusts these bounds to find the correct partition of the arrays such that the left contains half of the elements from both arrays, and the right contains the other half. As the search narrows, i (the partition index for the smaller array) is updated, and the left and right help converge on the correct partition.

Handling edge cases

We need to handle some edge cases, such as when i is 0 or m, or when j is 0 or n. These cases indicate that there are no elements on one side of the partition, and we handle them accordingly.

Calculating the median

Finally, we calculate the median using the partition points i and j.

If the total number of elements $(m + n)$ is odd, the median is the maximum elements in the left partition $max(array1[i-1], array2[j-1])$ .
If the total number of elements is even, the median is the average of the maximum of the left partitions and the minimum of the right partitions $(max(array1[i-1], array2[j-1]) + min(array1[i], array2[j])) / 2$ .

Example

Educative offers an AI-Powered Mock Interviewer designed to cater to users of all levels, from beginners to intermediate and expert audiences.

Code explanation

Line 3–4: To make binary search faster, we swap the arrays if array1 is larger, ensuring that array1 is always the smaller array. This minimizes the range of binary search.
Line 6–8: Initialize the following variables:
- m and n store the lengths of array1 and array2, respectively.
- half_len calculates half the combined length of both arrays. This helps to split the arrays to find the median.
- left and right define the search range for binary search over array1.
Line 10–12: The loop performs binary search on array1.
- i is the midpoint of the current binary search range in array1.
- j is the corresponding position in array2 such that the left half and right half together make up half the total length (half_len).
Line 14–15: If i is within array1 bounds and the element at array2[j - 1] is greater than array1[i], i is too small. We increment left to search the right half.
Line 16–17: If i is greater than zero and array1[i - 1] is greater than array2[j], i is too large. We decrement right to search the left half.
Line 18–24: If neither of the above conditions is met, i is positioned correctly.
- max_of_left will contain the maximum value of the left half.
- If i == 0, it means there are no elements left in array1 on the left, so we take array2[j - 1].
- If j == 0, it means there are no elements left in array2 on the left, so we take array1[i - 1].
- Otherwise, max_of_left is the maximum of array1[i - 1] and array2[j - 1].
Line 26–27: If the total length (m + n) is odd, the median is max_of_left, as there’s a single middle element.
Line 29–34: If (m + n) is even, the median is the average of the two middle values.
- min_of_right finds the minimum of the right half:
- If i == m, it means all elements in array1 are on the left, so we take array2[j].
- If j == n, it means all elements in array2 are on the left, so we take array1[i].
- Otherwise, min_of_right is the minimum of array1[i] and array2[j].
Line 37: Finally, for even-length arrays, we return the average of max_of_left and min_of_right as the median.

In case of arrays with different length, the time complexity of this code is O(log⁡ min⁡(m,n)), where m and n are the lengths of array1 and array2, respectively. This complexity arises due to the binary search only operating on the smaller of the two arrays, aiming to minimize search operations.

Space complexity

The space complexity of the solution is $O(1)$ because no extra space is used.

Conclusion

Finding the median of two sorted arrays is an important problem that can be solved efficiently with the time complexity of $O(log\ n)$ using the binary search approach. The Python code provided above demonstrates how to implement the algorithm step by step. By using this approach, we can efficiently find the median of large arrays.

Frequently asked questions

Haven’t found what you were looking for? Contact Us

Can this algorithm handle cases where one array is empty?

Yes, if one array is empty, the median is simply the median of the non-empty array, which the algorithm handles as a special case.

How does the algorithm determine if the median is a single middle value or an average?

The algorithm checks the total number of combined elements. If odd, the median is the middle value. If even, it’s the average of the two middle values from the combined partitions.