Median of two sorted array in Python O(log n)

The median of two sorted arrays is a fundamental problem that requires finding the middle value of the combined elements in the arrays. In this Answer, we will discuss an efficient algorithm to solve this problem with the time complexity of O(logn)O(logn). We will use a binary search-based approach to divide the search space at each step, making the solution faster and more optimal.

Stepwise algorithm

Let's move further stepwise.

Understanding the problem

Given two sorted arrays, arr1 and arr2, with lengths m and n, our task is to find the median of the combined array that contains all elements from both arr1 and arr2. The median is defined as follows:

  • If the total number of elements (m + n) is odd, the median is the middle element.

Median if the total number of elements (m + n) is odd
Median if the total number of elements (m + n) is odd
  • If the total number of elements is even, the median is the average of the two middle elements.

Median if the total number of elements (m + n) is even
Median if the total number of elements (m + n) is even

Binary search approach

To achieve O(logn)O(log n) time complexity, we will use a binary search-based algorithm. The idea is to find a partition point in both arrays where the elements on the left side of the partition are smaller than the elements on the right side. The median of the combined array can be derived from these partition points.

Finding the partition points

For an array with k elements, the partition point can be represented by an index i, where the left partition has i elements. To find the partition points, we will apply binary search on the smaller of the two arrays, arr1. Once we have the partition point i for arr1, we can calculate the corresponding partition point j for arr2 using the formula: j = (m + n + 1) // 2 - i.

Handling edge cases

We need to handle some edge cases, such as when i is 0 or m, or when j is 0 or n. These cases indicate that there are no elements on one side of the partition, and we handle them accordingly.

Calculating the median

Finally, we calculate the median using the partition points i and j. If the total number of elements (m + n) is odd, the median is the maximum of the maximum elements in the left partition (max(arr1[i-1], arr2[j-1])). If the total number of elements is even, the median is the average of the maximum of the left partitions and the minimum of the right partitions ((max(arr1[i-1], arr2[j-1]) + min(arr1[i], arr2[j])) / 2).

Visual explanation

Let's understand it visually.

1 of 9

Implementation

Let’s implement the solution in Python:

def findMedianSortedArrays(nums1, nums2):
if not nums1 and not nums2:
raise ValueError("Both input arrays are empty.")
if not nums1:
return find_median(nums2)
if not nums2:
return find_median(nums1)
if not all(isinstance(num, int) for num in nums1 + nums2):
raise TypeError("Input arrays should contain only integers.")
if len(nums1) > len(nums2):
nums1, nums2 = nums2, nums1
m, n = len(nums1), len(nums2)
imin, imax, half_len = 0, m, (m + n + 1) // 2
while imin <= imax:
i = (imin + imax) // 2
j = half_len - i
if i < m and nums2[j - 1] > nums1[i]:
imin = i + 1
elif i > 0 and nums1[i - 1] > nums2[j]:
imax = i - 1
else:
if i == 0:
max_of_left = nums2[j - 1]
elif j == 0:
max_of_left = nums1[i - 1]
else:
max_of_left = max(nums1[i - 1], nums2[j - 1])
if (m + n) % 2 == 1:
return max_of_left
if i == m:
min_of_right = nums2[j]
elif j == n:
min_of_right = nums1[i]
else:
min_of_right = min(nums1[i], nums2[j])
return (max_of_left + min_of_right) / 2.0
def find_median(nums):
n = len(nums)
if n % 2 == 1:
return nums[n // 2]
else:
return (nums[n // 2 - 1] + nums[n // 2]) / 2
# Function to merge two sorted arrays
def mergeSortedArrays(nums1, nums2):
merged = []
i, j = 0, 0
while i < len(nums1) and j < len(nums2):
if nums1[i] < nums2[j]:
merged.append(nums1[i])
i += 1
else:
merged.append(nums2[j])
j += 1
while i < len(nums1):
merged.append(nums1[i])
i += 1
while j < len(nums2):
merged.append(nums2[j])
j += 1
return merged
# Test the function
nums1 = [1,4,8,9]
nums2 = [2,4,7]
print("Sorted Array 1:", sorted(nums1))
print("Sorted Array 2:", sorted(nums2))
print("Merged Sorted Array:", mergeSortedArrays(nums1, nums2))
try:
print("Median:", findMedianSortedArrays(nums1, nums2)) # Output: 2.0
except ValueError as e:
print('Error:', str(e))

Code explanation

  • Line 1: The code defines a function the findMedianSortedArrays that takes two sorted arrays nums1 and nums2 as input and returns their median.

  • Lines 2–3: This line checks if both nums1 and nums2 are empty lists. The not keyword is used to check if the lists are empty. If both lists are empty, it means there are no elements to find the median, and this is considered an invalid input. In such a case, a ValueError is raised with the message "Both input arrays are empty."

  • Lines 12–14: If the length of nums1 is greater than the length of nums2, the code swaps the two arrays. This step ensures that nums1 is the smaller array and nums2 is the larger array. The variable m holds the length of the smaller array (nums1), and n holds the length of the larger array (nums2).

  • Line 15: Three variables are initialized: imin to 0, imax to the length of nums1 (m), and half_len to half of the total length of both arrays ((m + n + 1) // 2). The half_len represents the position of the median in the combined array.

  • Line 17: The code enters a while loop, which continues until the imin is less than or equal to the imax. This binary search-like loop is used to find the correct partition point i in the smaller array (nums1) such that elements on the left side of the partition are smaller than elements on the right side.

  • Line 18: The variable i is set to the middle index of the current search space.

  • Line 19: The variable j is calculated to represent the partition point in the larger array (nums2) based on the half_len and the current i. This is done so that both arrays have equal elements on both sides of the partition.

  • Lines 21–23: The code checks if i is too small (i.e., there are more elements to the right of i in nums1) or if the element to the left of the partition in nums2 is greater than the element at i in nums1. In such cases, i is updated to a higher value (i + 1) to increase the partition.

  • Lines 24–26: Similarly, if i is too big (i.e., there are more elements to the left of i in nums1) or if the element to the right of the partition in nums2 is smaller than the element at i in nums1, i is updated to a lower value (i - 1) to decrease the partition.

  • Line 27: If the else block is reached, it means that i is at the correct partition point.

  • Lines 29–34: The code calculates the max_of_left value, which is the maximum element on the left side of the partition in the combined array. It handles edge cases where the partition i or j is at the boundary of their respective arrays.

  • Lines 36–37: The code checks if the total number of elements in both arrays is odd. If it is, then the median is the max_of_left, and the function returns it.

  • Lines 39–44: If the total number of elements is even, the code calculates the min_of_right value, which is the minimum element on the right side of the partition in the combined array. It again handles the edge cases.

  • Line 46: The function returns the median of the combined array, which is calculated as the average of the max_of_left and min_of_right.

  • Line 51: This line calculates the length of the input list nums and stores it in the variable n.

  • Line 52: This line checks if the length of the list nums is odd. The % operator calculates the remainder when n is divided by 2. If the remainder is 1, it means the length is odd, and we execute the code block that follows.

  • Line 53: If the list length is odd, there is a single middle element. We use integer division (//) to find the index of the middle element and return it as the median.

  • Lines 54–55: If the length of the list is even, there are two middle elements. To find the median, we add these two middle elements and divide the sum by 2 to get the average. We return this average as the median.

  • Line 61: Initialize two pointers i and j to 0 to track the positions in nums1 and nums2 respectively.

  • Line 62: Iterate through both arrays until either pointer reaches the end of its respective array.

  • Lines 63–65: If the element in nums1 is smaller, add it to the merged list and move the pointer i to the next element in nums1.

  • Lines 66–68: If the element in nums2 is smaller or equal, add it to the merged list and move the pointer j to the next element in nums2.

  • Lines 69–74: After the above loop, one of the arrays might have some elements left. We add them all to the merged list in these lines.

  • Line 75: Return the merged sorted array.

Space complexity

The space complexity of the entire code is dominated by the mergeSortedArrays function, which is O(m+n)O(m + n), where m is the length of the nums1 and n is the length of the nums2.

Conclusion

Finding the median of two sorted arrays is an important problem that can be solved efficiently with the time complexity of O(logn)O(log n) using the binary search approach. The Python code provided above demonstrates how to implement the algorithm step by step. By using this approach, we can efficiently find the median of large arrays.

Copyright ©2024 Educative, Inc. All rights reserved