The “Find The Duplicate Number” problem is a classic challenge in computer science and algorithm design, often encountered in coding interviews and competitive programming. The optimal solution to this problem involves cycle detection using fast and slow pointers. However, this algorithm works only when the input array doesn’t contain
Given an unsorted array of positive numbers, nums
, such that the values lie in the range nums
. There is only one repeated number in nums
.
Constraints:
nums.length
nums[i]
All the integers in nums
are unique, except for one integer that will appear more than once.
Examples:
Let’s look into a couple of examples to get an idea about the sample input and output.
Input | Output |
[1, 8, 8, 4, 2, 5] | 8 |
[1, 5, 3, 4, 2, 5] | 5 |
[1, 2, 3, 4, 3, 6, 3] | 3 |
The naive solution to this problem is to iterate through the array and for each element, check the rest of the array if there’s a duplicate. If a duplicate is found, we return that duplicate number.
The time complexity of this algorithm is
The optimal solution employs Floyd’s Tortoise and Hare algorithm, utilizing two pointers, slow and fast, initially set to the first element of the array. The slow pointer advances by one step, while the fast pointer advances by two steps in each iteration. When they meet, it indicates the presence of a cycle. At this point, one pointer is reset to the beginning of the array while both move at the same speed. The meeting point of the two pointers is the start of the cycle, which corresponds to the duplicate number in the array. This approach ensures linear time complexity
In this answer, we’re interested in talking about the cycle detection algorithm and why it fails if the input array contains
To find the cycle, we iterate nums
using the
We start from the first index, i.e., nums
according to the above sequence, if we encounter an element twice, we come to know that this is the duplicate element. The following illustration demonstrates this workflow:
From the above algorithm, we know that any index that we visit in the array is due to the value present at the previously visited index, except the first index (
In case when
Let’s try to understand this concept with the help of the following illustration.
Therefore, we can’t include
Finally, let’s look at the implementation of both cases discussed above:
def find_duplicate(nums):# Initialize the fast and slow pointers at the first index of the arrayfast = slow = nums[0]# Traverse the array until the intersection point is foundwhile True:slow = nums[slow]# Move the fast pointer two stepsfast = nums[nums[fast]]# Break the loop when the slow and fast pointers meetif slow == fast:break# Set the slow pointer at the start of the arrayslow = nums[0]# Traverse in the array until the slow pointer meets the fast pointerwhile slow != fast:slow = nums[slow]# Move the fast pointer one stepfast = nums[fast]return fast# Driver codedef main():nums = [[2, 5, 8, 6, 3, 9, 4, 8, 1, 7],[2, 5, 8, 6, 3, 9, 7, 4, 1, 0]]for i in range(len(nums)):print(i + 1, ".\tnums = ", nums[i], sep="")print("\tDuplicate number = ", find_duplicate(nums[i]), sep="")print("-" * 100)if __name__ == '__main__':main()
Let’s try to understand the code above:
Line 3: We initialize fast
and slow
pointers at the first index of the array.
Lines 6–12: We traverse the array until the intersection point is found.
Lines 17–20: We traverse in the array until the slow pointer meets the fast pointer.
It’s evident from the output of the second test case mentioned on line 27 that the algorithm fails if the input array contains 0
. The output says that the duplicate number in the array is 2
, but the array doesn’t contain any duplicate numbers.