Feature #2: Return Match

Explore how to identify the smallest subsequence from one code sample that appears in another to detect plagiarism. Understand the step-by-step process and algorithm to match tokens while avoiding dummy statements. This lesson helps you implement an efficient solution to locate copied snippets in coding assignments using constant space and moderate time complexity.

Description

Now, we need to identify the plagiarized code snippets from two sets of code sample tokens, using the same token-matching rules as in the previous feature. For a cheating student, we need to locate all the instances of copied content, keeping in mind that some text may have been inserted to make a copied submission look different from the original. As before, we have to skip dummy statements and comments. We will do this by matching the second student's code tokens as a subsequence inside the cheater's tokens. The cheater string may contain many stretches of different sizes in which student matches as a subsequence, and we have to fetch the smallest of them.

We’ll be provided with two strings: cheater and student. We have to return the smallest part of cheater that contains student as a subsequence (if it exists).
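To make the matching rule concrete, here is a minimal sketch of a subsequence check. It assumes each character stands for one token and that dummy statements and comments have already been filtered out. The name `is_subsequence` is hypothetical and does not come from the lesson.

```python
def is_subsequence(student: str, window: str) -> bool:
    """Return True if every token of `student` appears in `window`, in order.

    Assumption: one character represents one token.
    """
    remaining = iter(window)
    # `token in remaining` consumes the iterator up to the first hit,
    # so each later token can only match further to the right.
    return all(token in remaining for token in student)


print(is_subsequence("ab", "axxb"))  # extra tokens between a and b are allowed
print(is_subsequence("ab", "ba"))    # order matters
```

A window qualifies as a match only if this check passes. The task is then to find the shortest such window in cheater.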

Solution

First, we scan cheater from left to right to find the first occurrence of student as a subsequence. Then, starting from the position where that match ends, we match student in the opposite direction to find a smaller window with the same end. For example, if we have student = ab and ...
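The forward-then-backward search described above can be sketched as follows. This is one possible reading of the approach, not necessarily the lesson's exact implementation. It assumes each character is one token and that dummy statements have already been removed. The function name `return_match` and the sample inputs are hypothetical.

```python
def return_match(cheater: str, student: str) -> str:
    """Return the smallest window of `cheater` that contains `student`
    as a subsequence, or "" if no such window exists."""
    if not student:
        return ""
    best_start, best_len = -1, len(cheater) + 1
    i = j = 0  # i walks cheater, j walks student
    while i < len(cheater):
        if cheater[i] == student[j]:
            j += 1
            if j == len(student):
                # Forward pass finished: a full match ends at index i.
                end = i + 1
                # Backward pass: match student in reverse to find the
                # latest possible start for a window ending at `end`.
                j -= 1
                while j >= 0:
                    if cheater[i] == student[j]:
                        j -= 1
                    i -= 1
                i += 1  # i now points at the start of the tightened window
                if end - i < best_len:
                    best_start, best_len = i, end - i
                j = 0  # resume the forward search just after this start
        i += 1
    if best_start == -1:
        return ""
    return cheater[best_start:best_start + best_len]


print(return_match("abcdebdde", "bde"))  # bcde
print(return_match("abc", "d") == "")    # True
```

The sketch keeps only a few integer indices, so its extra space is constant. In the worst case, the backward pass rescans parts of cheater, so the running time can grow with the product of the two string lengths.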