Feature #1: Possible Matches

Explore techniques to detect possible matches in a plagiarism checker by analyzing tokenized documents. Understand how to track subsequences and manage dummy tokens to identify suspected copied content efficiently, while learning to implement this in Scala. This lesson enhances skills in problem-solving and string manipulation for coding interviews.

We'll cover the following...

Description
Solution
Complexity measures

Time complexity
Space complexity

Description

We are given a set of documents, each submitted by a different individual. However, we suspect that some individuals may have copied from others. Given a plagiarised document, we want to find the number of documents with which it has a potential match.

We have converted each document into a string of tokens based on its content. As mentioned previously, students could have added dummy statements between the copied content to avoid detection, so when we match the tokens of two students, we have to allow for dummy tokens that do not match. A potential match occurs if one token string is a subsequence of the other. Not every match is guaranteed to be plagiarised content. In this scenario, we'll discard matched tokens that have a length of less than two.

We'll be provided with a string, plagiarised, and an array, students. The plagiarised string contains the tokens against which we'll match the code samples in the students array. We have to return the number of students in the class from whom the plagiarised content may have been copied.
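
To make the matching rule concrete, here is a minimal brute-force sketch in Scala. It is not necessarily the approach the Solution section takes. It rests on a few assumptions that the description does not spell out: each character of a string stands for one token, a student's entry counts as a potential match when it is a subsequence of plagiarised, and entries shorter than two characters are discarded. The names PossibleMatches, isSubsequence, and countPossibleMatches are hypothetical.

```scala
object PossibleMatches {

  // Returns true if `candidate` can be obtained from `text` by deleting
  // zero or more characters without reordering the remaining ones.
  def isSubsequence(candidate: String, text: String): Boolean = {
    // The accumulator is the number of characters of `candidate` matched so far.
    val matched = text.foldLeft(0) { (i, c) =>
      if (i < candidate.length && candidate(i) == c) i + 1 else i
    }
    matched == candidate.length
  }

  // Counts the entries of `students` that are subsequences of `plagiarised`.
  // Assumption: entries with fewer than two characters are discarded.
  def countPossibleMatches(plagiarised: String, students: Array[String]): Int =
    students.count(s => s.length >= 2 && isSubsequence(s, plagiarised))
}
```

Under these assumptions, PossibleMatches.countPossibleMatches("abcde", Array("a", "bb", "acd", "ace")) returns 2: "acd" and "ace" are subsequences of "abcde", "bb" is not, and "a" is discarded for being too short. Each student entry triggers a full scan of plagiarised, so this sketch favours clarity over speed.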
