# DIY: Find Duplicate Files in System

Solve the interview question "Find Duplicate Files in a System" in this lesson.

We'll cover the following

## Problem statement

Suppose you are given a list, paths, of directory information, including the directory path and all the files with contents in this directory. Your task is to return a list where each element is the path of all the files in the file system that have the same content.

Note: You can return the final answer in any order. At least two files that have the same content are considered to be a group of duplicate files.

A single directory information string in paths has the following format:

• "root/dir1/dir2/.../dirm f1.txt(f1_content) f2.txt(f2_content) ... fn.txt(fn_content)"

The above entry represents n files (f1.txt, f2.txt ... fn.txt) with content (f1_content, f2_content ... fn_content) respectively in the directory root/dir1/dir2/.../dirm. Note that n >= 1 and m >= 0. If m = 0, it means the directory is just the root directory.

The output is a list of groups of duplicate file paths. Each group contains all the file paths of the files that have the same content. A file path is a string that has the following format:

• directory_path/file_name.txt

### Constraints

• paths[i] consist of English letters, digits, '/', '.', '(', ')', and ' '.
• We can assume that no files or directories share the same name in the same directory.
• You can assume each given directory information represents a unique directory. A single blank space separates the directory path and file information.

### Input

The input will be an array of integers and two integers. The following are examples of input to the function:

// Sample Input 1
paths = ["root/a 4.txt(xyz) 1.txt(algorithms)","root/c 3.txt(educative)","root/c/d 2.txt(algorithms)","root 4.txt(educative) 5.txt(abcd)"]

// Sample Input 2
paths = ["root 1.txt(abcd) 2.txt(algo)","root/a 2.txt(abcd)","root/c/d 4.txt(algo)"]

// Sample Input 3
paths = ["root 1.txt(abcd) 2.txt(algo)","root/a 2.txt(xyzc)","root/c/d 4.txt(educative)"]


### Output

The output will be a list of integers containing k closest integers to x. The following are examples of the outputs:

// Sample Output 1
[["root/a/1.txt","root/c/d/2.txt"],["root/c/3.txt","root/4.txt"]]

// Sample Output 2
[["root/2.txt","root/c/d/4.txt"],["root/1.txt","root/a/2.txt"]]

// Sample Output 3
[]


Level up your interview prep. Join Educative to access 70+ hands-on prep courses.