Problem: Find Duplicate File in System

Medium
30 min
Explore how to detect duplicate files based on their contents by using hash maps efficiently. This lesson helps you understand parsing directory info and grouping files with identical content while practicing coding patterns essential for technical interviews. You'll strengthen your ability to tackle similar hash map problems confidently.

Statement

Given a list of directory information named paths, where each entry gives a directory path followed by the files in that directory and their contents, identify all the duplicate files in the system based on their contents. Each duplicate group should contain at least two files with identical content, and the output should list these groups with the full paths of the duplicate files. Each entry in the input list is formatted as:

"root_directory/dir_1/dir_2/…/dir_m file_1.txt(file_1_content) file_2.txt(file_2_content) … file_n.txt(file_n_content)"

The path above indicates that there are n files (file_1.txt, file_2.txt, …, file_n.txt) with respective contents (file_1_content, file_2_content, …, file_n_content) in the directory "root_directory/dir_1/dir_2/…/dir_m". The output should be a list of groups containing the paths of files sharing the same content. Each file path should follow the format given below:

"directory_path/file_name.txt".

The order of the groups or the paths within them does not matter.
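To make the input format concrete, here is a minimal sketch of how a single entry might be split into (full path, content) pairs. The function name `parse_entry` and the sample string are illustrative and not part of the problem statement. The sketch assumes that file names and contents contain no spaces or parentheses, so that a space always separates tokens and the first `(` marks the start of a file's content.

```python
def parse_entry(entry: str) -> list[tuple[str, str]]:
    """Split one directory-info string into (full_path, content) pairs."""
    # The first token is the directory path; the rest are "name(content)" tokens.
    directory, *files = entry.split(" ")
    pairs = []
    for file_info in files:
        # "1.txt(abcd)" -> name "1.txt", rest "abcd)"
        name, _, rest = file_info.partition("(")
        content = rest[:-1]  # drop the trailing ")"
        pairs.append((f"{directory}/{name}", content))
    return pairs


print(parse_entry("root/a 1.txt(abcd) 2.txt(efgh)"))
# [('root/a/1.txt', 'abcd'), ('root/a/2.txt', 'efgh')]
```

Each resulting path already follows the "directory_path/file_name.txt" output format.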

Constraints:

  • $1 \leq$ paths.length $\leq 2 \times 10^4$

  • $1 \leq$ paths[i].length $\leq 3000$

  • $1 \leq$ sum(paths[i].length) $\leq 5 \times 10^5$

  • paths[i] consists of English letters, digits, '/', '.', '(', ')', and ' '.

  • You may assume that no files or directories share the same name in the same directory.

  • You may assume that each given directory info represents a unique directory.

  • A single blank space separates the directory path and file info.
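One possible approach, sketched here under the same assumptions as the constraints above, is to use a hash map keyed by file content and collect every full path that shares that content. The function name `find_duplicate` and the sample input are illustrative. This is a sketch rather than a reference solution, and it assumes contents contain no spaces or parentheses.

```python
from collections import defaultdict


def find_duplicate(paths: list[str]) -> list[list[str]]:
    """Group full file paths by content; keep groups with two or more files."""
    by_content = defaultdict(list)  # content -> list of full paths
    for entry in paths:
        # A single space separates the directory path from each file token.
        directory, *files = entry.split(" ")
        for file_info in files:
            name, _, rest = file_info.partition("(")
            content = rest[:-1]  # strip the closing ")"
            by_content[content].append(f"{directory}/{name}")
    # A content value seen only once has no duplicate, so it is dropped.
    return [group for group in by_content.values() if len(group) >= 2]


sample = [
    "root/a 1.txt(abcd) 2.txt(efgh)",
    "root/c 3.txt(abcd)",
    "root/c/d 4.txt(efgh)",
    "root 4.txt(efgh)",
]
print(find_duplicate(sample))
# [['root/a/1.txt', 'root/c/3.txt'], ['root/a/2.txt', 'root/c/d/4.txt', 'root/4.txt']]
```

Each character of the input is touched a constant number of times, so the running time is linear in sum(paths[i].length), which is at most 5 × 10^5 under the stated constraints.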
