Given a list of directory info including directory path, and all the files with contents in this directory, you need to find out all the groups of duplicate files in the file system in terms of their paths.

A group of duplicate files consists of at least two files that have exactly the same content.

A single directory info string in the input list has the following format:

"root/d1/d2/.../dm f1.txt(f1_content) f2.txt(f2_content) ... fn.txt(fn_content)"

It means there are n files (f1.txtf2.txt ... fn.txt with content f1_contentf2_content ... fn_content, respectively) in directory root/d1/d2/.../dm. Note that n >= 1 and m >= 0. If m = 0, it means the directory is just the root directory.

The output is a list of group of duplicate file paths. For each group, it contains all the file paths of the files that have the same content. A file path is a string that has the following format:

"directory_path/file_name.txt"

Example 1:

Input:
["root/a 1.txt(abcd) 2.txt(efgh)", "root/c 3.txt(abcd)", "root/c/d 4.txt(efgh)", "root 4.txt(efgh)"]
Output:
[["root/a/2.txt","root/c/d/4.txt","root/4.txt"],["root/a/1.txt","root/c/3.txt"]]

Note:

  1. No order is required for the final output.
  2. You may assume the directory name, file name and file content only has letters and digits, and the length of file content is in the range of [1,50].
  3. The number of files given is in the range of [1,20000].
  4. You may assume no files or directories share the same name in the same directory.
  5. You may assume each given directory info represents a unique directory. Directory path and file info are separated by a single blank space.

Follow-up beyond contest:

  1. Imagine you are given a real file system, how will you search files? DFS or BFS?
  2. If the file content is very large (GB level), how will you modify your solution?
  3. If you can only read the file by 1kb each time, how will you modify your solution?
  4. What is the time complexity of your modified solution? What is the most time-consuming part and memory consuming part of it? How to optimize?
  5. How to make sure the duplicated files you find are not false positive?

Runtime: 84 ms, faster than 49.48% of C++ online submissions for Find Duplicate File in System.

简单字符串判断。

class Solution {
public:
unordered_map<string, vector<string>> mp;
void process(const string& s){
vector<string> filecontent;
vector<string> emptycontent;
int idx = ;
for(int i=; i<s.size(); i++){
if(s[i] == ' '){
//cout << i << endl;
filecontent.push_back(s.substr(idx, i - idx));
idx = i + ;
}
}
filecontent.push_back(s.substr(idx));
//for(auto v : filecontent) cout << v << endl; for(int i=; i<filecontent.size(); i++){
for(int j=; j<filecontent[i].size(); j++){
if(filecontent[i][j] == '('){
if(filecontent[i][j+] == ')'){
emptycontent.push_back(filecontent[] +"/"+ filecontent[i].substr(,j));
}else {
auto tmp = filecontent[i].substr(j+,filecontent[i].size() - j - );
//cout << tmp << endl;
mp[filecontent[i].substr(j+,filecontent[i].size() - j - )].push_back(filecontent[] + "/"+filecontent[i].substr(,j));
}
}
}
}
}
vector<vector<string>> findDuplicate(vector<string>& paths) {
vector<vector<string>> ret;
for(int i=; i<paths.size(); i++){
process(paths[i]);
}
for(auto it = mp.begin(); it != mp.end(); it++){
if(it->second.size() >= ){
ret.push_back(it->second);
}
}
return ret;
}
};

LC 609. Find Duplicate File in System的更多相关文章

  1. 609. Find Duplicate File in System

    Given a list of directory info including directory path, and all the files with contents in this dir ...

  2. 【leetcode】609. Find Duplicate File in System

    题目如下: Given a list of directory info including directory path, and all the files with contents in th ...

  3. 【LeetCode】609. Find Duplicate File in System 解题报告(Python & C++)

    作者: 负雪明烛 id: fuxuemingzhu 个人博客: http://fuxuemingzhu.cn/ 目录 题目描述 题目大意 解题方法 日期 题目地址:https://leetcode.c ...

  4. [LeetCode] Find Duplicate File in System 在系统中寻找重复文件

    Given a list of directory info including directory path, and all the files with contents in this dir ...

  5. [Swift]LeetCode609. 在系统中查找重复文件 | Find Duplicate File in System

    Given a list of directory info including directory path, and all the files with contents in this dir ...

  6. LeetCode Find Duplicate File in System

    原题链接在这里:https://leetcode.com/problems/find-duplicate-file-in-system/description/ 题目: Given a list of ...

  7. [leetcode-609-Find Duplicate File in System]

    https://discuss.leetcode.com/topic/91430/c-clean-solution-answers-to-follow-upGiven a list of direct ...

  8. Find Duplicate File in System

    Given a list of directory info including directory path, and all the files with contents in this dir ...

  9. HDU 3269 P2P File Sharing System(模拟)(2009 Asia Ningbo Regional Contest)

    Problem Description Peer-to-peer(P2P) computing technology has been widely used on the Internet to e ...

随机推荐

  1. 学习--Spring IOC源码精读

    Spring核心IOC的源码分析(转载) 原文地址:https://javadoop.com/post/spring-ioc#toc11 转载地址:https://blog.csdn.net/nuom ...

  2. Java 通过Math.random() 生成6位随机数

    public static void main(String[] args) { String sjs=""; for (int i = 0; i < 6; i++) { i ...

  3. 启动Activity的单独事件方法2

    1.Button中创建android:onClick="sendmessage" sendmessage方法名 //MAIN_acitivity创建这个同名独立方法 响应Butto ...

  4. python模块、面向对象编程

    目录: 模块补充 xml 面向对象 一.模块补充 shutil: 文件复制模块:进行文件copy.压缩: 使用方法: 将文件内容拷贝到另一个文件中,可以部分内容 shutil.copyfileobj( ...

  5. IIS无法启动解决方案

    1.第一步 2.第二步

  6. poj3417 Network/闇の連鎖[树上差分]

    首先隔断一条树边,不计附加边这个树肯定是断成两块了,然后就看附加边有没有连着的两个点在不同的块内. 方法1:BIT乱搞(个人思路) 假设考虑到$x$节点隔断和他父亲的边,要看$x$子树内有没有点连着附 ...

  7. 大数据技术之kettle安装使用

    kettle是一款开源的ETL工具,纯java编写,可以在Windows.Linux.Unix上运行,绿色无需安装,数据抽取高效稳定. kettle的两种设计 简述: Transformation(转 ...

  8. 导出excel 各 cvs 的方法

    public function orderExcelExport($data,$filename='simple.xls'){ ini_set('max_execution_time', '0'); ...

  9. VirtualBox:启动虚拟机后计算机死机

    造冰箱的大熊猫@cnblogs 2018/2/21 故障描述:Ubuntu 16.04升级Linux内核后,在VirtualBox中启动虚拟机发现Ubuntu死机,只能通过长按电源开关硬关机的方式关闭 ...

  10. Confluence 6 编辑一个附加文件的属性

    你需要具有空间的 添加附件(Add Attachment)权限来编辑文件的属性. 希望编辑一个附加文件的属性: Go to  > Attachments 单击你希望编辑附件边上的 属性(Prop ...