Given a list of directory info including directory path, and all the files with contents in this directory, you need to find out all the groups of duplicate files in the file system in terms of their paths.

A group of duplicate files consists of at least two files that have exactly the same content.

A single directory info string in the input list has the following format:

"root/d1/d2/.../dm f1.txt(f1_content) f2.txt(f2_content) ... fn.txt(fn_content)"

It means there are n files (f1.txt, f2.txt ... fn.txt with content f1_content, f2_content ... fn_content, respectively) in directory root/d1/d2/.../dm. Note that n >= 1 and m >= 0. If m = 0, it means the directory is just the root directory.

The output is a list of groups of duplicate file paths. Each group contains the paths of all files that have the same content. A file path is a string that has the following format:

"directory_path/file_name.txt"

Example 1:

Input:
["root/a 1.txt(abcd) 2.txt(efgh)", "root/c 3.txt(abcd)", "root/c/d 4.txt(efgh)", "root 4.txt(efgh)"]
Output:
[["root/a/2.txt","root/c/d/4.txt","root/4.txt"],["root/a/1.txt","root/c/3.txt"]]

Note:

  1. No order is required for the final output.
  2. You may assume the directory name, file name and file content only has letters and digits, and the length of file content is in the range of [1,50].
  3. The number of files given is in the range of [1,20000].
  4. You may assume no files or directories share the same name in the same directory.
  5. You may assume each given directory info represents a unique directory. Directory path and file info are separated by a single blank space.

Follow-up beyond contest:

  1. Imagine you are given a real file system, how will you search files? DFS or BFS?
  2. If the file content is very large (GB level), how will you modify your solution?
  3. If you can only read the file by 1kb each time, how will you modify your solution?
  4. What is the time complexity of your modified solution? What is the most time-consuming part and memory consuming part of it? How to optimize?
  5. How to make sure the duplicated files you find are not false positive?
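One possible angle on follow-ups 3 and 5, offered as a minimal sketch rather than a definitive answer: once two files are suspected duplicates (for example because their sizes or hashes match), they can be compared byte by byte while reading only a fixed-size chunk at a time, which rules out false positives without loading either file fully into memory. The names `sameContent` and `sameBytes` and the 1024-byte default are assumptions made for this illustration; they do not come from the problem statement.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns true when both streams hold identical bytes,
// reading at most `chunk` bytes from each stream per step.
bool sameContent(std::istream& a, std::istream& b, std::size_t chunk = 1024) {
    assert(chunk > 0);
    std::vector<char> bufA(chunk), bufB(chunk);
    while (true) {
        a.read(bufA.data(), static_cast<std::streamsize>(chunk));
        b.read(bufB.data(), static_cast<std::streamsize>(chunk));
        const std::streamsize gotA = a.gcount();
        const std::streamsize gotB = b.gcount();
        if (gotA != gotB) return false;  // one stream ended before the other
        if (gotA == 0) return true;      // both streams ended at the same point
        if (!std::equal(bufA.begin(), bufA.begin() + gotA, bufB.begin())) {
            return false;                // the current chunks differ
        }
    }
}

// Hypothetical wrapper for in-memory data, used only to exercise the helper.
bool sameBytes(const std::string& x, const std::string& y, std::size_t chunk = 1024) {
    std::istringstream a(x), b(y);
    return sameContent(a, b, chunk);
}
```

With real files, the same helper could be handed two `std::ifstream` objects opened in binary mode. Memory use stays at two chunk-sized buffers regardless of file size, and the comparison stops at the first differing chunk.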

Runtime: 84 ms, faster than 49.48% of C++ online submissions for Find Duplicate File in System.

Simple string parsing.

#include <string>
#include <unordered_map>
#include <vector>
using namespace std;

class Solution {
public:
    // Maps file content to the full paths of every file with that content.
    unordered_map<string, vector<string>> mp;

    // Parses one directory info string and records each of its files in mp.
    void process(const string& s) {
        vector<string> filecontent;   // tokens: [0] is the directory, the rest are "name(content)"
        vector<string> emptycontent;  // files with empty content; never reported
        int idx = 0;
        // Split the string on single spaces.
        for (int i = 0; i < s.size(); i++) {
            if (s[i] == ' ') {
                filecontent.push_back(s.substr(idx, i - idx));
                idx = i + 1;
            }
        }
        filecontent.push_back(s.substr(idx));
        // Start at 1 because token 0 is the directory path.
        for (int i = 1; i < filecontent.size(); i++) {
            for (int j = 0; j < filecontent[i].size(); j++) {
                if (filecontent[i][j] == '(') {
                    if (filecontent[i][j + 1] == ')') {
                        emptycontent.push_back(filecontent[0] + "/" + filecontent[i].substr(0, j));
                    } else {
                        // The content sits between '(' and the trailing ')'.
                        auto tmp = filecontent[i].substr(j + 1, filecontent[i].size() - j - 2);
                        mp[tmp].push_back(filecontent[0] + "/" + filecontent[i].substr(0, j));
                    }
                }
            }
        }
    }

    vector<vector<string>> findDuplicate(vector<string>& paths) {
        vector<vector<string>> ret;
        for (int i = 0; i < paths.size(); i++) {
            process(paths[i]);
        }
        // Keep only the contents shared by at least two files.
        for (auto it = mp.begin(); it != mp.end(); it++) {
            if (it->second.size() >= 2) {
                ret.push_back(it->second);
            }
        }
        return ret;
    }
};
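The same idea (split each directory string on spaces, then group paths by content in a hash map) can also be sketched more compactly with `std::istringstream` doing the tokenising. This is an alternative illustration, not the submitted solution above; the free function name `groupDuplicates` is hypothetical, and the sketch relies on the problem's guarantee that every file token has the shape `name(content)`.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical free function: groups file paths by identical content.
std::vector<std::vector<std::string>> groupDuplicates(const std::vector<std::string>& paths) {
    std::unordered_map<std::string, std::vector<std::string>> byContent;
    for (const std::string& info : paths) {
        std::istringstream in(info);
        std::string dir, token;
        in >> dir;             // the first token is the directory path
        while (in >> token) {  // each remaining token is "name(content)"
            const std::size_t open = token.find('(');
            // Assumption taken from the problem statement: the token is well formed.
            assert(open != std::string::npos && token.back() == ')');
            const std::string name = token.substr(0, open);
            const std::string content = token.substr(open + 1, token.size() - open - 2);
            byContent[content].push_back(dir + "/" + name);
        }
    }
    std::vector<std::vector<std::string>> groups;
    for (auto& entry : byContent) {
        if (entry.second.size() >= 2) {  // a group needs at least two files
            groups.push_back(std::move(entry.second));
        }
    }
    return groups;
}
```

Called on the input from Example 1, this returns two groups, one with two paths and one with three. The order of the groups is unspecified because it follows the iteration order of `std::unordered_map`, which the problem explicitly permits.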
