Given a list of directory info including directory path, and all the files with contents in this directory, you need to find out all the groups of duplicate files in the file system in terms of their paths.

A group of duplicate files consists of at least two files that have exactly the same content.

A single directory info string in the input list has the following format:

"root/d1/d2/.../dm f1.txt(f1_content) f2.txt(f2_content) ... fn.txt(fn_content)"

It means there are n files (f1.txtf2.txt ... fn.txt with content f1_contentf2_content ... fn_content, respectively) in directory root/d1/d2/.../dm. Note that n >= 1 and m >= 0. If m = 0, it means the directory is just the root directory.

The output is a list of group of duplicate file paths. For each group, it contains all the file paths of the files that have the same content. A file path is a string that has the following format:

"directory_path/file_name.txt"

Example 1:

Input:
["root/a 1.txt(abcd) 2.txt(efgh)", "root/c 3.txt(abcd)", "root/c/d 4.txt(efgh)", "root 4.txt(efgh)"]
Output:
[["root/a/2.txt","root/c/d/4.txt","root/4.txt"],["root/a/1.txt","root/c/3.txt"]]

Note:

  1. No order is required for the final output.
  2. You may assume the directory name, file name and file content only has letters and digits, and the length of file content is in the range of [1,50].
  3. The number of files given is in the range of [1,20000].
  4. You may assume no files or directories share the same name in the same directory.
  5. You may assume each given directory info represents a unique directory. Directory path and file info are separated by a single blank space.

Follow-up beyond contest:

  1. Imagine you are given a real file system, how will you search files? DFS or BFS?
  2. If the file content is very large (GB level), how will you modify your solution?
  3. If you can only read the file by 1kb each time, how will you modify your solution?
  4. What is the time complexity of your modified solution? What is the most time-consuming part and memory consuming part of it? How to optimize?
  5. How to make sure the duplicated files you find are not false positive?

Runtime: 84 ms, faster than 49.48% of C++ online submissions for Find Duplicate File in System.

简单字符串判断。

class Solution {
public:
unordered_map<string, vector<string>> mp;
void process(const string& s){
vector<string> filecontent;
vector<string> emptycontent;
int idx = ;
for(int i=; i<s.size(); i++){
if(s[i] == ' '){
//cout << i << endl;
filecontent.push_back(s.substr(idx, i - idx));
idx = i + ;
}
}
filecontent.push_back(s.substr(idx));
//for(auto v : filecontent) cout << v << endl; for(int i=; i<filecontent.size(); i++){
for(int j=; j<filecontent[i].size(); j++){
if(filecontent[i][j] == '('){
if(filecontent[i][j+] == ')'){
emptycontent.push_back(filecontent[] +"/"+ filecontent[i].substr(,j));
}else {
auto tmp = filecontent[i].substr(j+,filecontent[i].size() - j - );
//cout << tmp << endl;
mp[filecontent[i].substr(j+,filecontent[i].size() - j - )].push_back(filecontent[] + "/"+filecontent[i].substr(,j));
}
}
}
}
}
vector<vector<string>> findDuplicate(vector<string>& paths) {
vector<vector<string>> ret;
for(int i=; i<paths.size(); i++){
process(paths[i]);
}
for(auto it = mp.begin(); it != mp.end(); it++){
if(it->second.size() >= ){
ret.push_back(it->second);
}
}
return ret;
}
};

LC 609. Find Duplicate File in System的更多相关文章

  1. 609. Find Duplicate File in System

    Given a list of directory info including directory path, and all the files with contents in this dir ...

  2. 【leetcode】609. Find Duplicate File in System

    题目如下: Given a list of directory info including directory path, and all the files with contents in th ...

  3. 【LeetCode】609. Find Duplicate File in System 解题报告(Python & C++)

    作者: 负雪明烛 id: fuxuemingzhu 个人博客: http://fuxuemingzhu.cn/ 目录 题目描述 题目大意 解题方法 日期 题目地址:https://leetcode.c ...

  4. [LeetCode] Find Duplicate File in System 在系统中寻找重复文件

    Given a list of directory info including directory path, and all the files with contents in this dir ...

  5. [Swift]LeetCode609. 在系统中查找重复文件 | Find Duplicate File in System

    Given a list of directory info including directory path, and all the files with contents in this dir ...

  6. LeetCode Find Duplicate File in System

    原题链接在这里:https://leetcode.com/problems/find-duplicate-file-in-system/description/ 题目: Given a list of ...

  7. [leetcode-609-Find Duplicate File in System]

    https://discuss.leetcode.com/topic/91430/c-clean-solution-answers-to-follow-upGiven a list of direct ...

  8. Find Duplicate File in System

    Given a list of directory info including directory path, and all the files with contents in this dir ...

  9. HDU 3269 P2P File Sharing System(模拟)(2009 Asia Ningbo Regional Contest)

    Problem Description Peer-to-peer(P2P) computing technology has been widely used on the Internet to e ...

随机推荐

  1. java_day10_多线程

    第十章:线程 1.进程和线程的概述 1)进程和线程定义 进程是具有一定独立功能的程序关于某个数据集合上的一次运行活动,进程是系统进行资源分配和调度的一个独立单位. 线程是进程的一个实体,是CPU调度和 ...

  2. 通过SSH解压缩.tar.gz、.gz、.zip文件的方法

    一般在linux下,常用的压缩格式有如下几个: .tar.gz..gz..zip 解压 .tar.gz 文件命令: tar -zxvf xxx.tar.gz 解压 .gz 文件命令: gunzip x ...

  3. 【墨西哥区域赛】Carpet

    原题: 题意: 给你一个树,有1e5个节点,让你把这个树放在一个长1e6宽20的网格图里,要求一个格子放一个节点,树边之间不能相交 这是一道构造题 因为树的形状可能性很多,很复杂,所以不能简单猜测,而 ...

  4. C#形参和实参、引用类型和值类型使用时的一个注意点。

    这是早上群里讨论的例子. static void main(string [] arg){ var p1=new Person{Name="张三"}; var p2=new Per ...

  5. JS 给数字加三位一逗号间隔的方法

    1.方法 function format_number(n) { var b = parseInt(n).toString(); var len = b.length; ) { return b; } ...

  6. ios11返回按钮问题

    在苹果系统升级到iOS11之后,页面的返回按钮的点击区域是根据设置的按钮的frame来确定的,在设置按钮太小的时候,点击就会出现点击多次才能点击到一次的现象,处理的方法就是设置按钮的frame变大代码 ...

  7. 14-SQLServer索引碎片

    一.总结 1.数据库的存储本身是无序的,建立聚集索引之后,就会按照聚集索引的物理顺序存入硬盘: 2.建立索引完全是为了提升读取的速度,相对写入的速度就会降低,没有索引的表写入时最快的,但是大多数系统读 ...

  8. HDU 6044 - Limited Permutation | 2017 Multi-University Training Contest 1

    研究一下建树 : /* HDU 6044 - Limited Permutation [ 读入优化,笛卡尔树 ] | 2017 Multi-University Training Contest 1 ...

  9. hive2.3.4安装

    一.安装Hadoop Hive运行在Hadoop环境之上,因此需要hadoop环境,本次在安装在hadoop完全分布式模式的namennode节点上 请参考:hadoop搭建 二.安装Hive 下载 ...

  10. keras中常用的初始化器

    keras中常用的初始化器有恒值初始化器.正态分布初始化器.均匀分布初始化器 恒值初始化器: keras.initializers.Zeros() keras.initializers.Ones() ...