LC 609. Find Duplicate File in System
Given a list of directory info including directory path, and all the files with contents in this directory, you need to find out all the groups of duplicate files in the file system in terms of their paths.
A group of duplicate files consists of at least two files that have exactly the same content.
A single directory info string in the input list has the following format:
"root/d1/d2/.../dm f1.txt(f1_content) f2.txt(f2_content) ... fn.txt(fn_content)"
It means there are n files (f1.txt, f2.txt ... fn.txt with content f1_content, f2_content ... fn_content, respectively) in directory root/d1/d2/.../dm. Note that n >= 1 and m >= 0. If m = 0, it means the directory is just the root directory.
The output is a list of group of duplicate file paths. For each group, it contains all the file paths of the files that have the same content. A file path is a string that has the following format:
"directory_path/file_name.txt"
Example 1:
Input:
["root/a 1.txt(abcd) 2.txt(efgh)", "root/c 3.txt(abcd)", "root/c/d 4.txt(efgh)", "root 4.txt(efgh)"]
Output:
[["root/a/2.txt","root/c/d/4.txt","root/4.txt"],["root/a/1.txt","root/c/3.txt"]]
Note:
- No order is required for the final output.
- You may assume the directory name, file name and file content only has letters and digits, and the length of file content is in the range of [1,50].
- The number of files given is in the range of [1,20000].
- You may assume no files or directories share the same name in the same directory.
- You may assume each given directory info represents a unique directory. Directory path and file info are separated by a single blank space.
Follow-up beyond contest:
- Imagine you are given a real file system, how will you search files? DFS or BFS?
- If the file content is very large (GB level), how will you modify your solution?
- If you can only read the file by 1kb each time, how will you modify your solution?
- What is the time complexity of your modified solution? What is the most time-consuming part and memory consuming part of it? How to optimize?
- How to make sure the duplicated files you find are not false positive?
Runtime: 84 ms, faster than 49.48% of C++ online submissions for Find Duplicate File in System.
简单字符串判断。
class Solution {
public:
unordered_map<string, vector<string>> mp;
void process(const string& s){
vector<string> filecontent;
vector<string> emptycontent;
int idx = ;
for(int i=; i<s.size(); i++){
if(s[i] == ' '){
//cout << i << endl;
filecontent.push_back(s.substr(idx, i - idx));
idx = i + ;
}
}
filecontent.push_back(s.substr(idx));
//for(auto v : filecontent) cout << v << endl;
for(int i=; i<filecontent.size(); i++){
for(int j=; j<filecontent[i].size(); j++){
if(filecontent[i][j] == '('){
if(filecontent[i][j+] == ')'){
emptycontent.push_back(filecontent[] +"/"+ filecontent[i].substr(,j));
}else {
auto tmp = filecontent[i].substr(j+,filecontent[i].size() - j - );
//cout << tmp << endl;
mp[filecontent[i].substr(j+,filecontent[i].size() - j - )].push_back(filecontent[] + "/"+filecontent[i].substr(,j));
}
}
}
}
}
vector<vector<string>> findDuplicate(vector<string>& paths) {
vector<vector<string>> ret;
for(int i=; i<paths.size(); i++){
process(paths[i]);
}
for(auto it = mp.begin(); it != mp.end(); it++){
if(it->second.size() >= ){
ret.push_back(it->second);
}
}
return ret;
}
};
LC 609. Find Duplicate File in System的更多相关文章
- 609. Find Duplicate File in System
Given a list of directory info including directory path, and all the files with contents in this dir ...
- 【leetcode】609. Find Duplicate File in System
题目如下: Given a list of directory info including directory path, and all the files with contents in th ...
- 【LeetCode】609. Find Duplicate File in System 解题报告(Python & C++)
作者: 负雪明烛 id: fuxuemingzhu 个人博客: http://fuxuemingzhu.cn/ 目录 题目描述 题目大意 解题方法 日期 题目地址:https://leetcode.c ...
- [LeetCode] Find Duplicate File in System 在系统中寻找重复文件
Given a list of directory info including directory path, and all the files with contents in this dir ...
- [Swift]LeetCode609. 在系统中查找重复文件 | Find Duplicate File in System
Given a list of directory info including directory path, and all the files with contents in this dir ...
- LeetCode Find Duplicate File in System
原题链接在这里:https://leetcode.com/problems/find-duplicate-file-in-system/description/ 题目: Given a list of ...
- [leetcode-609-Find Duplicate File in System]
https://discuss.leetcode.com/topic/91430/c-clean-solution-answers-to-follow-upGiven a list of direct ...
- Find Duplicate File in System
Given a list of directory info including directory path, and all the files with contents in this dir ...
- HDU 3269 P2P File Sharing System(模拟)(2009 Asia Ningbo Regional Contest)
Problem Description Peer-to-peer(P2P) computing technology has been widely used on the Internet to e ...
随机推荐
- JavaWeb【一、简介】
原计划上周完成的内容,硬是过了一个清明拖到了这周,工作上还有很多东西没做...明天抓紧看把,争取这周末搞定 内容简介:(学习完后会重新梳理调整) 1.JavaWeb[一.简介] 2.JavaWeb[二 ...
- dedecms 上传目录路径
DedeCms已经升级到5.7版本了..可惜附件的目录还是不统一,比如我们从后台把附件目录调整为Ym(默认为Ymd)然而我们的附件路径依然是不一样的比如:Ymd代表年月日从文章里上传路径为:/Ym/1 ...
- 使用wget下载百度云资源
目录 使用wget下载百度云资源 一.材料准备: 二.步骤 三.总结 使用wget下载百度云资源 一.材料准备: [BaiduPan explorer]谷歌插件,可以加载文件的真实下载地址 [Chro ...
- python面向编程: 常用模块补充与面向对象
一.常用模块 1.模块 的用用法 模块的相互导入 绝对导入 从sys.path (项目根目录)开始的完整路径 相对导入 是指相对于当前正在执行的文件开始的路径 只能用于包内模块相互间导入 不能超过顶层 ...
- Asp.Net MVC5 使用Unity 实现依赖注入
到这里安装完毕会提示一个redme.txt,说 把 UnityConfig.RegisterComponents(); 放到下图的位置,我们照做即可. 然后我们看一下这个 UnityConfig ...
- 获取header信息
获取header信息 function _get_all_header() { // 忽略获取的header数据.这个函数后面会用到.主要是起过滤作用 $ignore = array('host',' ...
- endpoint
你把机器关机一次,估计被你只写满不读,限速死锁了,因为目前没有心跳控制
- Redis 与 MQ 的区别
Redis是一个高性能的key-value数据库,它的出现很大程度补偿了memcached这类key-value存储的不足.虽然它是一个数据库系统,但本身支持MQ功能,完全可以当做一个轻量级的队列服务 ...
- 清北学堂dp图论营游记day5
ysq主讲: tarjan缩点+拓扑+dij最短路. floyd..... 单源..最长路... 建正反两个图. 二分答案,把大于答案的边加入到新图中,如果能走过去到终点,就可以. 或者:从大到小加边 ...
- idea 启动ssm项目
https://www.cnblogs.com/yeya/p/10320885.html https://www.cnblogs.com/chenlinghong/p/8339555.html