LC 609. Find Duplicate File in System
Given a list of directory info including directory path, and all the files with contents in this directory, you need to find out all the groups of duplicate files in the file system in terms of their paths.
A group of duplicate files consists of at least two files that have exactly the same content.
A single directory info string in the input list has the following format:
"root/d1/d2/.../dm f1.txt(f1_content) f2.txt(f2_content) ... fn.txt(fn_content)"
It means there are n files (f1.txt, f2.txt ... fn.txt with content f1_content, f2_content ... fn_content, respectively) in directory root/d1/d2/.../dm. Note that n >= 1 and m >= 0. If m = 0, it means the directory is just the root directory.
The output is a list of group of duplicate file paths. For each group, it contains all the file paths of the files that have the same content. A file path is a string that has the following format:
"directory_path/file_name.txt"
Example 1:
Input:
["root/a 1.txt(abcd) 2.txt(efgh)", "root/c 3.txt(abcd)", "root/c/d 4.txt(efgh)", "root 4.txt(efgh)"]
Output:
[["root/a/2.txt","root/c/d/4.txt","root/4.txt"],["root/a/1.txt","root/c/3.txt"]]
Note:
- No order is required for the final output.
- You may assume the directory name, file name and file content only has letters and digits, and the length of file content is in the range of [1,50].
- The number of files given is in the range of [1,20000].
- You may assume no files or directories share the same name in the same directory.
- You may assume each given directory info represents a unique directory. Directory path and file info are separated by a single blank space.
Follow-up beyond contest:
- Imagine you are given a real file system, how will you search files? DFS or BFS?
- If the file content is very large (GB level), how will you modify your solution?
- If you can only read the file by 1kb each time, how will you modify your solution?
- What is the time complexity of your modified solution? What is the most time-consuming part and memory consuming part of it? How to optimize?
- How to make sure the duplicated files you find are not false positive?
Runtime: 84 ms, faster than 49.48% of C++ online submissions for Find Duplicate File in System.
简单字符串判断。
class Solution {
public:
unordered_map<string, vector<string>> mp;
void process(const string& s){
vector<string> filecontent;
vector<string> emptycontent;
int idx = ;
for(int i=; i<s.size(); i++){
if(s[i] == ' '){
//cout << i << endl;
filecontent.push_back(s.substr(idx, i - idx));
idx = i + ;
}
}
filecontent.push_back(s.substr(idx));
//for(auto v : filecontent) cout << v << endl;
for(int i=; i<filecontent.size(); i++){
for(int j=; j<filecontent[i].size(); j++){
if(filecontent[i][j] == '('){
if(filecontent[i][j+] == ')'){
emptycontent.push_back(filecontent[] +"/"+ filecontent[i].substr(,j));
}else {
auto tmp = filecontent[i].substr(j+,filecontent[i].size() - j - );
//cout << tmp << endl;
mp[filecontent[i].substr(j+,filecontent[i].size() - j - )].push_back(filecontent[] + "/"+filecontent[i].substr(,j));
}
}
}
}
}
vector<vector<string>> findDuplicate(vector<string>& paths) {
vector<vector<string>> ret;
for(int i=; i<paths.size(); i++){
process(paths[i]);
}
for(auto it = mp.begin(); it != mp.end(); it++){
if(it->second.size() >= ){
ret.push_back(it->second);
}
}
return ret;
}
};
LC 609. Find Duplicate File in System的更多相关文章
- 609. Find Duplicate File in System
Given a list of directory info including directory path, and all the files with contents in this dir ...
- 【leetcode】609. Find Duplicate File in System
题目如下: Given a list of directory info including directory path, and all the files with contents in th ...
- 【LeetCode】609. Find Duplicate File in System 解题报告(Python & C++)
作者: 负雪明烛 id: fuxuemingzhu 个人博客: http://fuxuemingzhu.cn/ 目录 题目描述 题目大意 解题方法 日期 题目地址:https://leetcode.c ...
- [LeetCode] Find Duplicate File in System 在系统中寻找重复文件
Given a list of directory info including directory path, and all the files with contents in this dir ...
- [Swift]LeetCode609. 在系统中查找重复文件 | Find Duplicate File in System
Given a list of directory info including directory path, and all the files with contents in this dir ...
- LeetCode Find Duplicate File in System
原题链接在这里:https://leetcode.com/problems/find-duplicate-file-in-system/description/ 题目: Given a list of ...
- [leetcode-609-Find Duplicate File in System]
https://discuss.leetcode.com/topic/91430/c-clean-solution-answers-to-follow-upGiven a list of direct ...
- Find Duplicate File in System
Given a list of directory info including directory path, and all the files with contents in this dir ...
- HDU 3269 P2P File Sharing System(模拟)(2009 Asia Ningbo Regional Contest)
Problem Description Peer-to-peer(P2P) computing technology has been widely used on the Internet to e ...
随机推荐
- fragment事务 的基本处理
处理fragment事务 动态加载fragmentMyFragment2 fragment2=new MyFragment2();//new出一个fragment对象FragmentManager f ...
- Cobbler自动化装机脚本
#!/bin/bash ens33_ip=192.168.1.3 ens33_gateway=192.168.1.1 ens37_ip=192.168.207.2 dhcp_wd=192.168.20 ...
- Acwing 197. 阶乘分解
给定整数 N ,试把阶乘 N! 分解质因数,按照算术基本定理的形式输出分解结果中的 pipi 和 cici 即可. 输入格式 一个整数N. 输出格式 N! 分解质因数后的结果,共若干行,每行一对pi, ...
- jsfuck-原理
jsfuck真的fuck,第一眼就是WTF?? Example The following source will do an alert(1): [][(![]+[])[+[]]+([![]]+[] ...
- Python:用filter函数求素数 (再理解generator)
目的:更熟悉应用generator. 参考:https://www.liaoxuefeng.com/wiki/1016959663602400/1017404530360000 素数定义: 素数:质数 ...
- python小练手题1
1. """ Write a program which can compute the factorial of a given numbers. The result ...
- 精选30道Java多线程面试题
1.线程和进程的区别 进程是应用程序的执行实例.比如说,当你双击的Microsoft Word的图标,你就开始运行的Word的进程.线程是执行进程中的路径.另外,一个过程可以包含多个线程.启动Word ...
- hexo主题hexo-theme-yilia文章太长,截断按钮文字的实现
文章太长,截断按钮文字不是通过配置文件_config.yml实现的,而是在文章内容里实现,在你想截断的文章位置加上 <!-- more --> 就可以实现了! 参考博客:hexo-them ...
- 开启防火墙如何部署k8s
你可以不关闭防火墙,只需要开启这些端口就行了MASTER节点6443* Kubernetes API server 2379-2380 etcd server client API 10250 Kub ...
- BZOJ 3881[COCI2015]Divljak (AC自动机+dfs序+lca+BIT)
显然是用AC自动机 先构建好AC自动机,当B中插入新的串时就在trie上跑,对于当前点,首先这个点所代表的串一定出现过,然后这个点指向的fail也一定出现过.那么我们把每个点fail当作父亲,建一棵f ...