[leetcode-609-Find Duplicate File in System]
Given a list of directory info including directory path, and all the files with contents in this directory, you need to find out all the groups of duplicate files in the file system in terms of their paths.
A group of duplicate files consists of at least two files that have exactly the same content.
A single directory info string in the input list has the following format:
"root/d1/d2/.../dm f1.txt(f1_content) f2.txt(f2_content) ... fn.txt(fn_content)"
It means there are n files (f1.txt, f2.txt ... fn.txt with content f1_content, f2_content ... fn_content, respectively) in directory root/d1/d2/.../dm. Note that n >= 1 and m >= 0. If m = 0, it means the directory is just the root directory.
The output is a list of groups of duplicate file paths. Each group contains the paths of all files that have the same content. A file path is a string that has the following format:
"directory_path/file_name.txt"
Example 1:
Input:
["root/a 1.txt(abcd) 2.txt(efgh)", "root/c 3.txt(abcd)", "root/c/d 4.txt(efgh)", "root 4.txt(efgh)"]
Output:
[["root/a/2.txt","root/c/d/4.txt","root/4.txt"],["root/a/1.txt","root/c/3.txt"]]
Note:
- No order is required for the final output.
- You may assume the directory name, file name and file content only has letters and digits, and the length of file content is in the range of [1,50].
- The number of files given is in the range of [1,20000].
- You may assume no files or directories share the same name in the same directory.
- You may assume each given directory info represents a unique directory. Directory path and file info are separated by a single blank space.
Follow-up beyond contest:
- Imagine you are given a real file system, how will you search files? DFS or BFS?
- If the file content is very large (GB level), how will you modify your solution?
- If you can only read the file by 1kb each time, how will you modify your solution?
- What is the time complexity of your modified solution? What is the most time-consuming part and memory consuming part of it? How to optimize?
- How to make sure the duplicated files you find are not false positive?
Approach:
First, parse each input string into full file paths, then use a map to collect the paths of all files that share the same content.
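#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>
using namespace std;

// Split "name.txt(content)" into the file name and the text between the parentheses.
void parse(const string& origin, string& fileName, string& content)
{
    size_t index = origin.find_first_of('(');
    fileName = origin.substr(0, index);
    // Skip '(' at the front and drop the trailing ')'.
    content = origin.substr(index + 1, origin.length() - index - 2);
}

// Expand one directory info string into parallel lists of full paths and contents.
void getFullPath(const string& p, vector<string>& path, vector<string>& conVec)
{
    stringstream ss(p);
    string pathPrefix;
    ss >> pathPrefix; // the first token is the directory path
    string file;
    while (ss >> file)
    {
        string fileName, content;
        parse(file, fileName, content);
        path.push_back(pathPrefix + "/" + fileName);
        conVec.push_back(content);
    }
}

vector<vector<string>> findDuplicate(vector<string>& paths)
{
    vector<string> pathVec, conVec;
    for (const auto& p : paths)
    {
        getFullPath(p, pathVec, conVec);
    }
    // Group paths by content; pathVec[i] is the file whose content is conVec[i].
    map<string, set<string>> mp2;
    for (size_t i = 0; i < pathVec.size(); i++)
    {
        mp2[conVec[i]].insert(pathVec[i]);
    }
    vector<vector<string>> ret;
    for (const auto& it : mp2)
    {
        if (it.second.size() == 1) continue; // unique content, not a duplicate group
        ret.emplace_back(it.second.begin(), it.second.end());
    }
    return ret;
}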
I then found a solution by someone with the same idea, but the expert's version is far more concise:
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>
using namespace std;

vector<vector<string>> findDuplicate(vector<string>& paths) {
    unordered_map<string, vector<string>> files; // content -> full paths
    vector<vector<string>> result;
    for (const auto& path : paths) {
        stringstream ss(path);
        string root;
        string s;
        getline(ss, root, ' '); // the first token is the directory path
        while (getline(ss, s, ' ')) {
            string fileName = root + '/' + s.substr(0, s.find('('));
            string fileContent = s.substr(s.find('(') + 1, s.find(')') - s.find('(') - 1);
            files[fileContent].push_back(fileName);
        }
    }
    for (const auto& file : files) {
        if (file.second.size() > 1) // keep only contents shared by at least two files
            result.push_back(file.second);
    }
    return result;
}
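Because the problem does not require any particular output order, and the `unordered_map` version returns groups in an unspecified order, comparing a result against Example 1 needs a normalization step first. The sketch below is one way to do that; the alias `Groups` and the helper `normalize` are hypothetical names introduced here, not part of either solution.

```cpp
#include <algorithm>
#include <string>
#include <vector>

using Groups = std::vector<std::vector<std::string>>;

// Hypothetical helper: sort the paths inside each group, then sort the groups,
// so that two answers differing only in ordering compare equal.
Groups normalize(Groups groups) {
    for (auto& g : groups) {
        std::sort(g.begin(), g.end());
    }
    std::sort(groups.begin(), groups.end());
    return groups;
}
```

A check then reads `normalize(findDuplicate(paths)) == normalize(expected)`, where `expected` holds the groups listed in the example output.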
Reference:
https://discuss.leetcode.com/topic/91430/c-clean-solution-answers-to-follow-up