[leetcode-609-Find Duplicate File in System]
Given a list of directory info including directory path, and all the files with contents in this directory, you need to find out all the groups of duplicate files in the file system in terms of their paths.
A group of duplicate files consists of at least two files that have exactly the same content.
A single directory info string in the input list has the following format:
"root/d1/d2/.../dm f1.txt(f1_content) f2.txt(f2_content) ... fn.txt(fn_content)"
It means there are n files (f1.txt, f2.txt ... fn.txt with content f1_content, f2_content ... fn_content, respectively) in directory root/d1/d2/.../dm. Note that n >= 1 and m >= 0. If m = 0, it means the directory is just the root directory.
The output is a list of groups of duplicate file paths. Each group contains all the file paths of the files that have the same content. A file path is a string that has the following format:
"directory_path/file_name.txt"
Example 1:
Input:
["root/a 1.txt(abcd) 2.txt(efgh)", "root/c 3.txt(abcd)", "root/c/d 4.txt(efgh)", "root 4.txt(efgh)"]
Output:
[["root/a/2.txt","root/c/d/4.txt","root/4.txt"],["root/a/1.txt","root/c/3.txt"]]
Note:
- No order is required for the final output.
- You may assume the directory name, file name and file content contain only letters and digits, and the length of file content is in the range of [1,50].
- The number of files given is in the range of [1,20000].
- You may assume no files or directories share the same name in the same directory.
- You may assume each given directory info represents a unique directory. Directory path and file info are separated by a single blank space.
Follow-up beyond contest:
- Imagine you are given a real file system, how will you search files? DFS or BFS?
- If the file content is very large (GB level), how will you modify your solution?
- If you can only read the file by 1kb each time, how will you modify your solution?
- What is the time complexity of your modified solution? What is the most time-consuming part and memory consuming part of it? How to optimize?
- How to make sure the duplicated files you find are not false positive?
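Approach:
First, parse each input string into full file paths; then use a map keyed by file content to collect the paths of all files that share the same content.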
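#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>
using namespace std;

// Split a token such as "1.txt(abcd)" into file name "1.txt" and content "abcd".
void parse(string orign, string& fileName, string& content)
{
    int index = orign.find_first_of('(');
    fileName = orign.substr(0, index);
    // Skip the '(' and leave out the trailing ')'.
    content = orign.substr(index + 1, orign.length() - index - 2);
}

// Expand one directory info string into full paths and their contents.
void getFullPath(string p, vector<string>& path, vector<string>& conVec)
{
    stringstream ss(p);
    string pathPrefix;
    ss >> pathPrefix;  // the first token is the directory path
    string file;
    while (ss >> file)
    {
        string fileName, content;
        parse(file, fileName, content);
        path.push_back(pathPrefix + "/" + fileName);
        conVec.push_back(content);
    }
}

vector<vector<string>> findDuplicate(vector<string>& paths)
{
    vector<string> pathVec, conVec;
    for (const auto& p : paths)
    {
        getFullPath(p, pathVec, conVec);
    }
    // content -> set of full paths with that content
    map<string, set<string>> mp2;
    for (size_t i = 0; i < pathVec.size(); i++)
    {
        mp2[conVec[i]].insert(pathVec[i]);
        // cout << pathVec[i] << " " << conVec[i] << endl;
    }
    vector<vector<string>> ret;
    for (const auto& it : mp2)
    {
        if (it.second.size() == 1) continue;  // a single file is not a duplicate group
        vector<string> temp(it.second.begin(), it.second.end());
        ret.push_back(temp);
    }
    return ret;
}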
I later came across a solution by someone using the same idea, but the expert's version is much, much more concise:
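#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>
using namespace std;

vector<vector<string>> findDuplicate(vector<string>& paths) {
    // content -> full paths of the files with that content
    unordered_map<string, vector<string>> files;
    vector<vector<string>> result;
    for (const auto& path : paths) {
        stringstream ss(path);
        string root;
        string s;
        getline(ss, root, ' ');  // the first token is the directory path
        while (getline(ss, s, ' ')) {
            // s looks like "name.txt(content)"
            string fileName = root + '/' + s.substr(0, s.find('('));
            string fileContent = s.substr(s.find('(') + 1, s.find(')') - s.find('(') - 1);
            files[fileContent].push_back(fileName);
        }
    }
    for (const auto& file : files) {
        if (file.second.size() > 1)  // keep only groups with at least two files
            result.push_back(file.second);
    }
    return result;
}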
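To check the concise version against Example 1, here is a small self-contained sketch. The helpers normalized and runExample1 are hypothetical names added for illustration. Because the problem allows any output order and unordered_map iteration order is unspecified, the result is sorted before it is compared or printed.

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>
using namespace std;

vector<vector<string>> findDuplicate(vector<string>& paths) {
    unordered_map<string, vector<string>> files;  // content -> full paths
    vector<vector<string>> result;
    for (const auto& path : paths) {
        stringstream ss(path);
        string root, s;
        getline(ss, root, ' ');
        while (getline(ss, s, ' ')) {
            string fileName = root + '/' + s.substr(0, s.find('('));
            string fileContent = s.substr(s.find('(') + 1, s.find(')') - s.find('(') - 1);
            files[fileContent].push_back(fileName);
        }
    }
    for (const auto& file : files) {
        if (file.second.size() > 1) result.push_back(file.second);
    }
    return result;
}

// Hypothetical helper: sort paths within each group, then the groups,
// so that results can be compared regardless of map iteration order.
vector<vector<string>> normalized(vector<vector<string>> groups) {
    for (auto& g : groups) sort(g.begin(), g.end());
    sort(groups.begin(), groups.end());
    return groups;
}

// Hypothetical driver: runs the input from Example 1.
vector<vector<string>> runExample1() {
    vector<string> input = {"root/a 1.txt(abcd) 2.txt(efgh)", "root/c 3.txt(abcd)",
                            "root/c/d 4.txt(efgh)", "root 4.txt(efgh)"};
    return normalized(findDuplicate(input));
}
```

After sorting, runExample1() yields the same two groups as the expected output of Example 1: one with the three "efgh" files and one with the two "abcd" files.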
Reference:
https://discuss.leetcode.com/topic/91430/c-clean-solution-answers-to-follow-up