[leetcode-609-Find Duplicate File in System]
https://discuss.leetcode.com/topic/91430/c-clean-solution-answers-to-follow-upGiven a list of directory info including directory path, and all the files with contents in this directory, you need to find out all the groups of duplicate files in the file system in terms of their paths.
A group of duplicate files consists of at least two files that have exactly the same content.
A single directory info string in the input list has the following format:
"root/d1/d2/.../dm f1.txt(f1_content) f2.txt(f2_content) ... fn.txt(fn_content)"
It means there are n files (f1.txt, f2.txt ... fn.txt with content f1_content, f2_content ... fn_content, respectively) in directory root/d1/d2/.../dm. Note that n >= 1 and m >= 0. If m = 0, it means the directory is just the root directory.
The output is a list of group of duplicate file paths. For each group, it contains all the file paths of the files that have the same content. A file path is a string that has the following format:
"directory_path/file_name.txt"
Example 1:
Input:
["root/a 1.txt(abcd) 2.txt(efgh)", "root/c 3.txt(abcd)", "root/c/d 4.txt(efgh)", "root 4.txt(efgh)"]
Output:
[["root/a/2.txt","root/c/d/4.txt","root/4.txt"],["root/a/1.txt","root/c/3.txt"]]
Note:
- No order is required for the final output.
- You may assume the directory name, file name and file content only has letters and digits, and the length of file content is in the range of [1,50].
- The number of files given is in the range of [1,20000].
- You may assume no files or directories share the same name in the same directory.
- You may assume each given directory info represents a unique directory. Directory path and file info are separated by a single blank space.
Follow-up beyond contest:
- Imagine you are given a real file system, how will you search files? DFS or BFS?
- If the file content is very large (GB level), how will you modify your solution?
- If you can only read the file by 1kb each time, how will you modify your solution?
- What is the time complexity of your modified solution? What is the most time-consuming part and memory consuming part of it? How to optimize?
- How to make sure the duplicated files you find are not false positive?
思路:
首先就是将字符串处理成完整路径的形式,然后用map统计相同内容的文件路径。
void parse(string orign,string& fileName,string& content)
{
int index = orign.find_first_of('(');
fileName = orign.substr(, index);
content = orign.substr(index + ,orign.length()-index-);
}
void getFullPath(string p,vector<string>&path,vector<string>&conVec)
{
stringstream ss(p);
string pathPrefix;
ss >> pathPrefix;
string file;
while (ss >> file)
{
string fileName, content;
parse(file,fileName, content);
path.push_back(pathPrefix + "/"+fileName);
conVec.push_back(content);
}
}
vector<vector<string>> findDuplicate(vector<string>& paths)
{
vector<string>pathVec, conVec;
for (auto p:paths)
{
getFullPath(p,pathVec,conVec);
}
map<string, set<string>>mp2;
for (int i = ; i < pathVec.size();i++)
{
mp2[conVec[i]].insert(pathVec[i]);
// cout << pathVec[i] << " " << conVec[i] << endl;
}
vector<vector<string>>ret;
for (auto it :mp2)
{
if (it.second.size() == )continue;
vector<string> temp(it.second.begin(),it.second.end());
ret.push_back(temp);
}
return ret;
}
看到相同思路的人写的,但是感觉大神的要简洁的多的多。。
vector<vector<string>> findDuplicate(vector<string>& paths) {
unordered_map<string, vector<string>> files;
vector<vector<string>> result;
for (auto path : paths) {
stringstream ss(path);
string root;
string s;
getline(ss, root, ' ');
while (getline(ss, s, ' ')) {
string fileName = root + '/' + s.substr(, s.find('('));
string fileContent = s.substr(s.find('(') + , s.find(')') - s.find('(') - );
files[fileContent].push_back(fileName);
}
}
for (auto file : files) {
if (file.second.size() > )
result.push_back(file.second);
}
return result;
}
参考:
https://discuss.leetcode.com/topic/91430/c-clean-solution-answers-to-follow-up
[leetcode-609-Find Duplicate File in System]的更多相关文章
- LC 609. Find Duplicate File in System
Given a list of directory info including directory path, and all the files with contents in this dir ...
- 【LeetCode】609. Find Duplicate File in System 解题报告(Python & C++)
作者: 负雪明烛 id: fuxuemingzhu 个人博客: http://fuxuemingzhu.cn/ 目录 题目描述 题目大意 解题方法 日期 题目地址:https://leetcode.c ...
- 【leetcode】609. Find Duplicate File in System
题目如下: Given a list of directory info including directory path, and all the files with contents in th ...
- 609. Find Duplicate File in System
Given a list of directory info including directory path, and all the files with contents in this dir ...
- [LeetCode] Find Duplicate File in System 在系统中寻找重复文件
Given a list of directory info including directory path, and all the files with contents in this dir ...
- LeetCode Find Duplicate File in System
原题链接在这里:https://leetcode.com/problems/find-duplicate-file-in-system/description/ 题目: Given a list of ...
- [Swift]LeetCode609. 在系统中查找重复文件 | Find Duplicate File in System
Given a list of directory info including directory path, and all the files with contents in this dir ...
- Find Duplicate File in System
Given a list of directory info including directory path, and all the files with contents in this dir ...
- HDU 3269 P2P File Sharing System(模拟)(2009 Asia Ningbo Regional Contest)
Problem Description Peer-to-peer(P2P) computing technology has been widely used on the Internet to e ...
随机推荐
- 菜鸟笔记 -- Chapter 6.2.4 成员方法
6.2.4 成员方法 在Java中使用成员方法对应于类对象的行为,在有些地方也会将方法称之为函数,成员方法是定义在类中具有特定功能的一段独立小程序.方法格式如下: 修饰符 返回值类型 成员方法名 ( ...
- vue+nodejs+express+mysql 建立一个在线网盘程序
vue+nodejs+express+mysql 建立一个在线网盘程序 目录 vue+nodejs+express+mysql 建立一个在线网盘程序 第一章 开发环境准备 1.1 开发所用工具简介 1 ...
- CF1042C Array Product(贪心,模拟)
题目描述 You are given an array aa consisting of nn integers. You can perform the following operations w ...
- MySQL Waiting for table metadata lock的解决方法
最近需要在某一个表中新增字段,使用Sequel Pro 或者Navicat工具都会出现程序没有反应,使用 show processlist 查看,满屏都是 Waiting for table meta ...
- HTML自定义Checkbox框背景色
input[type=checkbox]{ margin-right:5px; width:13px; height:13px; }input[type=checkbox]:after { width ...
- Asp.Net Core 使用Docker进行容器化部署(二)使用Nginx进行反向代理
上一篇介绍了Asp.Net 程序在Docker中的部署,这篇介绍使用Nginx对Docker的实例进行反向代理 一.修改Nginx配置文件 使用winscp链接Liunx服务器,在/ect/nginx ...
- 吐血分享:QQ群霸屏技术教程(利润篇)
QQ群技术,不论日进几百,空隙时间多的可以尝试,日进100问题不大. QQ群技术,如何赚钱,能赚多少钱?不同行业,不同关键词,不同力度,不一样的产出. 群费 群费,这个和付费群是有区别的,群费在手机端 ...
- centOS下更新yum源
CentOS下更新yum源 1.使用如下命令,备份/etc/yum.repos.d/CentOS-Base.repo. cp /etc/yum.repos.d/CentOS-Base.repo /et ...
- 初识python 文件读取 保存
上一章最后一题的答案:infors.sort(key=lambda x:x['age'])print(infors)--->[{'name': 'laowang', 'age': 23}, {' ...
- linux io 学习笔记(02)---条件变量,管道,信号
条件变量的工作原理:对当前不访问共享资源的任务,直接执行睡眠处理,如果此时需要某个任务访问资源,直接将该任务唤醒.条件变量类似异步通信,操作的核心:睡眠.唤醒. 1.pthread_cond_t 定 ...