https://discuss.leetcode.com/topic/91430/c-clean-solution-answers-to-follow-upGiven a list of directory info including directory path, and all the files with contents in this directory, you need to find out all the groups of duplicate files in the file system in terms of their paths.

A group of duplicate files consists of at least two files that have exactly the same content.

A single directory info string in the input list has the following format:

"root/d1/d2/.../dm f1.txt(f1_content) f2.txt(f2_content) ... fn.txt(fn_content)"

It means there are n files (f1.txtf2.txt ... fn.txt with content f1_contentf2_content ... fn_content, respectively) in directory root/d1/d2/.../dm. Note that n >= 1 and m >= 0. If m = 0, it means the directory is just the root directory.

The output is a list of group of duplicate file paths. For each group, it contains all the file paths of the files that have the same content. A file path is a string that has the following format:

"directory_path/file_name.txt"

Example 1:

Input:
["root/a 1.txt(abcd) 2.txt(efgh)", "root/c 3.txt(abcd)", "root/c/d 4.txt(efgh)", "root 4.txt(efgh)"]
Output:
[["root/a/2.txt","root/c/d/4.txt","root/4.txt"],["root/a/1.txt","root/c/3.txt"]]

Note:

  1. No order is required for the final output.
  2. You may assume the directory name, file name and file content only has letters and digits, and the length of file content is in the range of [1,50].
  3. The number of files given is in the range of [1,20000].
  4. You may assume no files or directories share the same name in the same directory.
  5. You may assume each given directory info represents a unique directory. Directory path and file info are separated by a single blank space.

Follow-up beyond contest:

  1. Imagine you are given a real file system, how will you search files? DFS or BFS?
  2. If the file content is very large (GB level), how will you modify your solution?
  3. If you can only read the file by 1kb each time, how will you modify your solution?
  4. What is the time complexity of your modified solution? What is the most time-consuming part and memory consuming part of it? How to optimize?
  5. How to make sure the duplicated files you find are not false positive?

思路:

首先就是将字符串处理成完整路径的形式,然后用map统计相同内容的文件路径。

void parse(string orign,string& fileName,string& content)
{
int index = orign.find_first_of('(');
fileName = orign.substr(, index);
content = orign.substr(index + ,orign.length()-index-);
}
void getFullPath(string p,vector<string>&path,vector<string>&conVec)
{
stringstream ss(p);
string pathPrefix;
ss >> pathPrefix;
string file;
while (ss >> file)
{
string fileName, content;
parse(file,fileName, content);
path.push_back(pathPrefix + "/"+fileName);
conVec.push_back(content);
}
}
vector<vector<string>> findDuplicate(vector<string>& paths)
{
vector<string>pathVec, conVec;
for (auto p:paths)
{
getFullPath(p,pathVec,conVec);
}
map<string, set<string>>mp2;
for (int i = ; i < pathVec.size();i++)
{
mp2[conVec[i]].insert(pathVec[i]);
// cout << pathVec[i] << " " << conVec[i] << endl;
}
vector<vector<string>>ret;
for (auto it :mp2)
{
if (it.second.size() == )continue;
vector<string> temp(it.second.begin(),it.second.end());
ret.push_back(temp);
}
return ret;
}

看到相同思路的人写的,但是感觉大神的要简洁的多的多。。

vector<vector<string>> findDuplicate(vector<string>& paths) {
unordered_map<string, vector<string>> files;
vector<vector<string>> result; for (auto path : paths) {
stringstream ss(path);
string root;
string s;
getline(ss, root, ' ');
while (getline(ss, s, ' ')) {
string fileName = root + '/' + s.substr(, s.find('('));
string fileContent = s.substr(s.find('(') + , s.find(')') - s.find('(') - );
files[fileContent].push_back(fileName);
}
} for (auto file : files) {
if (file.second.size() > )
result.push_back(file.second);
} return result;
}

参考:

https://discuss.leetcode.com/topic/91430/c-clean-solution-answers-to-follow-up

[leetcode-609-Find Duplicate File in System]的更多相关文章

  1. LC 609. Find Duplicate File in System

    Given a list of directory info including directory path, and all the files with contents in this dir ...

  2. 【LeetCode】609. Find Duplicate File in System 解题报告(Python & C++)

    作者: 负雪明烛 id: fuxuemingzhu 个人博客: http://fuxuemingzhu.cn/ 目录 题目描述 题目大意 解题方法 日期 题目地址:https://leetcode.c ...

  3. 【leetcode】609. Find Duplicate File in System

    题目如下: Given a list of directory info including directory path, and all the files with contents in th ...

  4. 609. Find Duplicate File in System

    Given a list of directory info including directory path, and all the files with contents in this dir ...

  5. [LeetCode] Find Duplicate File in System 在系统中寻找重复文件

    Given a list of directory info including directory path, and all the files with contents in this dir ...

  6. LeetCode Find Duplicate File in System

    原题链接在这里:https://leetcode.com/problems/find-duplicate-file-in-system/description/ 题目: Given a list of ...

  7. [Swift]LeetCode609. 在系统中查找重复文件 | Find Duplicate File in System

    Given a list of directory info including directory path, and all the files with contents in this dir ...

  8. Find Duplicate File in System

    Given a list of directory info including directory path, and all the files with contents in this dir ...

  9. HDU 3269 P2P File Sharing System(模拟)(2009 Asia Ningbo Regional Contest)

    Problem Description Peer-to-peer(P2P) computing technology has been widely used on the Internet to e ...

随机推荐

  1. vue项目模拟后台数据

    这次我们来模拟一些后台数据,然后去请求它并且将其渲染到界面上.关于项目的搭建鄙人斗胆向大家推荐我的一篇随笔<Vue开发环境搭建及热更新> 一.数据建立 我这里为了演示这个过程所以自己编写了 ...

  2. 几个常用的 Git 高级命令

    Git 是一款开源优秀的版本管理工具,它最初由 Linus Torvalds 等人开发,用于管理 Linux Kernel 的版本研发.相关的书籍和教程网上琳琅满目,它们多数都详细的介绍其基本的使用和 ...

  3. Block代替delegate,尽量使用block,对于有大量的delegate方法才考虑使用protocol实现.

    Block代替delegate,尽量使用block,对于有大量的delegate方法才考虑使用protocol实现. 1.Block语法总结及示例如下:         //1.普通代码块方式bloc ...

  4. Java入门(一)

    一.语言分类 机器语言 汇编语言 高级语言 二.Java分类 JavaSE 标准版,主要针对桌面应用 JavaEE 企业版,主要针对服务器端的应用 JavaME 微型版,主要针对消费性电子产品的应用 ...

  5. Django从请求到返回流程

    图1:流程图 1. 用户通过浏览器请求一个页面2.请求到达Request Middlewares,中间件对request做一些预处理或者直接response请求3.URLConf通过urls.py文件 ...

  6. thinkphp5 分页带参数的解决办法

    文档有说可以在paginate带参数,然后研究了下,大概就是这样的: $list=Db::name('member') ->where('member_name|member_mobile|se ...

  7. CSS基础全荟

    一.CSS概述 1.css是什么?? 层叠样式表 2.css的引入方式 1.行内样式   在标签上加属性style="属性名1:属性值1;属性名2:属性值2;..." 2.内嵌式  ...

  8. js函数的节流和防抖

    js函数的节流和防抖 用户浏览页面时会不可避免的触发一些高频度触发事件(例如页面 scroll ,屏幕 resize,监听用户输入等),这些事件会频繁触发浏览器的重拍(reflow)和重绘(repai ...

  9. C# 多条件拼接sql

    #region 多条件搜索时,使用List集合来拼接条件(拼接Sql) StringBuilder sql = new StringBuilder("select * from PhoneN ...

  10. php接口编程

    1:自定义接口编程 对于自定义接口最关键就是写接口文档,在接口文档中规定具体的请求地址以及方式,还有具体的参数信息 2:接口文档编写 请求地址 http://jxshop.com/Api/login ...