Given a list of directory info including directory path, and all the files with contents in this directory, you need to find out all the groups of duplicate files in the file system in terms of their paths.

A group of duplicate files consists of at least two files that have exactly the same content.

A single directory info string in the input list has the following format:

"root/d1/d2/.../dm f1.txt(f1_content) f2.txt(f2_content) ... fn.txt(fn_content)"

It means there are n files (f1.txt, f2.txt ... fn.txt with content f1_content, f2_content ... fn_content, respectively) in directory root/d1/d2/.../dm. Note that n >= 1 and m >= 0. If m = 0, it means the directory is just the root directory.

The output is a list of groups of duplicate file paths. Each group contains all the file paths of the files that have the same content. A file path is a string that has the following format:

"directory_path/file_name.txt"

Example 1:

Input:
["root/a 1.txt(abcd) 2.txt(efgh)", "root/c 3.txt(abcd)", "root/c/d 4.txt(efgh)", "root 4.txt(efgh)"]
Output:
[["root/a/2.txt","root/c/d/4.txt","root/4.txt"],["root/a/1.txt","root/c/3.txt"]]

Note:

  1. No order is required for the final output.
  2. You may assume the directory name, file name and file content only has letters and digits, and the length of file content is in the range of [1,50].
  3. The number of files given is in the range of [1,20000].
  4. You may assume no files or directories share the same name in the same directory.
  5. You may assume each given directory info represents a unique directory. Directory path and file info are separated by a single blank space.

Follow-up beyond contest:

  1. Imagine you are given a real file system, how will you search files? DFS or BFS?
  2. If the file content is very large (GB level), how will you modify your solution?
  3. If you can only read the file by 1kb each time, how will you modify your solution?
  4. What is the time complexity of your modified solution? What is the most time-consuming part and memory consuming part of it? How to optimize?
  5. How to make sure the duplicated files you find are not false positive?
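
Follow-ups 3 and 5 ask about reading 1 KB at a time and about ruling out false positives. One possible sketch, not taken from the linked answer: once two files are suspected duplicates (for example, same size or same hash), confirm the match by comparing them chunk by chunk. The helper name sameContent and its chunk parameter are hypothetical. It takes std::istream so that it works with in-memory streams as well as files.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <vector>

// Hypothetical helper: returns true when both streams yield identical bytes.
// Reads at most `chunk` bytes per stream at a time, so memory use is O(chunk).
bool sameContent(std::istream& a, std::istream& b, std::size_t chunk = 1024) {
    assert(chunk > 0);  // a zero-sized chunk would never read anything
    std::vector<char> bufA(chunk), bufB(chunk);
    while (true) {
        a.read(bufA.data(), static_cast<std::streamsize>(chunk));
        b.read(bufB.data(), static_cast<std::streamsize>(chunk));
        const std::streamsize gotA = a.gcount();
        const std::streamsize gotB = b.gcount();
        if (gotA != gotB) return false;  // one stream ended earlier
        if (gotA == 0) return true;      // both exhausted at the same point
        if (!std::equal(bufA.begin(), bufA.begin() + gotA, bufB.begin()))
            return false;                // first differing chunk
    }
}
```

With real files, one would pass two std::ifstream objects opened with std::ios::binary. The comparison stops at the first differing chunk, and a byte-for-byte match cannot be a hash-collision false positive.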

Approach:

First, parse each input string into full file paths, then use a map to group the file paths that share the same content.

#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>
using namespace std;

// Split "name.txt(content)" into the file name and the text inside the parentheses.
void parse(const string& origin, string& fileName, string& content)
{
    size_t index = origin.find_first_of('(');
    fileName = origin.substr(0, index);
    // Skip the '(' and drop the trailing ')'.
    content = origin.substr(index + 1, origin.length() - index - 2);
}

// Expand one directory info string into full paths and their contents.
void getFullPath(const string& p, vector<string>& path, vector<string>& conVec)
{
    stringstream ss(p);
    string pathPrefix;
    ss >> pathPrefix;  // the first token is the directory path
    string file;
    while (ss >> file)
    {
        string fileName, content;
        parse(file, fileName, content);
        path.push_back(pathPrefix + "/" + fileName);
        conVec.push_back(content);
    }
}

vector<vector<string>> findDuplicate(vector<string>& paths)
{
    vector<string> pathVec, conVec;
    for (const auto& p : paths)
    {
        getFullPath(p, pathVec, conVec);
    }
    // Map each content to the set of paths that hold it.
    map<string, set<string>> mp2;
    for (size_t i = 0; i < pathVec.size(); i++)
    {
        mp2[conVec[i]].insert(pathVec[i]);
    }
    vector<vector<string>> ret;
    for (const auto& it : mp2)
    {
        if (it.second.size() == 1) continue;  // unique content, not a duplicate
        vector<string> temp(it.second.begin(), it.second.end());
        ret.push_back(temp);
    }
    return ret;
}

I then found a solution by someone who used the same idea, but the expert's version is far more concise.

#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>
using namespace std;

vector<vector<string>> findDuplicate(vector<string>& paths) {
    // Map each content to the list of full paths that hold it.
    unordered_map<string, vector<string>> files;
    vector<vector<string>> result;
    for (const auto& path : paths) {
        stringstream ss(path);
        string root;
        string s;
        getline(ss, root, ' ');  // the first token is the directory path
        while (getline(ss, s, ' ')) {
            string fileName = root + '/' + s.substr(0, s.find('('));
            string fileContent = s.substr(s.find('(') + 1, s.find(')') - s.find('(') - 1);
            files[fileContent].push_back(fileName);
        }
    }
    for (const auto& file : files) {
        if (file.second.size() > 1)  // keep only contents shared by 2+ files
            result.push_back(file.second);
    }
    return result;
}
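
Note 1 says no order is required, and the two solutions above do return groups in different orders: the first iterates a sorted map of sets, the second an unordered_map. Checking either result against Example 1 therefore needs an order-insensitive comparison. A minimal sketch follows; the names Groups, normalize and sameGroups are hypothetical helpers and are not part of either solution.

```cpp
#include <algorithm>
#include <string>
#include <vector>

using Groups = std::vector<std::vector<std::string>>;

// Sort the paths inside each group, then sort the groups themselves,
// so that equal results compare equal regardless of original order.
Groups normalize(Groups groups) {
    for (auto& group : groups) std::sort(group.begin(), group.end());
    std::sort(groups.begin(), groups.end());
    return groups;
}

// True when both results contain the same groups, ignoring order.
bool sameGroups(const Groups& a, const Groups& b) {
    return normalize(a) == normalize(b);
}
```

For instance, sameGroups(findDuplicate(input), expected) can be used to test either version against the expected output of Example 1.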

Reference:

https://discuss.leetcode.com/topic/91430/c-clean-solution-answers-to-follow-up
