https://discuss.leetcode.com/topic/91430/c-clean-solution-answers-to-follow-upGiven a list of directory info including directory path, and all the files with contents in this directory, you need to find out all the groups of duplicate files in the file system in terms of their paths.

A group of duplicate files consists of at least two files that have exactly the same content.

A single directory info string in the input list has the following format:

"root/d1/d2/.../dm f1.txt(f1_content) f2.txt(f2_content) ... fn.txt(fn_content)"

It means there are n files (f1.txtf2.txt ... fn.txt with content f1_contentf2_content ... fn_content, respectively) in directory root/d1/d2/.../dm. Note that n >= 1 and m >= 0. If m = 0, it means the directory is just the root directory.

The output is a list of group of duplicate file paths. For each group, it contains all the file paths of the files that have the same content. A file path is a string that has the following format:

"directory_path/file_name.txt"

Example 1:

Input:
["root/a 1.txt(abcd) 2.txt(efgh)", "root/c 3.txt(abcd)", "root/c/d 4.txt(efgh)", "root 4.txt(efgh)"]
Output:
[["root/a/2.txt","root/c/d/4.txt","root/4.txt"],["root/a/1.txt","root/c/3.txt"]]

Note:

  1. No order is required for the final output.
  2. You may assume the directory name, file name and file content only has letters and digits, and the length of file content is in the range of [1,50].
  3. The number of files given is in the range of [1,20000].
  4. You may assume no files or directories share the same name in the same directory.
  5. You may assume each given directory info represents a unique directory. Directory path and file info are separated by a single blank space.

Follow-up beyond contest:

  1. Imagine you are given a real file system, how will you search files? DFS or BFS?
  2. If the file content is very large (GB level), how will you modify your solution?
  3. If you can only read the file by 1kb each time, how will you modify your solution?
  4. What is the time complexity of your modified solution? What is the most time-consuming part and memory consuming part of it? How to optimize?
  5. How to make sure the duplicated files you find are not false positive?

思路:

首先就是将字符串处理成完整路径的形式,然后用map统计相同内容的文件路径。

void parse(string orign,string& fileName,string& content)
{
int index = orign.find_first_of('(');
fileName = orign.substr(, index);
content = orign.substr(index + ,orign.length()-index-);
}
void getFullPath(string p,vector<string>&path,vector<string>&conVec)
{
stringstream ss(p);
string pathPrefix;
ss >> pathPrefix;
string file;
while (ss >> file)
{
string fileName, content;
parse(file,fileName, content);
path.push_back(pathPrefix + "/"+fileName);
conVec.push_back(content);
}
}
vector<vector<string>> findDuplicate(vector<string>& paths)
{
vector<string>pathVec, conVec;
for (auto p:paths)
{
getFullPath(p,pathVec,conVec);
}
map<string, set<string>>mp2;
for (int i = ; i < pathVec.size();i++)
{
mp2[conVec[i]].insert(pathVec[i]);
// cout << pathVec[i] << " " << conVec[i] << endl;
}
vector<vector<string>>ret;
for (auto it :mp2)
{
if (it.second.size() == )continue;
vector<string> temp(it.second.begin(),it.second.end());
ret.push_back(temp);
}
return ret;
}

看到相同思路的人写的,但是感觉大神的要简洁的多的多。。

vector<vector<string>> findDuplicate(vector<string>& paths) {
unordered_map<string, vector<string>> files;
vector<vector<string>> result; for (auto path : paths) {
stringstream ss(path);
string root;
string s;
getline(ss, root, ' ');
while (getline(ss, s, ' ')) {
string fileName = root + '/' + s.substr(, s.find('('));
string fileContent = s.substr(s.find('(') + , s.find(')') - s.find('(') - );
files[fileContent].push_back(fileName);
}
} for (auto file : files) {
if (file.second.size() > )
result.push_back(file.second);
} return result;
}

参考:

https://discuss.leetcode.com/topic/91430/c-clean-solution-answers-to-follow-up

[leetcode-609-Find Duplicate File in System]的更多相关文章

  1. LC 609. Find Duplicate File in System

    Given a list of directory info including directory path, and all the files with contents in this dir ...

  2. 【LeetCode】609. Find Duplicate File in System 解题报告(Python & C++)

    作者: 负雪明烛 id: fuxuemingzhu 个人博客: http://fuxuemingzhu.cn/ 目录 题目描述 题目大意 解题方法 日期 题目地址:https://leetcode.c ...

  3. 【leetcode】609. Find Duplicate File in System

    题目如下: Given a list of directory info including directory path, and all the files with contents in th ...

  4. 609. Find Duplicate File in System

    Given a list of directory info including directory path, and all the files with contents in this dir ...

  5. [LeetCode] Find Duplicate File in System 在系统中寻找重复文件

    Given a list of directory info including directory path, and all the files with contents in this dir ...

  6. LeetCode Find Duplicate File in System

    原题链接在这里:https://leetcode.com/problems/find-duplicate-file-in-system/description/ 题目: Given a list of ...

  7. [Swift]LeetCode609. 在系统中查找重复文件 | Find Duplicate File in System

    Given a list of directory info including directory path, and all the files with contents in this dir ...

  8. Find Duplicate File in System

    Given a list of directory info including directory path, and all the files with contents in this dir ...

  9. HDU 3269 P2P File Sharing System(模拟)(2009 Asia Ningbo Regional Contest)

    Problem Description Peer-to-peer(P2P) computing technology has been widely used on the Internet to e ...

随机推荐

  1. java的异常分类

    结构关系 throwable error   exception checked异常 runtime异常 checked异常也叫io异常这种异常一般我们会在程序块加入trycatch处理它. runt ...

  2. 复制功能 js

    示例: <input class="herf" type="text" v-model="herfUrl" readonly=&quo ...

  3. CentOS7 64位下 MySQL5.7的安装与配置(YUM)

    趁着11.11的时候在阿里云上弄了一云服务ECS(作为自己的节日礼物 > _ <) ,系统为CentOS的,打算弄一个人博客之类的,这些天正在备案当中(不知得多久). 忙里偷闲,在中午休息 ...

  4. jsp+spring+jquery+ajax的简单例子

    初学b/s编程,花费了许多时间,进度颇慢! 不过终于完成了一个简单的例子: jsp代码 <%@ page language="java" contentType=" ...

  5. Plugin was not installed: Cannot download 'https://plugins.jetbrains.com/pluginManager''

    在Android studio中安装插件的时候,提示了类似这种的错误,解决这个问题有以下几步 1.打开Configure->Settings 2.System Settings->Upda ...

  6. vue 数组数据更新或者对象数据更新 但是页面没有同步问题

    1,使用set函数来设置数据. 2,你可以通过 $forceUpdate 来做这件事.在数据赋值之后 就直接调用 this.$forceUpdata()

  7. python基础小知识,is和==的区别,编码和解码

    1.is和==的区别 1)id() 通过id()我们可以查看到一个变量表示的值在内存中的地址 >>> s1 = "Tanxu" >>> s2 = ...

  8. python 装饰器 生成及原里

    # 装饰器形成的过程 : 最简单的装饰器 有返回值的 有一个参数 万能参数 # 装饰器的作用 # 原则 :开放封闭原则 # 语法糖 :@ # 装饰器的固定模式 #不懂技术 import time # ...

  9. 查询各科成绩最高和最低的分:以如下形式显示:课程ID,最高分,最低分

    SELECT L.C# As 课程ID,L.score AS 最高分,R.score AS 最低分 FROM SC L ,SC AS R WHERE L.C# = R.C# and L.score = ...

  10. 博科brocade光纤交换机alias-zone的划分-->实操案例

    一,图形化操作 光纤交换机作为SAN网络的重要组成部分,在日常应用中非常普遍,本次将以常用的博科交换机介绍基本的配置方法. 博科300实物图: 环境描述: 如上图,四台服务器通过各自的双HBA卡连接至 ...