LC 609. Find Duplicate File in System

Given a list of directory info including directory path, and all the files with contents in this directory, you need to find out all the groups of duplicate files in the file system in terms of their paths.

A group of duplicate files consists of at least two files that have exactly the same content.

A single directory info string in the input list has the following format:

"root/d1/d2/.../dm f1.txt(f1_content) f2.txt(f2_content) ... fn.txt(fn_content)"

It means there are n files (f1.txt, f2.txt ... fn.txt with content f1_content, f2_content ... fn_content, respectively) in directory root/d1/d2/.../dm. Note that n >= 1 and m >= 0. If m = 0, it means the directory is just the root directory.
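The format above can be illustrated with a minimal parsing sketch. The helper name parseDirectoryInfo is hypothetical (it is not part of the problem or of the solution further down), and the code assumes tokens are separated by single spaces and that every file token ends with ")".

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper, for illustration only: splits one directory info
// string into (full file path, file content) pairs.
std::vector<std::pair<std::string, std::string>>
parseDirectoryInfo(const std::string& info) {
  std::istringstream in(info);
  std::string dir;
  std::string token;
  in >> dir;  // the first token is the directory path

  std::vector<std::pair<std::string, std::string>> files;
  while (in >> token) {
    // Each remaining token looks like "name.txt(content)".
    const std::size_t open = token.find('(');
    assert(open != std::string::npos && token.back() == ')');
    files.emplace_back(dir + "/" + token.substr(0, open),
                       token.substr(open + 1, token.size() - open - 2));
  }
  return files;
}
```

For "root/a 1.txt(abcd) 2.txt(efgh)" this yields the pairs ("root/a/1.txt", "abcd") and ("root/a/2.txt", "efgh").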

The output is a list of groups of duplicate file paths. Each group contains all the file paths of the files that have the same content. A file path is a string that has the following format:

"directory_path/file_name.txt"

Example 1:

Input:

["root/a 1.txt(abcd) 2.txt(efgh)", "root/c 3.txt(abcd)", "root/c/d 4.txt(efgh)", "root 4.txt(efgh)"]

Output:

[["root/a/2.txt","root/c/d/4.txt","root/4.txt"],["root/a/1.txt","root/c/3.txt"]]
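The grouping rule behind Example 1 can be sketched as follows, assuming the input has already been split into (path, content) pairs. The name groupDuplicates is hypothetical, and std::map is used here only so that the group order is deterministic; it is not a requirement of the problem.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper, for illustration only: groups file paths by content
// and keeps only the groups that contain at least two paths.
std::vector<std::vector<std::string>> groupDuplicates(
    const std::vector<std::pair<std::string, std::string>>& files) {
  // content -> paths of all files with that content
  std::map<std::string, std::vector<std::string>> byContent;
  for (const auto& file : files) {
    byContent[file.second].push_back(file.first);
  }

  std::vector<std::vector<std::string>> groups;
  for (auto& entry : byContent) {
    if (entry.second.size() >= 2) {
      groups.push_back(std::move(entry.second));
    }
  }
  return groups;
}
```

With the five files of Example 1, the content "abcd" maps to two paths and "efgh" to three, so both groups are kept.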

Note:

No order is required for the final output.
You may assume the directory name, file name and file content consist only of letters and digits, and the length of the file content is in the range [1,50].
The number of files given is in the range of [1,20000].
You may assume no files or directories share the same name in the same directory.
You may assume each given directory info represents a unique directory. Directory path and file info are separated by a single blank space.
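Because no order is required, comparing a result against the expected output needs some care when testing locally. One possible approach, sketched here with the hypothetical helpers normalized and sameGroups, is to sort each group and then sort the list of groups before comparing.

```cpp
#include <algorithm>
#include <string>
#include <vector>

using Groups = std::vector<std::vector<std::string>>;

// Hypothetical helper: returns the groups in a canonical order
// (paths sorted inside each group, then groups sorted).
Groups normalized(Groups groups) {
  for (auto& group : groups) {
    std::sort(group.begin(), group.end());
  }
  std::sort(groups.begin(), groups.end());
  return groups;
}

// True if both results contain the same groups, ignoring order.
bool sameGroups(const Groups& a, const Groups& b) {
  return normalized(a) == normalized(b);
}
```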

Follow-up beyond contest:

Imagine you are given a real file system, how will you search files? DFS or BFS?
If the file content is very large (GB level), how will you modify your solution?
If you can only read the file by 1kb each time, how will you modify your solution?
What is the time complexity of your modified solution? What is the most time-consuming part and memory consuming part of it? How to optimize?
How to make sure the duplicated files you find are not false positives?

Runtime: 84 ms, faster than 49.48% of C++ online submissions for Find Duplicate File in System.

A simple string-parsing problem.

#include <string>
#include <unordered_map>
#include <vector>
using namespace std;

class Solution {
public:
  // Maps file content -> full paths of all files with that content.
  unordered_map<string, vector<string>> mp;

  // Parses one directory info string: "dir f1.txt(c1) f2.txt(c2) ...".
  void process(const string& s) {
    // Split on single spaces; filecontent[0] is the directory path,
    // the remaining entries are "name.txt(content)" tokens.
    vector<string> filecontent;
    size_t idx = 0;
    for (size_t i = 0; i < s.size(); i++) {
      if (s[i] == ' ') {
        filecontent.push_back(s.substr(idx, i - idx));
        idx = i + 1;
      }
    }
    filecontent.push_back(s.substr(idx));

    for (size_t i = 1; i < filecontent.size(); i++) {
      const string& f = filecontent[i];
      size_t j = f.find('(');
      if (j == string::npos) continue;
      // Skip empty content "()"; the constraints guarantee length >= 1,
      // so this check is purely defensive.
      if (f[j + 1] == ')') continue;
      // Content sits between '(' and the trailing ')'.
      string tmp = f.substr(j + 1, f.size() - j - 2);
      mp[tmp].push_back(filecontent[0] + "/" + f.substr(0, j));
    }
  }

  vector<vector<string>> findDuplicate(vector<string>& paths) {
    vector<vector<string>> ret;
    for (size_t i = 0; i < paths.size(); i++) {
      process(paths[i]);
    }
    // Only content shared by at least two files forms a duplicate group.
    for (auto it = mp.begin(); it != mp.end(); it++) {
      if (it->second.size() >= 2) {
        ret.push_back(it->second);
      }
    }
    return ret;
  }
};
