Original problem link: https://leetcode.com/problems/find-duplicate-file-in-system/description/

Problem:

Given a list of directory info including directory path, and all the files with contents in this directory, you need to find out all the groups of duplicate files in the file system in terms of their paths.

A group of duplicate files consists of at least two files that have exactly the same content.

A single directory info string in the input list has the following format:

"root/d1/d2/.../dm f1.txt(f1_content) f2.txt(f2_content) ... fn.txt(fn_content)"

It means there are n files (f1.txt, f2.txt ... fn.txt with content f1_content, f2_content ... fn_content, respectively) in directory root/d1/d2/.../dm. Note that n >= 1 and m >= 0. If m = 0, it means the directory is just the root directory.

The output is a list of group of duplicate file paths. For each group, it contains all the file paths of the files that have the same content. A file path is a string that has the following format:

"directory_path/file_name.txt"

Example 1:

Input:
["root/a 1.txt(abcd) 2.txt(efgh)", "root/c 3.txt(abcd)", "root/c/d 4.txt(efgh)", "root 4.txt(efgh)"]
Output:
[["root/a/2.txt","root/c/d/4.txt","root/4.txt"],["root/a/1.txt","root/c/3.txt"]]

Note:

  1. No order is required for the final output.
  2. You may assume the directory name, file name and file content only has letters and digits, and the length of file content is in the range of [1,50].
  3. The number of files given is in the range of [1,20000].
  4. You may assume no files or directories share the same name in the same directory.
  5. You may assume each given directory info represents a unique directory. Directory path and file info are separated by a single blank space.

Follow-up beyond contest:

  1. Imagine you are given a real file system, how will you search files? DFS or BFS?
  2. If the file content is very large (GB level), how will you modify your solution?
  3. If you can only read the file by 1kb each time, how will you modify your solution?
  4. What is the time complexity of your modified solution? What is the most time-consuming part and memory consuming part of it? How to optimize?
  5. How to make sure the duplicated files you find are not false positive?
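The post does not answer these follow-ups. One common line of attack, offered here only as a hedged sketch, is to group files by size first, hash the candidates in fixed-size chunks so that a GB-level file never has to sit in memory, and finally compare files with equal hashes byte by byte to rule out false positives. The block below shows only the chunked hashing step. The names `ChunkedHashDemo` and `sha256Hex`, the choice of SHA-256, and the 1 KB buffer size are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class ChunkedHashDemo {
    // Hypothetical helper: reads the stream 1 KB at a time and returns
    // the SHA-256 digest of everything read, as a lowercase hex string.
    static String sha256Hex(InputStream in) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] buffer = new byte[1024]; // assumed chunk size of 1 KB
            int read;
            while ((read = in.read(buffer)) != -1) {
                // Only the bytes actually read are fed to the digest.
                digest.update(buffer, 0, read);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // In-memory streams stand in for real files in this sketch.
        InputStream a = new ByteArrayInputStream("abcd".getBytes(StandardCharsets.UTF_8));
        InputStream b = new ByteArrayInputStream("abcd".getBytes(StandardCharsets.UTF_8));
        System.out.println(sha256Hex(a).equals(sha256Hex(b))); // true
    }
}
```

With a real file system, the stream would come from something like `Files.newInputStream(path)`. Equal hashes only make two files candidates for the same group, so a final byte-by-byte comparison is still needed to be certain.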

Solution:

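Use a HashMap<String, List<String>> hm to store each file's content together with the list of paths of the files that have that content.

Each input string has the format path fileName1(content1) fileName2(content2). So first split it on spaces: the first token is the directory path, and every token after it is a file name followed by its content. Then split each of those tokens at "(" to extract the content.

Finally, any content that maps to more than one file is a group of duplicates.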
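The parsing step described above can be sketched on a single directory info string. This is a minimal illustration, not the accepted solution; `ParseDemo` and `parseEntry` are hypothetical names, and the sketch assumes the input follows the stated format exactly (single spaces, one pair of parentheses per file).

```java
import java.util.ArrayList;
import java.util.List;

class ParseDemo {
    // Hypothetical helper: turns one directory info string into
    // {content, fullPath} pairs, one pair per file in that directory.
    static List<String[]> parseEntry(String info) {
        String[] parts = info.split(" "); // parts[0] is the directory path
        List<String[]> pairs = new ArrayList<>();
        for (int i = 1; i < parts.length; i++) {
            int open = parts[i].indexOf('(');
            String fileName = parts[i].substring(0, open);
            // Drop the surrounding parentheses to keep only the content.
            String content = parts[i].substring(open + 1, parts[i].length() - 1);
            pairs.add(new String[] {content, parts[0] + "/" + fileName});
        }
        return pairs;
    }

    public static void main(String[] args) {
        for (String[] pair : parseEntry("root/a 1.txt(abcd) 2.txt(efgh)")) {
            System.out.println(pair[0] + " -> " + pair[1]);
        }
        // prints:
        // abcd -> root/a/1.txt
        // efgh -> root/a/2.txt
    }
}
```

Each pair would then be added to the map under its content key, so files with identical content end up in the same list.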

Time Complexity: O(paths.length * x), where x is the average length of an input string.

Space: O(paths.length * x), the size of hm.

AC Java:

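    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;

    class Solution {
        public List<List<String>> findDuplicate(String[] paths) {
            List<List<String>> res = new ArrayList<List<String>>();
            // Maps file content to the full paths of all files with that content.
            HashMap<String, List<String>> hm = new HashMap<String, List<String>>();
            for (String path : paths) {
                // pathArr[0] is the directory; the rest are "fileName(content)" tokens.
                String[] pathArr = path.split("\\s+");
                for (int i = 1; i < pathArr.length; i++) {
                    int open = pathArr[i].indexOf("(");
                    // The key keeps its parentheses, e.g. "(abcd)"; every key has them, so grouping is unaffected.
                    String content = pathArr[i].substring(open);
                    String fileName = pathArr[i].substring(0, open);
                    List<String> list = hm.getOrDefault(content, new ArrayList<String>());
                    list.add(pathArr[0] + "/" + fileName);
                    hm.put(content, list);
                }
            }
            // Only content shared by more than one file forms a duplicate group.
            for (String key : hm.keySet()) {
                if (hm.get(key).size() > 1) {
                    res.add(hm.get(key));
                }
            }
            return res;
        }
    }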
