题目如下:

Given a list of directory info including directory path, and all the files with contents in this directory, you need to find out all the groups of duplicate files in the file system in terms of their paths.

A group of duplicate files consists of at least two files that have exactly the same content.

A single directory info string in the input list has the following format:

"root/d1/d2/.../dm f1.txt(f1_content) f2.txt(f2_content) ... fn.txt(fn_content)"

It means there are n files (f1.txtf2.txt ... fn.txt with content f1_contentf2_content ... fn_content, respectively) in directory root/d1/d2/.../dm. Note that n >= 1 and m >= 0. If m = 0, it means the directory is just the root directory.

The output is a list of group of duplicate file paths. For each group, it contains all the file paths of the files that have the same content. A file path is a string that has the following format:

"directory_path/file_name.txt"

Example 1:

Input:
["root/a 1.txt(abcd) 2.txt(efgh)", "root/c 3.txt(abcd)", "root/c/d 4.txt(efgh)", "root 4.txt(efgh)"]
Output:
[["root/a/2.txt","root/c/d/4.txt","root/4.txt"],["root/a/1.txt","root/c/3.txt"]]

Note:

  1. No order is required for the final output.
  2. You may assume the directory name, file name and file content only has letters and digits, and the length of file content is in the range of [1,50].
  3. The number of files given is in the range of [1,20000].
  4. You may assume no files or directories share the same name in the same directory.
  5. You may assume each given directory info represents a unique directory. Directory path and file info are separated by a single blank space.

解题思路:我的方法非常简单,解析输入的字符串,把文件内容作为key,文件路径作为value存入字典中,当然value是用list保存的。最后遍历字典,value超过两个元素即为符合条件的结果。

代码如下:

class Solution(object):
def findDuplicate(self, paths):
"""
:type paths: List[str]
:rtype: List[List[str]]
"""
dic = {}
for p in paths:
item = p.split(' ')
for i in range(1,len(item)):
file_name = item[i].split('(')[0]
file_content = item[i].split('(')[1][:-1]
dic[file_content] = dic.setdefault(file_content,[]) + [item[0] + '/' + file_name]
res = []
for v in dic.itervalues():
if len(v) >= 2:
res.append(v)
return res

【leetcode】609. Find Duplicate File in System的更多相关文章

  1. 【LeetCode】609. Find Duplicate File in System 解题报告(Python & C++)

    作者: 负雪明烛 id: fuxuemingzhu 个人博客: http://fuxuemingzhu.cn/ 目录 题目描述 题目大意 解题方法 日期 题目地址:https://leetcode.c ...

  2. 【LeetCode】652. Find Duplicate Subtrees 解题报告(Python)

    [LeetCode]652. Find Duplicate Subtrees 解题报告(Python) 标签(空格分隔): LeetCode 作者: 负雪明烛 id: fuxuemingzhu 个人博 ...

  3. LC 609. Find Duplicate File in System

    Given a list of directory info including directory path, and all the files with contents in this dir ...

  4. 609. Find Duplicate File in System

    Given a list of directory info including directory path, and all the files with contents in this dir ...

  5. 【LeetCode】316. Remove Duplicate Letters 解题报告(Python & C++)

    作者: 负雪明烛 id: fuxuemingzhu 个人博客: http://fuxuemingzhu.cn/ 目录 题目描述 题目大意 解题方法 日期 题目地址:https://leetcode.c ...

  6. 【LeetCode】388. Longest Absolute File Path 解题报告(Python)

    作者: 负雪明烛 id: fuxuemingzhu 个人博客: http://fuxuemingzhu.cn/ 目录 题目描述: 题目大意 解题方法 日期 题目地址:https://leetcode. ...

  7. 【LeetCode】217. Contains Duplicate (2 solutions)

    Contains Duplicate Given an array of integers, find if the array contains any duplicates. Your funct ...

  8. 【leetcode】316. Remove Duplicate Letters

    题目如下: Given a string which contains only lowercase letters, remove duplicate letters so that every l ...

  9. 【leetcode】388. Longest Absolute File Path

    题目如下: Suppose we abstract our file system by a string in the following manner: The string "dir\ ...

随机推荐

  1. margin/padding百分比值的计算

    1.百分比介绍 一般元素的宽度用百分比值表示时,元素的总宽度包括外边距取决于父元素的width,这样可能得到"流式"页面,即元素的外边距会扩大或缩小以适应父元素的实际大小.如果对这 ...

  2. python发送消息到ipmsg

    from socket import * #利用socket模块生成套接字s = socket(AF_INET,SOCK_DGRAM) #定义一个元组,包含ip地址,和端口号,ip地址必须为字符串,端 ...

  3. Tomcat GC参数详解

    tomcat启动参数,将JVM GC信息写入tomcat_gc.log CATALINA_OPTS='-Xms512m -Xmx4096m -XX:PermSize=64M -XX:MaxNewSiz ...

  4. Embedding理解与代码实现

    https://blog.csdn.net/songyunli1111/article/details/85100616

  5. openstack介绍及共享组件——消息队列rabbitmq

    一.云计算的前世今生 所有的新事物都不是突然冒出来的,都有前世和今生.云计算也是IT技术不断发展的产物. 要理解云计算,需要对IT系统架构的发展过程有所认识. 请看下 IT系统架构的发展到目前为止大致 ...

  6. symfony 初始化项目

    学习Symfony首先看一下已经发布了哪些版本; 现在我记录一下两个版本的使用情况: 3.4 是一个长期维护且稳定的版本 4.3是一个最新版本且速度飞快地版本 官方介绍:https://symfony ...

  7. ping, telnet, tcping 命令使用及对比

    1. ping 命令 ping 命令只能检查 IP 的连通性或网络连接速度,无法具体到某个端口. ping 命令使用 ICMP 协议,跟 IP 协议属于同一层次(网络层).ping 命令在每次发数据包 ...

  8. hdu6699Block Breaker

    Problem Description Given a rectangle frame of size n×m. Initially, the frame is strewn with n×m squ ...

  9. 记一次Laravel 定时任务schedul:run未执行的处理

    关于Laravel的任务调度(定时任务)的配置在此不做赘述,跟着官方文档一步一步的操作是不会导致定时任务不能正常工作的. 为保证能及时捕获定时任务执行出现异常的原因,只需在配置系统crontab时指定 ...

  10. UI自动化通过文字、父子元素,兄弟元素定位

    在百度首页,通过文字,父子元素,兄弟元素进行定位 一.文字定位: 通过界面上的文字进行定位,注意如果同一个页面上存在多个同样的文字的情况,返回的值会是多个,建议只存在一个情况下才用这方法. 如:定位百 ...