All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

Example:

Input: s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"

Output: ["AAAAACCCCC", "CCCCCAAAAA"]

Solution: count the frequency of 10 letter words

class Solution {
//find all the 10-letter-long sequences that occur more than once in a DNA molecule
public List<String> findRepeatedDnaSequences(String s) {
//substring -- subset n +n-1+...+1: n-k+1
List<String> res = new ArrayList<String>();
Map<String,Integer> map = new HashMap<String,Integer>();
int n = s.length();
int k =10;
if(n < k) return res;
for(int i = 0; i<=n-k; i++){//11-10 1
String sub = s.substring(i, i+k);
if(map.containsKey(sub)){
map.put(sub, map.get(sub)+1);
}else {
map.put(sub, 1);
}
}
for(Map.Entry<String, Integer> entry : map.entrySet()){
if(entry.getValue() >1){
res.add(entry.getKey());
}
}
return res;
}
}

Solution 2: two HashSet with a non-duplicate feature.

public List<String> findRepeatedDnaSequences(String s) {
Set seen = new HashSet(), repeated = new HashSet();
for (int i = 0; i + 9 < s.length(); i++) {
String ten = s.substring(i, i + 10);
if (!seen.add(ten))//if add then first time, else add it
repeated.add(ten);
}
return new ArrayList(repeated);
}

subsequence & substring

subsequence: subset 2^n

substring: continous string : n+n-1+n-2+...+1

*187. Repeated DNA Sequences (hashmap, one for loop)(difference between subsequence & substring)的更多相关文章

  1. [LeetCode] 187. Repeated DNA Sequences 求重复的DNA序列

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...

  2. Java for LeetCode 187 Repeated DNA Sequences

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...

  3. 187. Repeated DNA Sequences

    题目: All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: " ...

  4. [LeetCode#187]Repeated DNA Sequences

    Problem: All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: ...

  5. 【LeetCode】187. Repeated DNA Sequences

    题目: All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: " ...

  6. 187. Repeated DNA Sequences重复的DNA子串序列

    [抄题]: All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: &qu ...

  7. leetcode 187. Repeated DNA Sequences 求重复的DNA串 ---------- java

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...

  8. [LeetCode] 187. Repeated DNA Sequences 解题思路

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...

  9. 187. Repeated DNA Sequences (String; Bit)

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...

随机推荐

  1. 三个键print scroll、pause

    上班族或是办公室白领每天都几乎跟键盘打交道, 那么键盘上的PrtSc SysRq(print screen).Scroll Lock.se Break(pause break).numlock等有何作 ...

  2. java TopK算法

    现有一亿个数据,要求从其中找出最小的一万个数,希望所需的时间和空间最小,也就是所谓的topK问题 TopK问题就是从海量的数据中取最大(或最小的)的K个数. TopK问题其实是有线性时间复杂度的解的, ...

  3. Spyder清除Variable Explorer&&手动安装protobuf3.0(为了配置windows的python接口)

    输入:reset 选择:y PS:建议在windows下,安装anaconda32bit版本的,可以兼容更多第三方包.   Conda使用清华镜像 配置镜像 在conda安装好之后,默认的镜像是官方的 ...

  4. docker 镜像保存为文件及从文件导入镜像的方法

    1.保存镜像为文件 docker save -o 要保存的文件名 要保存的镜像 举例: docker save -o 2.从文件载入镜像 docker load --input 文件或者docker ...

  5. 终极版clearFix——支持IE6+

    /*兼容IE6.7*/ /*这段代码非常暴力,from internet,墙裂推荐*/ .clearFix:before,.clearFix:after{ content:""; ...

  6. JS中==、===和Object.is()的区别

    JS中==.===和Object.is()的区别 首先,先粗略了解一下这三个玩意儿: ==:等同,比较运算符,两边值类型不同的时候,先进行类型转换,再比较: ===:恒等,严格比较运算符,不做类型转换 ...

  7. 关于python的sort和sorted

    1.sort无返回值,没有新建列表  例子: a=[2,1,3] print("a=",a) b=a.sort() print("a=",a) print(&q ...

  8. python dataframe drop_duplicates用法技巧去重

    data.drop_duplicates()#data中一行元素全部相同时才去除 data.drop_duplicates(['a','b'])#data根据’a','b'组合列删除重复项,默认保留第 ...

  9. Tree UVA - 548(二叉树递归遍历)

    题目链接:https://vjudge.net/problem/UVA-548 题目大意:给一颗点带权(权值各不相同,都是小于10000的正整数)的二叉树的中序遍历和后序遍历,找一个叶子结点使得它到根 ...

  10. suffix ACM-ICPC 2017 Asia Qingdao

    Consider n given non-empty strings denoted by s1 , s2 , · · · , sn . Now for each of them, you need ...