[LeetCode#187]Repeated DNA Sequences
Problem:
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA. Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule. For example, Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", Return:
["AAAAACCCCC", "CCCCCAAAAA"].
Analysis:
This problem has a clever solution based on bit manipulation.
If you have not encountered the trick before, it is hard to come up with on your own. Idea:
Since there are only four characters, "A", "C", "G", and "T", we can map each character to its own 2-bit code. (Note: this is not the ASCII code.)
Each subsequence is 10 characters long, so after mapping it takes up only 20 bits. Since an int in Java is 32 bits wide, a whole subsequence fits into a single Integer, which we use as its hash code. Because the mapping is one-to-one, two windows have the same hash code exactly when they are the same string. Another benefit of this mapping is that whenever a new character is added, we can update the hash code with a bit shift.
1. Prepare the HashMap for the mapping.
HashMap<Character, Integer> map = new HashMap<Character, Integer> ();
map.put('A', 0);
map.put('C', 1);
map.put('G', 2);
map.put('T', 3);
2. Move the subsequence window and compute the related hash code.
int hash = 0;
for (int i = 0; i < s.length(); i++) {
    if (i < 9) {
        hash = (hash << 2) + map.get(s.charAt(i));
    } else {
        hash = (hash << 2) + map.get(s.charAt(i));
        hash = hash & ((1 << 20) - 1);
        ...
    }
}
Note: once the sliding window has reached 10 characters, we need the hash code of exactly that window. The trick is to '&' the running value with twenty 1 bits, which keeps the lowest 20 bits and drops the character that has just left the window.
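To see the window update in isolation, here is a minimal sketch. The class name RollingHashDemo and the helpers code, encode, and rollingHashes are hypothetical names used only for illustration; they are not part of the solution in this post. It compares the hash maintained by shifting and masking with a hash computed from scratch for each window.

```java
import java.util.ArrayList;
import java.util.List;

final class RollingHashDemo {
    static final int WINDOW = 10;
    // Twenty 1 bits: two bits per character, ten characters per window.
    static final int MASK = (1 << (2 * WINDOW)) - 1;

    // 2-bit code per nucleotide, same values as the HashMap in the post.
    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("not a nucleotide: " + c);
        }
    }

    // Encodes one window from scratch (expects at most 10 characters).
    static int encode(String window) {
        int hash = 0;
        for (int i = 0; i < window.length(); i++) {
            hash = (hash << 2) | code(window.charAt(i));
        }
        return hash;
    }

    // Returns the hash of every 10-character window, updated incrementally.
    static List<Integer> rollingHashes(String s) {
        List<Integer> hashes = new ArrayList<>();
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            // Masking before the window is full is harmless: fewer than 20 bits are set.
            hash = ((hash << 2) | code(s.charAt(i))) & MASK;
            if (i >= WINDOW - 1) {
                hashes.add(hash);
            }
        }
        return hashes;
    }

    public static void main(String[] args) {
        String s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT";
        List<Integer> hashes = rollingHashes(s);
        for (int start = 0; start < hashes.size(); start++) {
            String window = s.substring(start, start + WINDOW);
            System.out.println(window + " -> " + hashes.get(start)
                    + " (from scratch: " + encode(window) + ")");
        }
    }
}
```

The sketch uses '|' where the post uses '+'. After the left shift the two lowest bits are zero, so both operators give the same result here.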
2.1 Get twenty '1' bits.
((1 << 20) - 1)
The idea is not hard: in binary, 4 - 1 = 100 - 1 = 011.
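A quick way to check the shape of the mask is to print it. The following is a small sketch; MaskDemo and lowBitsMask are hypothetical names chosen for this example.

```java
final class MaskDemo {
    // Builds a mask with the lowest `bits` bits set.
    // Assumes 0 <= bits <= 31; Java reduces int shift counts modulo 32.
    static int lowBitsMask(int bits) {
        return (1 << bits) - 1;
    }

    public static void main(String[] args) {
        int mask = lowBitsMask(20);
        // Twenty 1s in binary, fffff in hexadecimal.
        System.out.println(Integer.toBinaryString(mask));
        System.out.println(Integer.toHexString(mask));

        // '&' keeps only the lowest 20 bits of a wider value.
        int wide = 0x12345678;
        System.out.println(Integer.toHexString(wide & mask)); // 45678
    }
}
```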
2.2 Use the '&' operator to keep those bits.
hash = hash & ((1 << 20) - 1);
Errors:
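When the value you put into a HashMap depends on the value already stored under the same key, you must first check whether the key exists. Otherwise get() returns null for a missing key, and unboxing it in counted.get(hash)+1 throws a NullPointerException.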
if (counted.containsKey(hash))
counted.put(hash, counted.get(hash)+1);
else
counted.put(hash, 1);
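On Java 8 or later, the same check-then-update can be written as a single call. The sketch below uses Map.merge and Map.getOrDefault from the standard library; the class name CountDemo and the method countAll are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class CountDemo {
    // Counts how often each hash code occurs.
    static Map<Integer, Integer> countAll(int[] hashes) {
        Map<Integer, Integer> counted = new HashMap<>();
        for (int hash : hashes) {
            // Stores 1 if the key is absent, otherwise adds 1 to the stored count.
            counted.merge(hash, 1, Integer::sum);
        }
        return counted;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> counted = countAll(new int[] {341, 349184, 341});
        System.out.println(counted.get(341));           // 2
        // getOrDefault avoids the null that get() returns for a missing key.
        System.out.println(counted.getOrDefault(7, 0)); // 0
    }
}
```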
Solution:
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

public class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        ArrayList<String> ret = new ArrayList<String> ();
        if (s.length() < 10)
            return ret;
        HashMap<Character, Integer> map = new HashMap<Character, Integer> ();
        map.put('A', 0);
        map.put('C', 1);
        map.put('G', 2);
        map.put('T', 3);
        HashMap<Integer, Integer> counted = new HashMap<Integer, Integer> ();
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            if (i < 9) {
                hash = (hash << 2) + map.get(s.charAt(i));
            } else{
                hash = (hash << 2) + map.get(s.charAt(i));
                hash = hash & ((1 << 20) - 1);
                if (counted.containsKey(hash) && counted.get(hash) == 1) {
                    ret.add(s.substring(i-9, i+1));
                    counted.put(hash, 2);
                } else{
                    if (counted.containsKey(hash))
                        counted.put(hash, counted.get(hash)+1);
                    else
                        counted.put(hash, 1);
                }
            }
        }
        return ret;
    }
}
Actually, since we only care whether a subsequence has appeared at least twice, we can use two HashSets and avoid the clumsy counting code above.
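import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;

public class Solution {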
    public List<String> findRepeatedDnaSequences(String s) {
        ArrayList<String> ret = new ArrayList<String> ();
        if (s.length() < 10)
            return ret;
        HashMap<Character, Integer> map = new HashMap<Character, Integer> ();
        map.put('A', 0);
        map.put('C', 1);
        map.put('G', 2);
        map.put('T', 3);
        HashSet<Integer> appeared = new HashSet<Integer> ();
        HashSet<Integer> counted = new HashSet<Integer> ();
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            if (i < 9) {
                hash = (hash << 2) + map.get(s.charAt(i));
            } else{
                hash = (hash << 2) + map.get(s.charAt(i));
                hash = hash & ((1 << 20) - 1);
                if (appeared.contains(hash) && !counted.contains(hash)) {
                    ret.add(s.substring(i-9, i+1));
                    counted.add(hash);
                } else{
                    appeared.add(hash);
                }
            }
        }
        return ret;
    }
}
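As a further variation that is not part of the solution above, HashSet.add returns false when the element is already present, so the membership test and the insertion can be folded into one call. Here is a sketch under that assumption, with the hypothetical class name RepeatedDna and a switch in place of the lookup map:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class RepeatedDna {
    // 2-bit code per nucleotide, same values as the HashMap in the post.
    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("not a nucleotide: " + c);
        }
    }

    static List<String> findRepeatedDnaSequences(String s) {
        List<String> ret = new ArrayList<>();
        Set<Integer> appeared = new HashSet<>(); // windows seen at least once
        Set<Integer> reported = new HashSet<>(); // windows already added to ret
        int mask = (1 << 20) - 1;
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            hash = ((hash << 2) | code(s.charAt(i))) & mask;
            if (i < 9) {
                continue; // the first full window ends at index 9
            }
            // add() returns false for a window seen before;
            // the second add() returns true only the first time we report it.
            if (!appeared.add(hash) && reported.add(hash)) {
                ret.add(s.substring(i - 9, i + 1));
            }
        }
        return ret;
    }

    public static void main(String[] args) {
        System.out.println(findRepeatedDnaSequences("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"));
    }
}
```

For the example input from the problem statement this returns the two sequences "AAAAACCCCC" and "CCCCCAAAAA", in the order in which their second occurrences end.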
