Problem:

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:
["AAAAACCCCC", "CCCCCAAAAA"].

Analysis:

This problem has an ingenious solution.
If you have not encountered it before, you may find it hard to come up with on your own.

Idea:
Since there are only four characters, "A", "C", "G" and "T", we can map each character to its own 2-bit code (note: not its ASCII code).
Each subsequence is 10 characters long, so after mapping it takes up only 20 bits. Since an int in Java is 32 bits, a subsequence fits in a single int, which we use as its hash code.
Another benefit of this mapping is that whenever a new character is added, the hash code can be updated with a bit-shift operation.

1. Prepare the HashMap for the mapping.
HashMap<Character, Integer> map = new HashMap<Character, Integer>();
map.put('A', 0);
map.put('C', 1);
map.put('G', 2);
map.put('T', 3);

2. Move the window along the sequence and compute the corresponding hash code.
int hash = 0;
for (int i = 0; i < s.length(); i++) {
    if (i < 9) {
        hash = (hash << 2) + map.get(s.charAt(i));
    } else {
        hash = (hash << 2) + map.get(s.charAt(i));
        hash = hash & ((1 << 20) - 1);
        ...
    }
}
Note: once the sliding window reaches 10 characters, we need the hash code for exactly that window. The trick is to '&' the value with twenty 1-bits so that only the low 20 bits are kept.
2.1 Get twenty 1-bits.
((1 << 20) - 1)
The idea is simple. For example, (1 << 2) - 1 = 4 - 1 = 100 - 1 = 011 in binary.
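
To make the mask concrete, here is a minimal sketch. The class name WindowMaskDemo and the helpers code and encodeWindow are hypothetical names chosen for illustration, and a switch stands in for the HashMap lookup. It prints the mask in binary and checks that the rolling, masked hash equals the hash of the same 10-character window encoded from scratch.

```java
// Hypothetical demo class; the names here are illustrative only.
final class WindowMaskDemo {
    // 2-bit code per nucleotide, same mapping as in the text: A=0, C=1, G=2, T=3.
    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("not a nucleotide: " + c);
        }
    }

    // Encodes the 10 characters ending at index `end` (inclusive) from scratch.
    static int encodeWindow(String s, int end) {
        int hash = 0;
        for (int i = end - 9; i <= end; i++) {
            hash = (hash << 2) | code(s.charAt(i));
        }
        return hash;
    }

    public static void main(String[] args) {
        int mask = (1 << 20) - 1;
        System.out.println(Integer.toBinaryString(mask)); // twenty 1-bits

        String s = "ACGTACGTACGT";
        int rolling = 0;
        for (int i = 0; i < s.length(); i++) {
            // Shift in the new character, then drop everything above bit 19.
            rolling = ((rolling << 2) | code(s.charAt(i))) & mask;
            if (i >= 9) {
                System.out.println(rolling == encodeWindow(s, i));
            }
        }
    }
}
```

Applying the mask before the window is full is harmless, because fewer than 10 characters never occupy more than 20 bits.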
2.2 Use the '&' operator to keep only those bits.
hash = hash & ((1 << 20) - 1);

Errors:
When you put a <key, value> pair into a HashMap and the new value depends on the value already stored there, you must first check whether the key exists.
if (counted.containsKey(hash))
    counted.put(hash, counted.get(hash) + 1);
else
    counted.put(hash, 1);
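
The same check-then-update step can be written more compactly. The sketch below assumes Java 8 or later, where Map.merge is available; the class name CountDemo and the method countAll are hypothetical and only serve the illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; not part of the solution below.
final class CountDemo {
    // Counts how often each key occurs.
    static Map<Integer, Integer> countAll(int[] keys) {
        Map<Integer, Integer> counted = new HashMap<>();
        for (int key : keys) {
            // Stores 1 if the key is absent, otherwise stores old value + 1.
            counted.merge(key, 1, Integer::sum);
        }
        return counted;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> counted = countAll(new int[] {7, 7, 3});
        System.out.println(counted.get(7)); // prints 2
        System.out.println(counted.get(3)); // prints 1
    }
}
```

Calling counted.get(key) + 1 without the existence check would throw a NullPointerException for a missing key, since get returns null and the unboxing fails.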

Solution:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

public class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        ArrayList<String> ret = new ArrayList<String>();
        if (s.length() < 10)
            return ret;
        // 2-bit code for each nucleotide.
        HashMap<Character, Integer> map = new HashMap<Character, Integer>();
        map.put('A', 0);
        map.put('C', 1);
        map.put('G', 2);
        map.put('T', 3);
        // Hash code of a window -> number of times it has been seen.
        HashMap<Integer, Integer> counted = new HashMap<Integer, Integer>();
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            if (i < 9) {
                // The first window is not complete yet; keep accumulating.
                hash = (hash << 2) + map.get(s.charAt(i));
            } else {
                hash = (hash << 2) + map.get(s.charAt(i));
                // Keep only the low 20 bits, i.e. the last 10 characters.
                hash = hash & ((1 << 20) - 1);
                if (counted.containsKey(hash) && counted.get(hash) == 1) {
                    // Second occurrence: report the window exactly once.
                    ret.add(s.substring(i - 9, i + 1));
                    counted.put(hash, 2);
                } else {
                    if (counted.containsKey(hash))
                        counted.put(hash, counted.get(hash) + 1);
                    else
                        counted.put(hash, 1);
                }
            }
        }
        return ret;
    }
}

In fact, since we only care whether a subsequence has appeared at least twice, we can use two HashSets instead of the counting map and avoid the clumsy code above.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;

public class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        ArrayList<String> ret = new ArrayList<String>();
        if (s.length() < 10)
            return ret;
        // 2-bit code for each nucleotide.
        HashMap<Character, Integer> map = new HashMap<Character, Integer>();
        map.put('A', 0);
        map.put('C', 1);
        map.put('G', 2);
        map.put('T', 3);
        // Windows seen at least once.
        HashSet<Integer> appeared = new HashSet<Integer>();
        // Windows already added to the result.
        HashSet<Integer> counted = new HashSet<Integer>();
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            if (i < 9) {
                // The first window is not complete yet; keep accumulating.
                hash = (hash << 2) + map.get(s.charAt(i));
            } else {
                hash = (hash << 2) + map.get(s.charAt(i));
                // Keep only the low 20 bits, i.e. the last 10 characters.
                hash = hash & ((1 << 20) - 1);
                if (appeared.contains(hash) && !counted.contains(hash)) {
                    ret.add(s.substring(i - 9, i + 1));
                    counted.add(hash);
                } else {
                    appeared.add(hash);
                }
            }
        }
        return ret;
    }
}
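
For cross-checking the bit-encoded solutions, a plain substring-based version can be handy. This is only a sketch and not the technique described above: it stores the 10-character strings themselves in the sets rather than 20-bit ints, so it uses more memory. The class name NaiveRepeatedDna and the method find are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for cross-checking; stores substrings instead of int codes.
final class NaiveRepeatedDna {
    static List<String> find(String s) {
        List<String> ret = new ArrayList<>();
        Set<String> appeared = new HashSet<>(); // windows seen at least once
        Set<String> counted = new HashSet<>();  // windows already reported
        for (int i = 0; i + 10 <= s.length(); i++) {
            String window = s.substring(i, i + 10);
            // Set.add returns false when the element was already present.
            if (!appeared.add(window) && counted.add(window)) {
                ret.add(window);
            }
        }
        return ret;
    }

    public static void main(String[] args) {
        System.out.println(find("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"));
        // prints [AAAAACCCCC, CCCCCAAAAA]
    }
}
```

On the example input from the problem statement it returns the same two sequences, in the same order, as the solutions above.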
