[LeetCode#187]Repeated DNA Sequences
Problem:
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA. Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule. For example, Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", Return:
["AAAAACCCCC", "CCCCCAAAAA"].
Analysis:
This problem has a genius solution.
If you have not encounter it before, you may never be able to solve it out. Idea:
Since we only have four characters "A", "C", "G", "T", We can map each character with a sole 2 bits. (Note: not the ASCII code)
And each sub sequence is 10 characters long, after mapping, which would only take up 20 bits. (Since an Integer in Java takes up 32 bits, a subsequence could be represented into an Integer, or we call this as an Integer hash code) Another benefits of this mapping is that, as long we add new character, we can update on related hash code through bit movement operation. 1. prepare the HashMap for the mapping. HashMap<Character, Integer> map = new HashMap<Character, Integer> ();
map.put('A', 0);
map.put('C', 1);
map.put('G', 2);
map.put('T', 3); 2. move the subsequence window, and get realted Hashcode.
int hash = 0;
for (int i = 0; i < s.length(); i++) {
if (i < 9) {
hash = (hash << 2) + map.get(s.charAt(i));
} else{
hash = (hash << 2) + map.get(s.charAt(i));
hash = hash & ((1 << 20) - 1);
... }
}
Note: once the slide window's size meet 10 characters, we should get the hash code for the window. The skill here is to use '&' with a 20 bits "1" to get those bits.
2.1 get 20 bits '1'.
((1 << 20) - 1)
The idea is not hard: like 4 - 1 = 100 - 1 = 011
2.2 use '&'' operator to get the bits.
hash = hash & ((1 << 20) - 1); Errors:
When you put a <key, value> pair into hashmap, and the value based on the existing in the HashMap, you must test if the pair exist or not.
if (counted.containsKey(hash))
counted.put(hash, counted.get(hash)+1);
else
counted.put(hash, 1);
Solution:
public class Solution {
public List<String> findRepeatedDnaSequences(String s) {
ArrayList<String> ret = new ArrayList<String> ();
if (s.length() < 10)
return ret;
HashMap<Character, Integer> map = new HashMap<Character, Integer> ();
map.put('A', 0);
map.put('C', 1);
map.put('G', 2);
map.put('T', 3);
HashMap<Integer, Integer> counted = new HashMap<Integer, Integer> ();
int hash = 0;
for (int i = 0; i < s.length(); i++) {
if (i < 9) {
hash = (hash << 2) + map.get(s.charAt(i));
} else{
hash = (hash << 2) + map.get(s.charAt(i));
hash = hash & ((1 << 20) - 1);
if (counted.containsKey(hash) && counted.get(hash) == 1) {
ret.add(s.substring(i-9, i+1));
counted.put(hash, 2);
} else{
if (counted.containsKey(hash))
counted.put(hash, counted.get(hash)+1);
else
counted.put(hash, 1);
}
}
}
return ret;
}
}
Actually, since we only care about if a subsequence has appeared twice, we could use two HashSet to avoid the above ugly code.
public class Solution {
public List<String> findRepeatedDnaSequences(String s) {
ArrayList<String> ret = new ArrayList<String> ();
if (s.length() < 10)
return ret;
HashMap<Character, Integer> map = new HashMap<Character, Integer> ();
map.put('A', 0);
map.put('C', 1);
map.put('G', 2);
map.put('T', 3);
HashSet<Integer> appeared = new HashSet<Integer> ();
HashSet<Integer> counted = new HashSet<Integer> ();
int hash = 0;
for (int i = 0; i < s.length(); i++) {
if (i < 9) {
hash = (hash << 2) + map.get(s.charAt(i));
} else{
hash = (hash << 2) + map.get(s.charAt(i));
hash = hash & ((1 << 20) - 1);
if (appeared.contains(hash) && !counted.contains(hash)) {
ret.add(s.substring(i-9, i+1));
counted.add(hash);
} else{
appeared.add(hash);
}
}
}
return ret;
}
}
[LeetCode#187]Repeated DNA Sequences的更多相关文章
- [LeetCode] 187. Repeated DNA Sequences 求重复的DNA序列
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...
- leetcode 187. Repeated DNA Sequences 求重复的DNA串 ---------- java
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...
- Java for LeetCode 187 Repeated DNA Sequences
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...
- [LeetCode] 187. Repeated DNA Sequences 解题思路
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...
- [leetcode]187. Repeated DNA Sequences寻找DNA中重复出现的子串
很重要的一道题 题型适合在面试的时候考 位操作和哈希表结合 public List<String> findRepeatedDnaSequences(String s) { /* 寻找出现 ...
- 【LeetCode】187. Repeated DNA Sequences 解题报告(Python)
作者: 负雪明烛 id: fuxuemingzhu 个人博客: http://fuxuemingzhu.cn/ 题目地址: https://leetcode.com/problems/repeated ...
- 【LeetCode】187. Repeated DNA Sequences
题目: All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: " ...
- 187. Repeated DNA Sequences
题目: All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: " ...
- Leetcode:Repeated DNA Sequences详细题解
题目 All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: " ...
随机推荐
- Unity3D获取当前键盘按键
获取当前键盘按键,代码如下: using UnityEngine; using System.Collections; public class GetCurrentKey : MonoBehavio ...
- Gym 100187A-Potion of Immortality
题意:有n个药瓶,里面有一个有毒,然后每次拿兔子去试吃k瓶并且只能是k瓶,如果兔子死了就知道那瓶是毒药了,现在问你最少兔子要试吃几次. 分析:这题卡了好久,其实很简单.先考虑肯定要吃n/k次,那么剩下 ...
- html不同文档类型支持的元素标签
- python基础知识二
对象 python把在程序中用到的任何东西都成为对象. 每一个东西包括数.字符串甚至函数都是对象. 使用变量时只需要给他们赋一个值.不需要声明或定义数据类型. 逻辑行与物理行 物理行是你在编写程序时所 ...
- 通过调整表union all的顺序优化SQL
操作系统:Windows XP 数据库版本:SQL Server 2005 今天遇到一个SQL,过滤条件是自动生成的,因此,没法通过调整SQL的谓词达到优化的目的,只能去找SQL中的“大表”.有一个视 ...
- jquery”ScriptResourceMapping
要“jquery”ScriptResourceMapping.请添加一个名为 jquery (区分大小写)的 ScriptResourceMapping.”的解决办法. 1.先将aspnet.scri ...
- ecmall数据库基本操作
ecmall数据库基本操作,为了认真研究ecmall二次开发,我们必须熟悉ecamll的数据库结构,ecmall数据库结构研究熟悉之后,才能去认真分析ecamll的程序结构.从而实现ecmall二次开 ...
- MVVM模式应用 之xml文件的读取
XML如下所示: <?xml version="1.0" encoding="utf-8" ?> <schools> <schoo ...
- Git 基本使用配置
// 1.配置用户名邮箱:用于记录你个人的用户名称和电子邮件地址,用户名可随意修改,git 用于记录是谁提交了更新,以及更新人的联系方式: $ git config --global user.nam ...
- 按钮效果 css
<!doctype html><html lang="en"><head> <meta charset="UTF-8" ...