[LeetCode#187]Repeated DNA Sequences
Problem:
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA. Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule. For example, Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", Return:
["AAAAACCCCC", "CCCCCAAAAA"].
Analysis:
This problem has a clever solution.
If you have not encountered it before, it is hard to come up with on your own. Idea:
Since there are only four characters, "A", "C", "G" and "T", we can map each character to a unique 2-bit code. (Note: not the ASCII code.)
Each subsequence is 10 characters long, so after mapping it takes up only 20 bits. (Since an int in Java takes up 32 bits, a subsequence can be represented as a single Integer, which we call its Integer hash code.) Another benefit of this mapping is that whenever we add a new character, we can update the hash code with a bit-shift operation.
1. Prepare the HashMap for the mapping.
HashMap<Character, Integer> map = new HashMap<Character, Integer> ();
map.put('A', 0);
map.put('C', 1);
map.put('G', 2);
map.put('T', 3);
2. Move the subsequence window and compute the related hash code.
int hash = 0;
for (int i = 0; i < s.length(); i++) {
    if (i < 9) {
        hash = (hash << 2) + map.get(s.charAt(i));
    } else {
        hash = (hash << 2) + map.get(s.charAt(i));
        hash = hash & ((1 << 20) - 1);
        ...
    }
}
Note: once the sliding window reaches 10 characters, we need the hash code of exactly that window. The trick is to '&' the hash with twenty '1' bits so that only the lowest 20 bits are kept.
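The window update in step 2 can be tried out in isolation. The following is a minimal sketch, not the final solution: the class name RollingHashDemo and the helpers code and windowHashes are hypothetical names introduced here, and a switch replaces the HashMap purely to keep the example short. It returns the 20-bit hash of every 10-character window.

```java
import java.util.ArrayList;
import java.util.List;

class RollingHashDemo {
    // Hypothetical helper: 2-bit code of a nucleotide (A=0, C=1, G=2, T=3).
    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("not a nucleotide: " + c);
        }
    }

    // Returns the 20-bit hash of each 10-character window, left to right.
    static List<Integer> windowHashes(String s) {
        List<Integer> hashes = new ArrayList<>();
        int mask = (1 << 20) - 1;
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            // Shift in the new character, then drop anything above bit 19.
            hash = ((hash << 2) + code(s.charAt(i))) & mask;
            if (i >= 9) {
                hashes.add(hash);
            }
        }
        return hashes;
    }

    public static void main(String[] args) {
        // "AAAAACCCCC" -> 0b0101010101 = 341, "AAAACCCCCA" -> 341 << 2 = 1364
        System.out.println(windowHashes("AAAAACCCCCA")); // prints [341, 1364]
    }
}
```

Masking on every step, even while i < 9, is harmless here because the hash cannot exceed 20 bits before the tenth character arrives.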
2.1 Get twenty '1' bits.
((1 << 20) - 1)
The idea is simple: subtracting 1 from a power of two sets every lower bit, for example 4 - 1 = 100 - 1 = 011 in binary.
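To check the arithmetic, here is a small sketch that builds the mask and prints it. The class MaskDemo and the helper lowBitsMask are hypothetical names used only for illustration, and the helper assumes 0 < bits < 32.

```java
class MaskDemo {
    // Hypothetical helper: a value whose lowest `bits` bits are all 1.
    // Assumes 0 < bits < 32 so the shift stays inside a 32-bit int.
    static int lowBitsMask(int bits) {
        return (1 << bits) - 1;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toBinaryString(lowBitsMask(2)));  // prints 11
        System.out.println(Integer.toHexString(lowBitsMask(20)));    // prints fffff
        System.out.println(lowBitsMask(20));                         // prints 1048575
    }
}
```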
2.2 Use the '&' operator to keep those bits.
hash = hash & ((1 << 20) - 1);
Errors:
When you put a <key, value> pair into a HashMap and the new value depends on the value already stored there, you must first check whether the key exists.
if (counted.containsKey(hash))
    counted.put(hash, counted.get(hash)+1);
else
    counted.put(hash, 1);
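On Java 8 and later, the same check-then-put pattern can also be written with Map.merge, which inserts the given value for a missing key and otherwise combines it with the stored one. A minimal sketch, where CountDemo and countAll are hypothetical names chosen for this example:

```java
import java.util.HashMap;
import java.util.Map;

class CountDemo {
    // Counts how often each key occurs.
    static Map<Integer, Integer> countAll(int[] keys) {
        Map<Integer, Integer> counted = new HashMap<>();
        for (int key : keys) {
            // Missing key: store 1. Existing key: store oldValue + 1.
            counted.merge(key, 1, Integer::sum);
        }
        return counted;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> counted = countAll(new int[] {341, 1364, 341});
        System.out.println(counted.get(341));  // prints 2
        System.out.println(counted.get(1364)); // prints 1
    }
}
```

counted.put(key, counted.getOrDefault(key, 0) + 1) is an equivalent alternative if an explicit put reads more clearly.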
Solution:
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

public class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        ArrayList<String> ret = new ArrayList<String> ();
        if (s.length() < 10)
            return ret;
        // Map each nucleotide to a 2-bit code.
        HashMap<Character, Integer> map = new HashMap<Character, Integer> ();
        map.put('A', 0);
        map.put('C', 1);
        map.put('G', 2);
        map.put('T', 3);
        // counted: window hash -> number of times it has been seen so far.
        HashMap<Integer, Integer> counted = new HashMap<Integer, Integer> ();
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            if (i < 9) {
                // Still filling the first window.
                hash = (hash << 2) + map.get(s.charAt(i));
            } else {
                hash = (hash << 2) + map.get(s.charAt(i));
                // Keep only the low 20 bits, i.e. the latest 10 characters.
                hash = hash & ((1 << 20) - 1);
                if (counted.containsKey(hash) && counted.get(hash) == 1) {
                    // Second occurrence: report the window exactly once.
                    ret.add(s.substring(i-9, i+1));
                    counted.put(hash, 2);
                } else {
                    if (counted.containsKey(hash))
                        counted.put(hash, counted.get(hash)+1);
                    else
                        counted.put(hash, 1);
                }
            }
        }
        return ret;
    }
}
Actually, since we only care whether a subsequence has appeared at least twice, we can use two HashSets and avoid the clumsy counting code above.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;

public class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        ArrayList<String> ret = new ArrayList<String> ();
        if (s.length() < 10)
            return ret;
        // Map each nucleotide to a 2-bit code.
        HashMap<Character, Integer> map = new HashMap<Character, Integer> ();
        map.put('A', 0);
        map.put('C', 1);
        map.put('G', 2);
        map.put('T', 3);
        // appeared: hashes seen at least once; counted: hashes already reported.
        HashSet<Integer> appeared = new HashSet<Integer> ();
        HashSet<Integer> counted = new HashSet<Integer> ();
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            if (i < 9) {
                // Still filling the first window.
                hash = (hash << 2) + map.get(s.charAt(i));
            } else {
                hash = (hash << 2) + map.get(s.charAt(i));
                // Keep only the low 20 bits, i.e. the latest 10 characters.
                hash = hash & ((1 << 20) - 1);
                if (appeared.contains(hash) && !counted.contains(hash)) {
                    ret.add(s.substring(i-9, i+1));
                    counted.add(hash);
                } else {
                    appeared.add(hash);
                }
            }
        }
        return ret;
    }
}
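As a possible further simplification of the two-HashSet version, Set.add already reports whether the element was new, so the contains checks can be folded into the add calls. The sketch below is one way to write it, not the original author's code: the class name RepeatedDna is hypothetical, and "ACGT".indexOf stands in for the HashMap under the assumption that the input contains only A, C, G and T.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class RepeatedDna {
    static List<String> findRepeatedDnaSequences(String s) {
        List<String> ret = new ArrayList<>();
        Set<Integer> appeared = new HashSet<>(); // hashes seen at least once
        Set<Integer> counted = new HashSet<>();  // hashes already reported
        int mask = (1 << 20) - 1;
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            // indexOf yields A=0, C=1, G=2, T=3 (assumes valid input only).
            hash = ((hash << 2) + "ACGT".indexOf(s.charAt(i))) & mask;
            if (i < 9) {
                continue; // first window not complete yet
            }
            // add() returns false if the hash was already in the set.
            if (!appeared.add(hash) && counted.add(hash)) {
                ret.add(s.substring(i - 9, i + 1));
            }
        }
        return ret;
    }

    public static void main(String[] args) {
        System.out.println(
            findRepeatedDnaSequences("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"));
        // prints [AAAAACCCCC, CCCCCAAAAA]
    }
}
```

The explicit length check is no longer needed, because a string shorter than 10 characters never reaches i >= 9 and the method returns an empty list.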