[LeetCode#187]Repeated DNA Sequences
Problem:
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA. Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule. For example, Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", Return:
["AAAAACCCCC", "CCCCCAAAAA"].
Analysis:
This problem has a clever solution that is hard to come up with if you have not seen the trick before.
Idea:
Since there are only four characters, "A", "C", "G" and "T", we can map each character to its own 2-bit code. (Note: this is not the ASCII code.)
Each subsequence is 10 characters long, so after mapping it takes up only 20 bits. Since an int in Java has 32 bits, a whole subsequence fits into a single Integer, which we use as its hash code.
Another benefit of this mapping is that whenever a new character enters the window, we can update the hash code with a bit shift.
1. Prepare the HashMap for the mapping.
HashMap<Character, Integer> map = new HashMap<Character, Integer> ();
map.put('A', 0);
map.put('C', 1);
map.put('G', 2);
map.put('T', 3);
2. Move the subsequence window and compute the related hash code.
int hash = 0;
for (int i = 0; i < s.length(); i++) {
    if (i < 9) {
        hash = (hash << 2) + map.get(s.charAt(i));
    } else {
        hash = (hash << 2) + map.get(s.charAt(i));
        hash = hash & ((1 << 20) - 1);
        ...
    }
}
Note: once the sliding window reaches 10 characters, we need the hash code of just that window. The trick is to '&' the value with twenty 1 bits, which keeps only the lowest 20 bits.
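The window update described above can be tried in isolation. The following is a minimal sketch, not the solution itself: the class name RollingCodeSketch and the helpers code, encodeWindow and slide are made up for illustration, and a switch replaces the HashMap only to keep the example short.

```java
final class RollingCodeSketch {
    // 2-bit code per nucleotide, same mapping as in the post: A=0, C=1, G=2, T=3.
    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("not a nucleotide: " + c);
        }
    }

    // Encodes the 10 characters starting at `start` from scratch.
    static int encodeWindow(String s, int start) {
        int hash = 0;
        for (int i = start; i < start + 10; i++) {
            hash = (hash << 2) | code(s.charAt(i));
        }
        return hash;
    }

    // Slides the window by one character: shift in the new 2 bits,
    // then mask so the oldest character's bits are dropped.
    static int slide(int hash, char next) {
        return ((hash << 2) | code(next)) & ((1 << 20) - 1);
    }

    public static void main(String[] args) {
        String s = "TAAAAAAAAAC";
        int first = encodeWindow(s, 0);           // window "TAAAAAAAAA"
        int second = slide(first, s.charAt(10));  // window "AAAAAAAAAC"
        System.out.println(second == encodeWindow(s, 1)); // prints true
    }
}
```

Without the mask, the bits of the leading 'T' would stay in the integer and the two windows would no longer compare equal.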
2.1 Get twenty '1' bits.
((1 << 20) - 1)
The idea is simple: for example, 4 - 1 = 100 - 1 = 011 in binary.
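To check the mask expression quickly, here is a small sketch; MaskSketch and lowBitsMask are hypothetical names used only for this illustration.

```java
final class MaskSketch {
    // Builds a value whose lowest `bits` bits are all 1 (intended for small counts such as 2 or 20).
    static int lowBitsMask(int bits) {
        return (1 << bits) - 1;
    }

    public static void main(String[] args) {
        int mask = lowBitsMask(20);
        System.out.println(Integer.toBinaryString(lowBitsMask(2))); // prints 11
        System.out.println(Integer.bitCount(mask));                 // prints 20
        System.out.println(mask == 0xFFFFF);                        // prints true
    }
}
```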
2.2 Use the '&' operator to keep those bits.
hash = hash & ((1 << 20) - 1);
Errors:
When you put a <key, value> pair into a HashMap and the new value depends on the value already stored, you must first check whether the key exists.
if (counted.containsKey(hash))
    counted.put(hash, counted.get(hash) + 1);
else
    counted.put(hash, 1);
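As an alternative to the explicit check, and assuming Java 8 or later, Map.merge can fold both branches into one call. This is only a sketch of that idiom; the class CountSketch and the method countAll are invented for the example and are not part of the solution.

```java
import java.util.HashMap;
import java.util.Map;

final class CountSketch {
    // Counts how often each hash value occurs.
    static Map<Integer, Integer> countAll(int[] hashes) {
        Map<Integer, Integer> counted = new HashMap<>();
        for (int h : hashes) {
            // Inserts 1 if the key is absent, otherwise adds 1 to the stored value.
            counted.merge(h, 1, Integer::sum);
        }
        return counted;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> counted = countAll(new int[] {5, 7, 5});
        System.out.println(counted.get(5)); // prints 2
        System.out.println(counted.get(7)); // prints 1
    }
}
```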
Solution:
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

public class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        ArrayList<String> ret = new ArrayList<String> ();
        if (s.length() < 10)
            return ret;
        // 2-bit code for each nucleotide
        HashMap<Character, Integer> map = new HashMap<Character, Integer> ();
        map.put('A', 0);
        map.put('C', 1);
        map.put('G', 2);
        map.put('T', 3);
        // hash of a 10-letter window -> number of times it has been seen
        HashMap<Integer, Integer> counted = new HashMap<Integer, Integer> ();
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            if (i < 9) {
                // still filling the first window
                hash = (hash << 2) + map.get(s.charAt(i));
            } else {
                hash = (hash << 2) + map.get(s.charAt(i));
                // keep only the 20 bits of the current window
                hash = hash & ((1 << 20) - 1);
                if (counted.containsKey(hash) && counted.get(hash) == 1) {
                    // second occurrence: report the window exactly once
                    ret.add(s.substring(i-9, i+1));
                    counted.put(hash, 2);
                } else {
                    if (counted.containsKey(hash))
                        counted.put(hash, counted.get(hash)+1);
                    else
                        counted.put(hash, 1);
                }
            }
        }
        return ret;
    }
}
Actually, since we only care whether a subsequence has appeared at least twice, we can use two HashSets to avoid the ugly code above.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;

public class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        ArrayList<String> ret = new ArrayList<String> ();
        if (s.length() < 10)
            return ret;
        // 2-bit code for each nucleotide
        HashMap<Character, Integer> map = new HashMap<Character, Integer> ();
        map.put('A', 0);
        map.put('C', 1);
        map.put('G', 2);
        map.put('T', 3);
        // windows seen at least once
        HashSet<Integer> appeared = new HashSet<Integer> ();
        // windows already added to the result
        HashSet<Integer> counted = new HashSet<Integer> ();
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            if (i < 9) {
                // still filling the first window
                hash = (hash << 2) + map.get(s.charAt(i));
            } else {
                hash = (hash << 2) + map.get(s.charAt(i));
                // keep only the 20 bits of the current window
                hash = hash & ((1 << 20) - 1);
                if (appeared.contains(hash) && !counted.contains(hash)) {
                    // seen before but not reported yet
                    ret.add(s.substring(i-9, i+1));
                    counted.add(hash);
                } else {
                    appeared.add(hash);
                }
            }
        }
        return ret;
    }
}
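One property both solutions rely on: because every letter gets its own 2 bits, the 20-bit value is an exact encoding of the window, so two different windows can never share a hash. A rough way to convince yourself is to decode the integer back into its 10 letters. The sketch below assumes the same A=0, C=1, G=2, T=3 mapping; DecodeSketch and decode are illustrative names, not part of the original code.

```java
final class DecodeSketch {
    // Index in this array equals the 2-bit code of the letter.
    private static final char[] LETTERS = {'A', 'C', 'G', 'T'};

    // Turns a 20-bit window code back into its 10-letter string.
    static String decode(int hash) {
        char[] out = new char[10];
        // The last character of the window sits in the lowest 2 bits.
        for (int i = 9; i >= 0; i--) {
            out[i] = LETTERS[hash & 3];
            hash >>>= 2;
        }
        return new String(out);
    }

    public static void main(String[] args) {
        System.out.println(decode(341)); // prints AAAAACCCCC
        System.out.println(decode(1));   // prints AAAAAAAAAC
    }
}
```

Since the code is reversible, one could also store only the integers and rebuild the strings at the end instead of calling substring, though the solutions above do not need that.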