Repeated DNA Sequences 解答
Question
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", Return: ["AAAAACCCCC", "CCCCCAAAAA"].
Solution -- Bit Manipulation
Original idea is to use a set to store each substring. Time complexity is O(n) and space cost is O(n). But for details of space cost, a char is 2 bytes, so we need 20 bytes to store a substring and therefore (20n) space.
If we represent DNA substring by integer, the space is cut down to (4n).
public List<String> findRepeatedDnaSequences(String s) {
List<String> result = new ArrayList<String>();
int len = s.length();
if (len < 10) {
return result;
}
Map<Character, Integer> map = new HashMap<Character, Integer>();
map.put('A', 0);
map.put('C', 1);
map.put('G', 2);
map.put('T', 3);
Set<Integer> temp = new HashSet<Integer>();
Set<Integer> added = new HashSet<Integer>();
int hash = 0;
for (int i = 0; i < len; i++) {
if (i < 9) {
//each ACGT fit 2 bits, so left shift 2
hash = (hash << 2) + map.get(s.charAt(i));
} else {
hash = (hash << 2) + map.get(s.charAt(i));
//make length of hash to be 20
hash = hash & (1 << 20) - 1;
if (temp.contains(hash) && !added.contains(hash)) {
result.add(s.substring(i - 9, i + 1));
added.add(hash); //track added
} else {
temp.add(hash);
}
}
}
return result;
}
Repeated DNA Sequences 解答的更多相关文章
- lc面试准备:Repeated DNA Sequences
1 题目 All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: &quo ...
- LeetCode 187. 重复的DNA序列(Repeated DNA Sequences)
187. 重复的DNA序列 187. Repeated DNA Sequences 题目描述 All DNA is composed of a series of nucleotides abbrev ...
- [LeetCode] Repeated DNA Sequences 求重复的DNA序列
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...
- [Leetcode] Repeated DNA Sequences
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...
- leetcode 187. Repeated DNA Sequences 求重复的DNA串 ---------- java
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...
- 【leetcode】Repeated DNA Sequences(middle)★
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...
- LeetCode() Repeated DNA Sequences 看的非常的过瘾!
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...
- Repeated DNA Sequences
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...
- Java for LeetCode 187 Repeated DNA Sequences
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...
随机推荐
- HDOJ-1019 Least Common Multiple
http://acm.hdu.edu.cn/showproblem.php?pid=1019 题意:给出n个数,求它们的最小公倍数 对于n个数,它们的最小公倍数等于[前n-1个数的最小公倍数和第n个数 ...
- 简单的Dao设计模式
简单的DAO设计模式 这两天学习到了DAO(Data Access Object 数据存取对象)设计模式.想谈谈自己的感受,刚开始接触是感觉有点难,觉得自己逻辑理不清,主要是以前学的知识比较零散没有很 ...
- hdu 2846
字典树的变形,常规字典树用来求前缀的,所以把每个单词拆成len个词建树,为了避免abab这样的查ab时会出现两次,每次加一个标记,如果该节点上次的建树的单词与本次相同就不更新,否则更新 #includ ...
- TCP/IP系列——长连接与短连接的区别
1 什么是长连接和短连接 三次握手和四次挥手 TCP区别于UDP最重要的特点是TCP必须建立在可靠的连接之上,连接的建立和释放就是握手和挥手的过程. 三次握手为连接的建立过程,握手失败 ...
- 网络基础知识HTTP(1) --转载
为什么要写网络? 作为网站开发人员,你所开发的软件产品最终是要在网络上运行的.这就像一个生产商,要生产供给东北地区的产品,而生产商对东北的天气.地理.人文毫无了解.生产商的产品肯定是不可用的,或者低端 ...
- Js Json 互转
推荐: //js对象转换为 JSON 文本 var text = '[{"id":1,"name":"C","size" ...
- CSS---------------之文本类型
通过font来改变文本,主要从以下几个方面 字体加粗,字体的风格:斜体和字体变形:小型大写字母 字体的大小 行高 字体 示例如下 p{font:italic 75%/125% "Comic ...
- Lambda表达式的面纱(一)
在.NET3.0版本中微软推出了Lambda表达式.这使代码的表述可以更加优雅.但是对于新事物大多会本能的排斥,虽然3.0版本已经推出了好久了,但是我向周围的人了解了一下,用Lambda的人不是很多, ...
- myeclipse8.6 for spring环境配置
- hdu1597
Problem Description 假设: S1 = 1 S2 = 12 S3 = 123 S4 = 1234 ......... S9 = 123456789 S10 = 1234567891 ...