Question

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", Return: ["AAAAACCCCC", "CCCCCAAAAA"].

Solution -- Bit Manipulation

Original idea is to use a set to store each substring. Time complexity is O(n) and space cost is O(n). But for details of space cost, a char is 2 bytes, so we need 20 bytes to store a substring and therefore (20n) space.

If we represent DNA substring by integer, the space is cut down to  (4n).

 public List<String> findRepeatedDnaSequences(String s) {
List<String> result = new ArrayList<String>(); int len = s.length();
if (len < 10) {
return result;
} Map<Character, Integer> map = new HashMap<Character, Integer>();
map.put('A', 0);
map.put('C', 1);
map.put('G', 2);
map.put('T', 3); Set<Integer> temp = new HashSet<Integer>();
Set<Integer> added = new HashSet<Integer>(); int hash = 0;
for (int i = 0; i < len; i++) {
if (i < 9) {
//each ACGT fit 2 bits, so left shift 2
hash = (hash << 2) + map.get(s.charAt(i));
} else {
hash = (hash << 2) + map.get(s.charAt(i));
//make length of hash to be 20
hash = hash & (1 << 20) - 1; if (temp.contains(hash) && !added.contains(hash)) {
result.add(s.substring(i - 9, i + 1));
added.add(hash); //track added
} else {
temp.add(hash);
}
} } return result;
}

Repeated DNA Sequences 解答的更多相关文章

  1. lc面试准备:Repeated DNA Sequences

    1 题目 All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: &quo ...

  2. LeetCode 187. 重复的DNA序列(Repeated DNA Sequences)

    187. 重复的DNA序列 187. Repeated DNA Sequences 题目描述 All DNA is composed of a series of nucleotides abbrev ...

  3. [LeetCode] Repeated DNA Sequences 求重复的DNA序列

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...

  4. [Leetcode] Repeated DNA Sequences

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...

  5. leetcode 187. Repeated DNA Sequences 求重复的DNA串 ---------- java

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...

  6. 【leetcode】Repeated DNA Sequences(middle)★

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...

  7. LeetCode() Repeated DNA Sequences 看的非常的过瘾!

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...

  8. Repeated DNA Sequences

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...

  9. Java for LeetCode 187 Repeated DNA Sequences

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...

随机推荐

  1. HDOJ-1019 Least Common Multiple

    http://acm.hdu.edu.cn/showproblem.php?pid=1019 题意:给出n个数,求它们的最小公倍数 对于n个数,它们的最小公倍数等于[前n-1个数的最小公倍数和第n个数 ...

  2. 简单的Dao设计模式

    简单的DAO设计模式 这两天学习到了DAO(Data Access Object 数据存取对象)设计模式.想谈谈自己的感受,刚开始接触是感觉有点难,觉得自己逻辑理不清,主要是以前学的知识比较零散没有很 ...

  3. hdu 2846

    字典树的变形,常规字典树用来求前缀的,所以把每个单词拆成len个词建树,为了避免abab这样的查ab时会出现两次,每次加一个标记,如果该节点上次的建树的单词与本次相同就不更新,否则更新 #includ ...

  4. TCP/IP系列——长连接与短连接的区别

    1 什么是长连接和短连接       三次握手和四次挥手   TCP区别于UDP最重要的特点是TCP必须建立在可靠的连接之上,连接的建立和释放就是握手和挥手的过程. 三次握手为连接的建立过程,握手失败 ...

  5. 网络基础知识HTTP(1) --转载

    为什么要写网络? 作为网站开发人员,你所开发的软件产品最终是要在网络上运行的.这就像一个生产商,要生产供给东北地区的产品,而生产商对东北的天气.地理.人文毫无了解.生产商的产品肯定是不可用的,或者低端 ...

  6. Js Json 互转

    推荐: //js对象转换为 JSON 文本 var text = '[{"id":1,"name":"C","size" ...

  7. CSS---------------之文本类型

    通过font来改变文本,主要从以下几个方面 字体加粗,字体的风格:斜体和字体变形:小型大写字母 字体的大小 行高 字体 示例如下 p{font:italic 75%/125% "Comic ...

  8. Lambda表达式的面纱(一)

    在.NET3.0版本中微软推出了Lambda表达式.这使代码的表述可以更加优雅.但是对于新事物大多会本能的排斥,虽然3.0版本已经推出了好久了,但是我向周围的人了解了一下,用Lambda的人不是很多, ...

  9. myeclipse8.6 for spring环境配置

     

  10. hdu1597

    Problem Description 假设: S1 = 1 S2 = 12 S3 = 123 S4 = 1234 ......... S9 = 123456789 S10 = 1234567891 ...