Repeated DNA Sequences 解答
Question
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", Return: ["AAAAACCCCC", "CCCCCAAAAA"].
Solution -- Bit Manipulation
Original idea is to use a set to store each substring. Time complexity is O(n) and space cost is O(n). But for details of space cost, a char is 2 bytes, so we need 20 bytes to store a substring and therefore (20n) space.
If we represent DNA substring by integer, the space is cut down to (4n).
public List<String> findRepeatedDnaSequences(String s) {
List<String> result = new ArrayList<String>(); int len = s.length();
if (len < 10) {
return result;
} Map<Character, Integer> map = new HashMap<Character, Integer>();
map.put('A', 0);
map.put('C', 1);
map.put('G', 2);
map.put('T', 3); Set<Integer> temp = new HashSet<Integer>();
Set<Integer> added = new HashSet<Integer>(); int hash = 0;
for (int i = 0; i < len; i++) {
if (i < 9) {
//each ACGT fit 2 bits, so left shift 2
hash = (hash << 2) + map.get(s.charAt(i));
} else {
hash = (hash << 2) + map.get(s.charAt(i));
//make length of hash to be 20
hash = hash & (1 << 20) - 1; if (temp.contains(hash) && !added.contains(hash)) {
result.add(s.substring(i - 9, i + 1));
added.add(hash); //track added
} else {
temp.add(hash);
}
} } return result;
}
Repeated DNA Sequences 解答的更多相关文章
- lc面试准备:Repeated DNA Sequences
1 题目 All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: &quo ...
- LeetCode 187. 重复的DNA序列(Repeated DNA Sequences)
187. 重复的DNA序列 187. Repeated DNA Sequences 题目描述 All DNA is composed of a series of nucleotides abbrev ...
- [LeetCode] Repeated DNA Sequences 求重复的DNA序列
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...
- [Leetcode] Repeated DNA Sequences
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...
- leetcode 187. Repeated DNA Sequences 求重复的DNA串 ---------- java
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...
- 【leetcode】Repeated DNA Sequences(middle)★
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...
- LeetCode() Repeated DNA Sequences 看的非常的过瘾!
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...
- Repeated DNA Sequences
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...
- Java for LeetCode 187 Repeated DNA Sequences
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...
随机推荐
- 【转】20个Java 代码生成器
From: http://www.cnblogs.com/skyme/archive/2011/12/22/2297592.html 1.1 CodeSmith 一款人气很旺国外的基于模板的dotne ...
- html天气预报小插件
<head></head> <body> <iframe width="225" scrolling="no" hei ...
- windows10 离线包安装net3.5
找到离线镜像: 管理员命令行运行:dism.exe /online /enable-feature /featurename:netfx3 /Source:E:\sources\sxs 路径根据实际情 ...
- Linux各种包安装及命令
1.Locate yum -y install mlocate 若出现问题: locate: can not stat () `/var/lib/mlocate/mlocate.db': 没有那个文件 ...
- python学习之路-12
线程池 上下文管理 线程池中关于上下文管理的相关代码 点我查看更详细的上下文管理介绍 import contextlib @contextlib.contextmanager def worker_s ...
- ubuntu 配置jdk
shawn@e014-anle-lnx:~$ sudo su # chmod 777 jdk-6u27-linux-i586.bin # ./jdk-6u27-linux-i586.bin # mv ...
- POJ训练计划2777_Count Color(线段树/成段更新/区间染色)
解题报告 题意: 对线段染色.询问线段区间的颜色种数. 思路: 本来直接在线段树上染色,lz标记颜色.每次查询的话訪问线段树,求出颜色种数.结果超时了,最坏的情况下,染色能够染到叶子节点. 换成存下区 ...
- iOS8 Core Image In Swift:人脸检测以及马赛克
iOS8 Core Image In Swift:自动改善图像以及内置滤镜的使用 iOS8 Core Image In Swift:更复杂的滤镜 iOS8 Core Image In Swift:人脸 ...
- mybatis参数查询
单个参数查询 在mapper.xml配置文件中配置 <select id= "selectByNu" paramet ...
- android——屏幕适配大全(转载)
http://my.oschina.net/u/2008084/blog/496161 一.适配可行性 早在Android设计之初就考虑到了这一点,为了让app适应标准or山寨屏幕,google已经有 ...