Repeated DNA Sequences 解答
Question
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", Return: ["AAAAACCCCC", "CCCCCAAAAA"].
Solution -- Bit Manipulation
Original idea is to use a set to store each substring. Time complexity is O(n) and space cost is O(n). But for details of space cost, a char is 2 bytes, so we need 20 bytes to store a substring and therefore (20n) space.
If we represent DNA substring by integer, the space is cut down to (4n).
public List<String> findRepeatedDnaSequences(String s) {
List<String> result = new ArrayList<String>(); int len = s.length();
if (len < 10) {
return result;
} Map<Character, Integer> map = new HashMap<Character, Integer>();
map.put('A', 0);
map.put('C', 1);
map.put('G', 2);
map.put('T', 3); Set<Integer> temp = new HashSet<Integer>();
Set<Integer> added = new HashSet<Integer>(); int hash = 0;
for (int i = 0; i < len; i++) {
if (i < 9) {
//each ACGT fit 2 bits, so left shift 2
hash = (hash << 2) + map.get(s.charAt(i));
} else {
hash = (hash << 2) + map.get(s.charAt(i));
//make length of hash to be 20
hash = hash & (1 << 20) - 1; if (temp.contains(hash) && !added.contains(hash)) {
result.add(s.substring(i - 9, i + 1));
added.add(hash); //track added
} else {
temp.add(hash);
}
} } return result;
}
Repeated DNA Sequences 解答的更多相关文章
- lc面试准备:Repeated DNA Sequences
1 题目 All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: &quo ...
- LeetCode 187. 重复的DNA序列(Repeated DNA Sequences)
187. 重复的DNA序列 187. Repeated DNA Sequences 题目描述 All DNA is composed of a series of nucleotides abbrev ...
- [LeetCode] Repeated DNA Sequences 求重复的DNA序列
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...
- [Leetcode] Repeated DNA Sequences
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...
- leetcode 187. Repeated DNA Sequences 求重复的DNA串 ---------- java
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...
- 【leetcode】Repeated DNA Sequences(middle)★
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...
- LeetCode() Repeated DNA Sequences 看的非常的过瘾!
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...
- Repeated DNA Sequences
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...
- Java for LeetCode 187 Repeated DNA Sequences
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...
随机推荐
- MySQL的备份和还原
MySQL的备份和还原 备份:副本 RAID1,RAID10:保证硬件损坏而不会业务中止: DROP TABLE mydb.tb1; 备份类型: 热备份.温备份和冷备 ...
- dig命令(转载)
dig命令使用大全(linux上域名查询) 可以这样说,翻译本篇文档的过程就是我重新学习DNS的过程,dig命令可以帮助我们学习DNS的原理,配置,以及其查询过程.以前使用dig仅仅是查询一下A记录或 ...
- 数据库中的索引Index
索引就像一本书的目录,而书中的索引是对一个词语的列表,其中注明了包含各个词的页码.数据库中的索引 是某一个表中一列或者若干列值的集合和相应的只想表中物理标识这些值的数据页的逻辑指针清单. 索引的作用: ...
- ssh登录失败处理步骤
如果登录失败而又找不到显示的原因,优先使用ssh -vT name@ip -p port 进行调试,查看所使用的key文件.ip.端口是否正确.然后再检查下面步骤:1.检查在对应用户名下是否有iden ...
- linux指定动态运行库的位置
动态运行库在windows.linux下均广泛使用.windows下通常为dll文件,linux下为so文件.不过,对于部署程序,这两个系统查找依赖的运行库文件时却不一样.对于windows而言,优先 ...
- Linux查看系统信息
系统 # uname -a # 查看内核/操作系统/CPU信息 # head -n 1 /etc/issue # 查看操作系统版本 # cat /proc/cpuinfo # 查看CPU信息 # ho ...
- mycat实例(1)
2016二月 22 置原 MyCat - 使用篇(1) 分类:数据库分库分表(Mycat等) (1126) (1) 数据库路由中间件MyCat - 使用篇(1) 基本概念 直接介绍概念太枯燥了,还是拿 ...
- Android——编译odex保护
编译过android源代码的可能试验过改动编译类型.android的初始化编译配置可參考Android--编译系统初始化设置 一.TARGET_BUILD_VARIANT=user 当选择的编译类型为 ...
- CSS 相关知识总结
1 什么是CSS? CSS全称(Cascading Style Sheets)是一门指定文档该如何呈现给用户的语言. 2 为何使用CSS? CSS 文档信息的内容和如何展现它的细节想分离,文档细节即为 ...
- aspx调用webmethod
[WebMethod] public static string CheckLogin(string user, string pwd) { pwd = FormsAuthentication.Has ...