LeetCode interview prep: Repeated DNA Sequences
1 Problem
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", Return:
["AAAAACCCCC", "CCCCCAAAAA"].
Interface
public List<String> findRepeatedDnaSequences(String s);
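2 Approach
Find the 10-character substrings that occur more than once.
Approach 1: TLE (time limit exceeded). Scan the substrings starting from the first one and store each in a List; if a substring has already been seen, add it to the result List.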
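A minimal sketch of Approach 1, assuming the windows are kept in an ArrayList. The class name BruteForceDna is hypothetical. Because List.contains is a linear scan, the method does O(n^2) window comparisons, which is why it times out on long inputs.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical class name, used only for this illustration.
class BruteForceDna {
    static List<String> findRepeatedDnaSequences(String s) {
        List<String> seen = new ArrayList<>();
        // A set de-duplicates the results; LinkedHashSet keeps first-found order.
        Set<String> repeated = new LinkedHashSet<>();
        for (int i = 0; i + 10 <= s.length(); i++) {
            String window = s.substring(i, i + 10);
            if (seen.contains(window)) {   // linear scan over all earlier windows
                repeated.add(window);
            } else {
                seen.add(window);
            }
        }
        return new ArrayList<>(repeated);
    }

    public static void main(String[] args) {
        System.out.println(findRepeatedDnaSequences("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"));
    }
}
```

It gives the right answer on the sample input but is only useful as a baseline to compare Approach 2 against.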
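Approach 2: Map the four characters 'A', 'C', 'G', 'T' to integers, then use shifts and a bitwise AND to obtain the integer for each substring.
① Encode A, C, G, T in binary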
A -> 00
C -> 01
G -> 10
T -> 11
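② Under this encoding, every run of 10 characters becomes a single number, value. A 10-character string takes 20 bits, so it fits in a 32-bit int. For example: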
ACGTACGTAC -> 00011011000110110001
AAAAAAAAAA -> 00000000000000000000
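The two encodings above can be reproduced with a short sketch. The class DnaEncoding and its methods code, encode and toBits20 are names made up for this illustration; the 2-bit codes are the ones in the table above.

```java
// Hypothetical helper class, used only for this illustration.
class DnaEncoding {
    // 2-bit code per nucleotide: A=00, C=01, G=10, T=11.
    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("not a nucleotide: " + c);
        }
    }

    // Packs a 10-character string into the low 20 bits of an int.
    static int encode(String tenChars) {
        int value = 0;
        for (int i = 0; i < tenChars.length(); i++) {
            value = (value << 2) | code(tenChars.charAt(i));
        }
        return value;
    }

    // Renders the value as a 20-digit binary string with leading zeros.
    static String toBits20(int value) {
        String bits = Integer.toBinaryString(value);
        return String.format("%20s", bits).replace(' ', '0');
    }

    public static void main(String[] args) {
        System.out.println(toBits20(encode("ACGTACGTAC")));
        System.out.println(toBits20(encode("AAAAAAAAAA")));
    }
}
```

The largest possible value is the one for "TTTTTTTTTT", which is (1 << 20) - 1, so every window fits comfortably in an int.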
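A HashSet stores these values. A 20-bit binary number has at most 2^20 distinct values, so the hash set holds at most 2^20 entries, i.e. 1024 * 1024.
Sliding the window one character to the right corresponds to shifting the int left by 2 bits, putting the new character's code into the lowest 2 bits, and finally clearing the 2 bits that were pushed above the low 20.
value = (value << 2) + code; // code is the 2-bit code of the new character
value &= (1 << 20) - 1; // an int has 32 bits; keep only the low 20 (a 10-character substring occupies 20 bits once encoded)
For example:
src CAAAAAAAAAC
subStr CAAAAAAAAA
int 01000000000000000000
subStr AAAAAAAAAC
int 00000000000000000001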
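The worked example can be checked with a small sketch of the sliding update. RollingWindowDemo and windowValues are hypothetical names, and the code assumes the input contains only the characters A, C, G and T.

```java
// Hypothetical class name, used only for this illustration.
class RollingWindowDemo {
    static final int MASK = (1 << 20) - 1; // low 20 bits set

    // Assumes c is one of A, C, G, T; the index in "ACGT" equals its 2-bit code.
    static int code(char c) {
        return "ACGT".indexOf(c);
    }

    // Returns the encoded value of every 10-character window, left to right.
    static int[] windowValues(String s) {
        int n = s.length();
        if (n < 10) {
            return new int[0];
        }
        int[] out = new int[n - 9];
        int value = 0;
        for (int i = 0; i < n; i++) {
            // Shift in the new character, then drop anything above bit 19.
            value = ((value << 2) + code(s.charAt(i))) & MASK;
            if (i >= 9) {
                out[i - 9] = value; // window s[i-9..i]
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (int v : windowValues("CAAAAAAAAAC")) {
            System.out.println(v);
        }
    }
}
```

For "CAAAAAAAAAC" the first window encodes to 1 << 18 (the leading C sits in the top two of the 20 bits). After the shift that C would land in bits 20 and 21, and the mask removes it, leaving 1 for the second window.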
Complexity: O(n) time and O(n) space, where n is the length of s.
3 Code
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class Solution {
    // Time: O(n)  Space: O(n)
    private static final Map<Character, Integer> chsMap = new HashMap<Character, Integer>(4);

    static {
        chsMap.put('A', 0);
        chsMap.put('C', 1);
        chsMap.put('G', 2);
        chsMap.put('T', 3);
    }

    public List<String> findRepeatedDnaSequences(String s) {
        // A set, so that each repeated sequence is reported only once.
        Set<String> res = new HashSet<String>();
        final int length = s.length();
        // With 10 or fewer characters there is at most one window, so nothing repeats.
        if (length <= 10)
            return new LinkedList<String>(res);
        // Encoded values of the windows seen so far.
        Set<Integer> intSet = new HashSet<Integer>();
        int value = 0;
        // Encode the first 9 characters.
        for (int i = 0; i < 9; i++) {
            value = (value << 2) + chsMap.get(s.charAt(i));
        }
        // From index 9 on, value encodes the window s[i-9..i].
        for (int i = 9; i < length; i++) {
            value = (value << 2) + chsMap.get(s.charAt(i));
            value &= (1 << 20) - 1; // keep only the low 20 bits
            if (intSet.contains(value)) {
                res.add(s.substring(i - 9, i + 1));
            }
            intSet.add(value);
        }
        return new LinkedList<String>(res);
    }
}
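4 Notes
- When no 10-character substring of the input occurs more than once, the result is [], not null.
- For the input 'AAAAAAAAAAA', the output is ['AAAAAAAAAA']: overlapping occurrences count.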
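Both edge cases can be exercised with a compact restatement of Approach 2. This is a sketch rather than the solution class itself: EdgeCaseCheck is a hypothetical name, and the code assumes the input contains only A, C, G and T.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical class name, used only for this illustration.
class EdgeCaseCheck {
    static List<String> findRepeatedDnaSequences(String s) {
        Set<Integer> seen = new HashSet<>();
        Set<String> repeated = new HashSet<>();
        int value = 0;
        for (int i = 0; i < s.length(); i++) {
            // The index in "ACGT" is the 2-bit code; the mask keeps the low 20 bits.
            value = ((value << 2) + "ACGT".indexOf(s.charAt(i))) & ((1 << 20) - 1);
            // Set.add returns false when the value was already present.
            if (i >= 9 && !seen.add(value)) {
                repeated.add(s.substring(i - 9, i + 1));
            }
        }
        return new ArrayList<>(repeated); // empty list, never null
    }

    public static void main(String[] args) {
        System.out.println(findRepeatedDnaSequences("ACGT"));
        System.out.println(findRepeatedDnaSequences("AAAAAAAAAAA"));
    }
}
```

With eleven A characters the two windows start at index 0 and index 1 and overlap in nine positions, yet they still count as two occurrences.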