lc面试准备:Repeated DNA Sequences
1 题目
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", Return:
["AAAAACCCCC", "CCCCCAAAAA"].
接口
public List<String> findRepeatedDnaSequences(String s);
2 思路
寻找重复出现1次以上的10个字符串。
思路1:TLE。从第一个子字符串开始遍历,并存储在List中,如果某个子字符串出现两次,就将其添加到结果List中。
思路2:映射4个字符'A''C''G''T'为整数,对整数进行移位以及位与操作,以获取相应的子字符串。
①将ACGT进行二进制编码
A -> 00
C -> 01
G -> 10
T -> 11
②在编码的情况下,每10位字符的组合是一个数字value,且10位的字符串有20位;int是32位,可以储存。例如
ACGTACGTAC -> 00011011000110110001
AAAAAAAAAA -> 00000000000000000000
采用HashSet来存储这些value。20位的二进制数,至多有2^20种组合,因此hash set的大小最大为2^20,即1024 * 1024。
每次向右移动1位字符,相当于字符串对应的int值左移2位,再将其最低2位置为新的字符的编码值,最后将高2位置0。
value = (value << 2) + 字符的编码值int;
value &= (1 << 20) - 1;//整数占32个位,获取其低20位(题中要求是长度为10的子字符串,映射为整数后,子字符串总共占用20位)
例如
src CAAAAAAAAAC
subStr CAAAAAAAAA
int 0100000000
subStr AAAAAAAAAC
int 0000000001
复杂度
3 代码
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Set; public class Solution {
// Time:O(n) Space:O(n)
private final static Map<Character, Integer> chsMap = new HashMap<Character, Integer>(
4);
static {
chsMap.put('A', 0);
chsMap.put('C', 1);
chsMap.put('G', 2);
chsMap.put('T', 3);
} public List<String> findRepeatedDnaSequences(String s) {
Set<String> res = new HashSet<String>();
final int length = s.length();
if (length <= 10)
return new LinkedList<String>(res);
Set<Integer> intSet = new HashSet<Integer>();
int value = 0;
for (int i = 0; i < 9; i++) {
value = (value << 2) + chsMap.get(s.charAt(i));
}
for (int i = 9; i < length; i++) {
value = (value << 2) + chsMap.get(s.charAt(i));
value &= (1 << 20) - 1;
if (intSet.contains(value)) {
res.add(s.substring(i - 9, i + 1));
}
intSet.add(value);
}
return new LinkedList<String>(res);
}
}
4 扩展
- 当该字符串中没有出现长度为10的子字符串出现两次以上时,返回结果为[],而不是null。
- 若输入'AAAAAAAAAAA', 输出['AAAAAAAAAA']
5 参考
lc面试准备:Repeated DNA Sequences的更多相关文章
- LeetCode 187. 重复的DNA序列(Repeated DNA Sequences)
187. 重复的DNA序列 187. Repeated DNA Sequences 题目描述 All DNA is composed of a series of nucleotides abbrev ...
- 187. Repeated DNA Sequences重复的DNA子串序列
[抄题]: All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: &qu ...
- [LeetCode] Repeated DNA Sequences 求重复的DNA序列
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...
- [Leetcode] Repeated DNA Sequences
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...
- leetcode 187. Repeated DNA Sequences 求重复的DNA串 ---------- java
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...
- 【leetcode】Repeated DNA Sequences(middle)★
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...
- LeetCode() Repeated DNA Sequences 看的非常的过瘾!
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...
- Repeated DNA Sequences
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...
- Java for LeetCode 187 Repeated DNA Sequences
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...
随机推荐
- windows向ubuntu过渡之常用软件安装
好久没有写博客了,介于最近上操作系统实验课,好多同学装上了ubuntu,网上的教程比较杂乱,下面我就总结分享一些安装完ubuntu要安装的常用软件,会持续更新... 1.搜狗拼音安装 (1)在安装输入 ...
- CentOS7安装Python3.5
2. 安装Python的依赖包 yum -y groupinstall "Development tools" yum -y install openssl-devel sqlit ...
- mRemote配置
配置完mRemote后 备份C:\Users\Administrator\AppData\Local\Felix_Deimel\mRemote\confCons.xml文件 覆盖到其他电脑可以直接使用
- C#中的static静态变量的用法
静态全局变量 定义:在全局变量前,加上关键字 static 该变量就被定义成为了一个静态全局变量. 特点: A.该变量在全局数据区分配内存. B.初始化:如果不显式初始化,那么将被隐式初始化为0. 静 ...
- The hacker's sanbox游戏
第一关:使用/usr/hashcat程序,对passwd中root的密码进行解密,得到gravity98 执行su,输入密码gravity98. 第二关:获取提供的工具,wget http://are ...
- ASP.NET 相关小知识
后台修改前台html控件属性 添加 runat=server ,后台获取// 客户端隐藏 a.Attributes[ "style "] = "display:none ...
- iOS 十六进制的相加取反
ios中将NSstring字符串转换成char类型 NSString *string = [NSString stringWithFormat:@"5D"]; const char ...
- LINQ 101——分组、Set、转换、Element
一.Grouping(分组) 例1:对于0-9数按被3整除的结果分组 代码: static void Linq1() { , , , , , , , , , }; var numModBy3 = fr ...
- 我的第一篇博客:requestAnimationFrame学习笔记
通常,我们在浏览器中写动画会用到哪些技术呢? flash 可以实现一些非常复杂的动画,但随着HTML5的成熟,个人感觉flash终究会成为明日黄花. css3 当前大部分现代浏览器已经对css3支持的 ...
- 浅析tornado web框架
tornado简介 1.tornado概述 Tornado就是我们在 FriendFeed 的 Web 服务器及其常用工具的开源版本.Tornado 和现在的主流 Web 服务器框架(包括大多数 Py ...