1 题目

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:
["AAAAACCCCC", "CCCCCAAAAA"].

接口

public List<String> findRepeatedDnaSequences(String s);

2 思路

寻找重复出现1次以上的10个字符串。

思路1:TLE。从第一个子字符串开始遍历,并存储在List中,如果某个子字符串出现两次,就将其添加到结果List中。

思路2:映射4个字符'A''C''G''T'为整数,对整数进行移位以及位与操作,以获取相应的子字符串。

①将ACGT进行二进制编码

A -> 00

C -> 01

G -> 10

T -> 11

②在编码的情况下,每10位字符的组合是一个数字value,且10位的字符串有20位;int是32位,可以储存。例如

ACGTACGTAC -> 00011011000110110001

AAAAAAAAAA -> 00000000000000000000

采用HashSet来存储这些value。20位的二进制数,至多有2^20种组合,因此hash set的大小最大为2^20,即1024 * 1024。

③遍历字符串

每次向右移动1位字符,相当于字符串对应的int值左移2位,再将其最低2位置为新的字符的编码值,最后将高2位置0。

 value = (value << 2) + 字符的编码值int;
value &= (1 << 20) - 1;//整数占32个位,获取其低20位(题中要求是长度为10的子字符串,映射为整数后,子字符串总共占用20位)

例如

src CAAAAAAAAAC

subStr CAAAAAAAAA

int 0100000000

subStr AAAAAAAAAC

int 0000000001

复杂度

Time:O(n)  Space:O(n) 

3 代码

 import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Set; public class Solution {
// Time:O(n) Space:O(n)
private final static Map<Character, Integer> chsMap = new HashMap<Character, Integer>(
4);
static {
chsMap.put('A', 0);
chsMap.put('C', 1);
chsMap.put('G', 2);
chsMap.put('T', 3);
} public List<String> findRepeatedDnaSequences(String s) {
Set<String> res = new HashSet<String>();
final int length = s.length();
if (length <= 10)
return new LinkedList<String>(res);
Set<Integer> intSet = new HashSet<Integer>();
int value = 0;
for (int i = 0; i < 9; i++) {
value = (value << 2) + chsMap.get(s.charAt(i));
}
for (int i = 9; i < length; i++) {
value = (value << 2) + chsMap.get(s.charAt(i));
value &= (1 << 20) - 1;
if (intSet.contains(value)) {
res.add(s.substring(i - 9, i + 1));
}
intSet.add(value);
}
return new LinkedList<String>(res);
}
}

4  扩展

  • 当该字符串中没有出现长度为10的子字符串出现两次以上时,返回结果为[],而不是null。
  • 若输入'AAAAAAAAAAA', 输出['AAAAAAAAAA']

5 参考

  1. leetcode
  2. Leetcode:Repeated DNA Sequences详细题解

lc面试准备:Repeated DNA Sequences的更多相关文章

  1. LeetCode 187. 重复的DNA序列(Repeated DNA Sequences)

    187. 重复的DNA序列 187. Repeated DNA Sequences 题目描述 All DNA is composed of a series of nucleotides abbrev ...

  2. 187. Repeated DNA Sequences重复的DNA子串序列

    [抄题]: All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: &qu ...

  3. [LeetCode] Repeated DNA Sequences 求重复的DNA序列

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...

  4. [Leetcode] Repeated DNA Sequences

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...

  5. leetcode 187. Repeated DNA Sequences 求重复的DNA串 ---------- java

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...

  6. 【leetcode】Repeated DNA Sequences(middle)★

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...

  7. LeetCode() Repeated DNA Sequences 看的非常的过瘾!

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...

  8. Repeated DNA Sequences

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...

  9. Java for LeetCode 187 Repeated DNA Sequences

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...

随机推荐

  1. SQL语句中"where 1=1"和"where 1=0"的作用

    where 1=1; 这个条件始终为True,在不定数量查询条件情况下,1=1可以很方便的规范语句. 一.不用where 1=1 在多条件查询中的困扰 举个例子,如果您做查询页面,并且,可查询的选项有 ...

  2. iOS开发中.pch 文件的使用及其相关工程设置

    .pch文件 也是一个头文件,pch头文件的内容能被项目中的其他所有源文件共享和访问.是一个预编译文件. 首先说一下pch的作用: 1.存放一些全局的宏(整个项目中都用得上的宏) 2.用来包含一些全部 ...

  3. app抓包

    http://www.360doc.com/content/14/1126/11/9200790_428168701.shtml 记得下载证书  不然有些网站是抓不到的

  4. svn出错问题(用户名密码有修改以及资源url改变时)

    用eclipse 同步SVN服务器宛然无法访问了: org.tigris.subversion.javahl.ClientException: RA layer request failed svn: ...

  5. everything is nothing

    可以选ios,可以选择android ,可以选择javaee,因为不想让自己这段时间的努力没有一个完美的结局.最起码真的能做点东西出来,所以6.10--8.10,两个月把javaee实训的该准备.可以 ...

  6. WisDom.Net 框架设计(五) 权限设计

    WisDom.Net --权限设计 1.需求分析     基本在所有的管理系统中都离不开权限管理.可以这么说,权限管理是管理系统的核心所在. 权限管理说白一些就是每个人能够做什么,不能够做什么.可以说 ...

  7. Maven3(笔记二)

    笔记本二   在Eclipse 中使用Maven 第一节:m2eclipse 插件安装 打开Eclipse,点击菜单Help - > Install New Software 点击Add 按钮N ...

  8. [Twisted] transport

    transport代表网络上两个节点的连接.它描述了连接的具体细节,如TCP还是UDP. transports实现了ITransport接口,包含以下方法 write:以非阻塞的方式向连接写数据. w ...

  9. iOS中打印系统详细日志

    Q:如何打印当前的函数和行号? A:我们可以在打印时使用一些预编译宏作为打印参数,来打印当前的函数和行号.如: 1 NSLog(@"%s:%d obj=%@", __func__, ...

  10. xmlns:tools="http://schemas.android.com/tools"以及tools:context=".ConfActivity"是什么意思

    xmlns:tools="http://schemas.android.com/tools"这个是xml的命名空间,有了他,你就可以alt+/作为提示,提示你输入什么,不该输入什么 ...