1 题目

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:
["AAAAACCCCC", "CCCCCAAAAA"].

接口

public List<String> findRepeatedDnaSequences(String s);

2 思路

寻找重复出现1次以上的10个字符串。

思路1:TLE。从第一个子字符串开始遍历,并存储在List中,如果某个子字符串出现两次,就将其添加到结果List中。

思路2:映射4个字符'A''C''G''T'为整数,对整数进行移位以及位与操作,以获取相应的子字符串。

①将ACGT进行二进制编码

A -> 00

C -> 01

G -> 10

T -> 11

②在编码的情况下,每10位字符的组合是一个数字value,且10位的字符串有20位;int是32位,可以储存。例如

ACGTACGTAC -> 00011011000110110001

AAAAAAAAAA -> 00000000000000000000

采用HashSet来存储这些value。20位的二进制数,至多有2^20种组合,因此hash set的大小最大为2^20,即1024 * 1024。

③遍历字符串

每次向右移动1位字符,相当于字符串对应的int值左移2位,再将其最低2位置为新的字符的编码值,最后将高2位置0。

 value = (value << 2) + 字符的编码值int;
value &= (1 << 20) - 1;//整数占32个位,获取其低20位(题中要求是长度为10的子字符串,映射为整数后,子字符串总共占用20位)

例如

src CAAAAAAAAAC

subStr CAAAAAAAAA

int 0100000000

subStr AAAAAAAAAC

int 0000000001

复杂度

Time:O(n)  Space:O(n) 

3 代码

 import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Set; public class Solution {
// Time:O(n) Space:O(n)
private final static Map<Character, Integer> chsMap = new HashMap<Character, Integer>(
4);
static {
chsMap.put('A', 0);
chsMap.put('C', 1);
chsMap.put('G', 2);
chsMap.put('T', 3);
} public List<String> findRepeatedDnaSequences(String s) {
Set<String> res = new HashSet<String>();
final int length = s.length();
if (length <= 10)
return new LinkedList<String>(res);
Set<Integer> intSet = new HashSet<Integer>();
int value = 0;
for (int i = 0; i < 9; i++) {
value = (value << 2) + chsMap.get(s.charAt(i));
}
for (int i = 9; i < length; i++) {
value = (value << 2) + chsMap.get(s.charAt(i));
value &= (1 << 20) - 1;
if (intSet.contains(value)) {
res.add(s.substring(i - 9, i + 1));
}
intSet.add(value);
}
return new LinkedList<String>(res);
}
}

4  扩展

  • 当该字符串中没有出现长度为10的子字符串出现两次以上时,返回结果为[],而不是null。
  • 若输入'AAAAAAAAAAA', 输出['AAAAAAAAAA']

5 参考

  1. leetcode
  2. Leetcode:Repeated DNA Sequences详细题解

lc面试准备:Repeated DNA Sequences的更多相关文章

  1. LeetCode 187. 重复的DNA序列(Repeated DNA Sequences)

    187. 重复的DNA序列 187. Repeated DNA Sequences 题目描述 All DNA is composed of a series of nucleotides abbrev ...

  2. 187. Repeated DNA Sequences重复的DNA子串序列

    [抄题]: All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: &qu ...

  3. [LeetCode] Repeated DNA Sequences 求重复的DNA序列

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...

  4. [Leetcode] Repeated DNA Sequences

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...

  5. leetcode 187. Repeated DNA Sequences 求重复的DNA串 ---------- java

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...

  6. 【leetcode】Repeated DNA Sequences(middle)★

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...

  7. LeetCode() Repeated DNA Sequences 看的非常的过瘾!

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...

  8. Repeated DNA Sequences

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...

  9. Java for LeetCode 187 Repeated DNA Sequences

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...

随机推荐

  1. eclipse快速查找一个变量、方法或者类被引用的地方

    最近不停debug,拿到一个变量之后总是要先概览一下才好下手,之前一直用Ctrl+F来做,太麻烦.今天查了下eclipse使用,发现有快捷键,使用方法: 先双击要查看的变量.方法或者类,使之被选中,然 ...

  2. cocos2dx 文件处理

    问题1:fopen 在vs下使用fopen进行文件处理,跑通了,但是移植到android源码下时就出现了一大推问题,首先需要理解的是在vs下开发资源是存放在执行文件的相同目录下的,而移植到androi ...

  3. C# 引用SHDocVw 实现模拟网页操作

    因为最近项目需要,所以接触到了网页爬取. 1. HttpWebRequest 初期接触的都是一些比较简单的网页,通过Fiddler抓包分析后,就能模拟进行http请求,进行想要的操作. 2. WebB ...

  4. 跟着老男孩一步步学习Shell高级编程实战

    原创作品,允许转载,转载时请务必以超链接形式标明文章 原始出处 .作者信息和本声明.否则将追究法律责任.http://oldboy.blog.51cto.com/2561410/1264627 本sh ...

  5. Angularjs总结(五)指令运用及常用控件的赋值操作

    1.常用指令 <div ng-controller="jsyd-controller"> <div style="float:left;width:10 ...

  6. jQuery 选择器【1】

    jQuery 选择器 请使用我们的 jQuery 选择器检测器 来演示不同的选择器. 选择器 实例 选取 * $("*") 所有元素 #id $("#lastname&q ...

  7. BFC探秘

    今天面试被问到了BFC,听到这个缩略词我是懵比的,啥东西?还是太年轻太简单啊.于是面试结束之后搜了几篇博客看了下,看完有一种豁然开朗的感觉,一些之前未能理解的CSS元素行为也知其所以然了.顺便说一下, ...

  8. js作用域链

    js作用域链 <script> var up = 555; function display(){ var innerVar = 2; function inner(){ var inne ...

  9. Lua与C/C++交互问题

    初学lua,遇到注册C/C++交互函数问题 在lua与C/C++交互时,C/C++的注册Lua函数若是一个有返回类型(压栈)而不是获取类型的时候应该返回1而不是返回0,否则会出现在Lua中值为nil( ...

  10. 【elasticsearch】(4)centos7 超简单安装elasticsearch 的 jdbc

    前言 elasticsearch(下面简称ES)使用jdbc连接mysql比go-mysql-elasticsearch的elasticsearch-river-jdbc能够很好的支持增量数据更新的问 ...