Leetcode: Repeated DNA Sequences, Detailed Solution
Problem
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
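Original problem: https://oj.leetcode.com/problems/repeated-dna-sequences/
Straightforward method (TLE, Time Limit Exceeded)
Algorithm analysis
Match substrings directly. Build a next array that stores, for each position, the index of the next occurrence of the same letter in the string (-1 if there is none); the traversal then starts its comparisons from the positions given by the next array.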
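A minimal sketch of the next array described above, assuming `std::string::npos` plays the role of -1 in the tables below; the helper name `buildNext` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: next[pos] is the index of the next occurrence of s[pos]
// after pos, or std::string::npos (the -1 of the tables) if there is none.
std::vector<std::size_t> buildNext(const std::string& s) {
    std::vector<std::size_t> next(s.size());
    for (std::size_t pos = 0; pos < s.size(); ++pos) {
        // find(char, start) scans forward from pos + 1, so each call is O(n)
        next[pos] = s.find(s[pos], pos + 1);
    }
    return next;
}
```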
For simplicity, consider substrings of length 4.
Case 1:
src A C G T A C G T
next [4] [5] [6] [7] [-1] [-1] [-1] [-1]
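Matching ACGT then only requires comparing the 3 characters after next[0] with the 3 characters after position 0, since the first letters already agree.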
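Case 2:
src A C G T A A C G T
next [4] [6] [7] [8] [5] [-1] [-1] [-1] [-1]
A occurs more than once after position 0, so every later occurrence must be tried: if the comparison at next[0] fails, the one at next[next[0]] must be tried as well, and so on.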
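In case 2 the candidates for position 0 are next[0] = 4 and then next[next[0]] = next[4] = 5. A small sketch of that walk, using a hypothetical `chainFrom` helper that takes an already built next array (with `std::string::npos` standing in for -1):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: lists every later position holding the same letter as
// position pos by following next[pos], next[next[pos]], ... until npos.
std::vector<std::size_t> chainFrom(const std::vector<std::size_t>& next, std::size_t pos) {
    std::vector<std::size_t> out;
    for (std::size_t p = next[pos]; p != std::string::npos; p = next[p]) {
        out.push_back(p);
    }
    return out;
}
```

Each position in the returned list is one place where the window starting at pos has to be compared.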
Case 3:
src A A A A A A
next [1] [2] [3] [4] [5] [-1]
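Repeated letters: once the substring starting at position 0 matches the one starting at next[0], it has been recorded, so the rest of the chain from position 0 does not need to be scanned. This only reduces repeated matching and cannot eliminate it (position next[0] will later match the same substring again), so a set is still used to store the matches and remove duplicates.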
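A minimal sketch of the whole search with the window length L as a parameter, so the length-4 cases can be checked directly; the name `repeatedWithNext` is hypothetical. It stops scanning a position's chain after the first match and relies on the set for deduplication instead of overwriting entries of next. With L = 4, the input "ACCCAGGGACCCAGGG" shows why overwriting is risky: after position 0 matches position 8, setting next[8] to -1 would hide position 12 from position 4, and the repeat "AGGG" would be lost.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Next-array search for an arbitrary window length L
// (L = 4 for the cases above, L = 10 for the actual problem).
std::set<std::string> repeatedWithNext(const std::string& s, std::size_t L) {
    std::set<std::string> found;  // deduplicates repeated matches
    if (L == 0 || s.size() <= L) return found;
    std::vector<std::size_t> next(s.size());
    for (std::size_t pos = 0; pos < s.size(); ++pos) {
        next[pos] = s.find(s[pos], pos + 1);
    }
    for (std::size_t pos = 0; pos + L <= s.size(); ++pos) {
        // walk the chain; later candidates only move right, so stop once a window no longer fits
        for (std::size_t q = next[pos]; q != std::string::npos && q + L <= s.size(); q = next[q]) {
            if (s.compare(pos, L, s, q, L) == 0) {
                found.insert(s.substr(pos, L));
                break;  // recorded; the next array itself is left untouched
            }
        }
    }
    return found;
}
```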
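Time complexity
Building the next array costs O(n^2), since each lookup may scan the rest of the string, and the traversal costs O(n^2), since each position may be compared with every later occurrence of its letter; the total time complexity is O(n^2).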
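As an aside, the O(n^2) cost of the next array comes from calling find once per position. A single right-to-left pass with a table of last-seen positions builds the same array in O(n); this is a swapped-in technique, not the article's code, and `buildNextLinear` is a hypothetical name. The matching loop still dominates, so the overall bound stays O(n^2).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Scan right to left, remembering the last position seen for each byte value,
// so every next[i] is filled in O(1) and the whole array in O(n).
std::vector<std::size_t> buildNextLinear(const std::string& s) {
    std::vector<std::size_t> next(s.size());
    std::array<std::size_t, 256> seen;
    seen.fill(std::string::npos);  // nothing seen to the right yet
    for (std::size_t i = s.size(); i-- > 0;) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        next[i] = seen[c];  // nearest occurrence of c after i
        seen[c] = i;
    }
    return next;
}
```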
Implementation
#include <cstddef>
#include <set>
#include <string>
#include <vector>

class Solution {
public:
    std::vector<std::string> findRepeatedDnaSequences(std::string s);
};

std::vector<std::string> Solution::findRepeatedDnaSequences(std::string s) {
    std::vector<std::string> rel;
    if (s.length() <= 10) {
        return rel;
    }
    // cal next array: next[pos] is the next index holding s[pos], or npos
    std::vector<std::size_t> next(s.length());
    for (std::size_t pos = 0; pos < s.length(); ++pos) {
        next[pos] = s.find_first_of(s[pos], pos + 1);
    }
    std::set<std::string> tmpRel;  // matched substrings, deduplicated
    for (std::size_t pos = 0; pos + 10 <= s.length(); ++pos) {
        std::size_t nextPos = next[pos];
        while (nextPos != std::string::npos) {
            std::size_t ic = pos;
            std::size_t in = nextPos;
            int count = 0;  // the first letters already match; compare the other 9
            while (count < 9 && in + 1 < s.length() && s[++ic] == s[++in]) {
                ++count;
            }
            if (count == 9) {
                tmpRel.insert(s.substr(pos, 10));
                break;  // recorded; the set removes later duplicates
            }
            nextPos = next[nextPos];
        }
    }
    for (auto itr = tmpRel.begin(); itr != tmpRel.end(); ++itr) {
        rel.push_back(*itr);
    }
    return rel;
}
Hash table plus bit manipulation method
(This approach follows the problem's tags, hash table and bit manipulation; runtime 10 ms.)
Algorithm analysis
First, encode A, C, G, and T in binary:
A -> 00
C -> 01
G -> 10
T -> 11
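Under this encoding, every 10-letter string corresponds to one number of 10 × 2 = 20 bits; an int is usually 4 bytes (32 bits), so a single int can represent a 10-letter string. For example: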
ACGTACGTAC -> 00011011000110110001
AAAAAAAAAA -> 00000000000000000000
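A minimal sketch of this encoding, assuming the input contains only the four letters; `code` and `encodeWindow` are hypothetical names. The first letter lands in the highest two of the 20 bits, matching the examples above (ACGTACGTAC gives 0x1B1B1 in hex).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// 2-bit code for each nucleotide: A=00, C=01, G=10, T=11.
std::uint32_t code(char c) {
    switch (c) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        default:  return 3;  // 'T'; assumes the input holds only A/C/G/T
    }
}

// Packs the first 10 letters of w into the low 20 bits, first letter highest.
std::uint32_t encodeWindow(const std::string& w) {
    std::uint32_t v = 0;
    for (std::size_t i = 0; i < 10; ++i) {
        v = (v << 2) | code(w[i]);
    }
    return v;
}
```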
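A 20-bit binary number has at most 2^20 values, so the hash table needs 2^20 = 1024 * 1024 entries and can be declared as bool hashTable[1024 * 1024];
Traversing the string
Moving the window one character to the right corresponds to shifting the int left by 2 bits, writing the new character's code into the lowest 2 bits, and finally clearing the 2 bits that were shifted above the 20-bit window. For example: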
src CAAAAAAAAAC
subStr CAAAAAAAAA
int 01000000000000000000
subStr AAAAAAAAAC
int 00000000000000000001
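The window update can be written as one expression. Masking with 0xFFFFF keeps the low 20 bits, which has the same effect as the `&= ~(0x300000)` in the implementation, because after the shift only bits 20 and 21 can lie outside the window; `roll` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>

// Slides the 10-letter window one step right: shift out the oldest letter,
// append the new 2-bit code, and keep only the low 20 bits.
std::uint32_t roll(std::uint32_t hash, std::uint32_t newCode) {
    const std::uint32_t kMask = (1u << 20) - 1;  // 0xFFFFF
    return ((hash << 2) | newCode) & kMask;
}
```

For the example above, rolling CAAAAAAAAA (1 << 18) with C (code 1) yields 1, the code of AAAAAAAAAC.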
Time complexity
Traversing the string is O(n) and each hash-table lookup is O(1); the total time complexity is O(n).
Implementation
#include <string>
#include <vector>
#include <unordered_set>
#include <cstring>

// 2^20 flags, one per 20-bit code; true once that 10-letter window has been seen
bool hashMap[1024 * 1024];

class Solution {
public:
    std::vector<std::string> findRepeatedDnaSequences(std::string s);
};

std::vector<std::string> Solution::findRepeatedDnaSequences(std::string s) {
    std::vector<std::string> rel;
    if (s.length() <= 10) {
        return rel;
    }
    // map char to code, indexed by c - 'A'
    unsigned char convert[26];
    convert[0] = 0;   // 'A' - 'A' 00
    convert[2] = 1;   // 'C' - 'A' 01
    convert[6] = 2;   // 'G' - 'A' 10
    convert[19] = 3;  // 'T' - 'A' 11
    // initial process: encode the first ten-letter window
    std::memset(hashMap, false, sizeof(hashMap));
    int hashValue = 0;
    for (int pos = 0; pos < 10; ++pos) {
        hashValue <<= 2;
        hashValue |= convert[s[pos] - 'A'];
    }
    hashMap[hashValue] = true;
    // codes already reported, so each repeated sequence is output once
    std::unordered_set<int> strHashValue;
    // slide the window one letter at a time
    for (std::size_t pos = 10; pos < s.length(); ++pos) {
        hashValue <<= 2;
        hashValue |= convert[s[pos] - 'A'];
        hashValue &= ~(0x300000);  // clear bits 20-21, keep the low 20 bits
        if (hashMap[hashValue]) {
            if (strHashValue.find(hashValue) == strHashValue.end()) {
                rel.push_back(s.substr(pos - 9, 10));
                strHashValue.insert(hashValue);
            }
        } else {
            hashMap[hashValue] = true;
        }
    }
    return rel;
}
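A variant, not the article's code: a per-call table with one small counter per 20-bit code replaces both the global bool array (which cannot be shared safely by concurrent calls) and the unordered_set. The name `findRepeatedCounts` is hypothetical, and the input is assumed to contain only A, C, G, and T.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One counter per 20-bit code; a counter reaching 2 means the window repeats
// and has just been reported, after which it is never incremented again.
std::vector<std::string> findRepeatedCounts(const std::string& s) {
    std::vector<std::string> rel;
    if (s.size() <= 10) return rel;
    std::vector<std::uint8_t> count(1u << 20, 0);  // 1 MiB, allocated per call
    std::uint32_t hash = 0;
    for (std::size_t pos = 0; pos < s.size(); ++pos) {
        std::uint32_t c = s[pos] == 'A' ? 0 : s[pos] == 'C' ? 1 : s[pos] == 'G' ? 2 : 3;
        hash = ((hash << 2) | c) & 0xFFFFF;  // keep the low 20 bits
        if (pos < 9) continue;               // window s[pos-9..pos] not full yet
        if (count[hash] < 2 && ++count[hash] == 2) {
            rel.push_back(s.substr(pos - 9, 10));
        }
    }
    return rel;
}
```

For "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT" this returns AAAAACCCCC and CCCCCAAAAA, in the order their second occurrences end.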