Problem

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
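To make the expected behaviour concrete before looking at the two methods, here is a minimal baseline sketch. It is not one of the methods analysed in this post: it simply counts every 10-letter window in a std::map. The function name repeatedByCounting is hypothetical.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical baseline: count each 10-letter window and report a window
// the moment it is seen for the second time, so each one is reported once.
std::vector<std::string> repeatedByCounting(const std::string& s) {
    std::map<std::string, int> seen;
    std::vector<std::string> result;
    for (std::size_t i = 0; i + 10 <= s.length(); ++i) {
        const std::string window = s.substr(i, 10);
        if (++seen[window] == 2) {
            result.push_back(window);
        }
    }
    return result;
}
```

For the input "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT" this returns "AAAAACCCCC" and "CCCCCAAAAA". Storing whole substrings as keys is simple but costs far more memory and time than the integer encoding used later in the post.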

Original problem link: https://oj.leetcode.com/problems/repeated-dna-sequences/

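Straightforward method (TLE)

Algorithm analysis

Match the strings directly. Build a next array that stores, for each character of the string, the position at which the same character next occurs. When looking for a repeat of the substring that starts at a given position, only the positions reachable through the next array need to be tried as starting points.

To keep the examples short, consider substrings of length 4 instead of 10.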

Case 1:

src A C G T A C G T

next [4] [5] [6] [7] [-1] [-1] [-1] [-1]

To find a repeat of ACGT, it is enough to jump to next[0] and compare the 3 characters that follow it.
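The next array from the case tables can be sketched as follows. This is only an illustration: the helper name buildNext is hypothetical, it uses -1 for "no later occurrence" as the tables do, and it fills the array in a single right-to-left pass rather than with the per-position search used in the implementation further down.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: next[i] is the index of the next occurrence of s[i]
// after position i, or -1 when the character does not occur again.
std::vector<int> buildNext(const std::string& s) {
    std::vector<int> next(s.size(), -1);
    int last[256];                       // last[c]: nearest index to the right holding c
    for (int& v : last) {
        v = -1;
    }
    for (int i = static_cast<int>(s.size()) - 1; i >= 0; --i) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        next[i] = last[c];
        last[c] = i;
    }
    return next;
}
```

For "ACGTACGT" this yields 4, 5, 6, 7 followed by four -1 entries, matching the table for case 1.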

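Case 2:

src A C G T A A C G T

next [4] [6] [7] [8] [5] [-1] [-1] [-1] [-1]

Here A has several later occurrences, so all of them must be tried: when the comparison at next[0] fails, the comparison at next[next[0]] is tried as well.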

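Case 3:

src A A A A A A

next [1] [2] [3] [4] [5] [-1]

With heavy repetition, once the comparison at next[0] succeeds, next[next[0]] can be set to -1: the length-4 substring starting at next[0] has already been matched and does not need to be matched again. This only reduces duplicate matches, it does not eliminate them, so a set is still needed to store the successful matches and remove duplicates.

Note that cutting the chain this way is not safe in general. A different substring that starts between the two matched positions follows the same chain and loses access to everything after the cut. With length-4 substrings, ACCCAGGGACCCAGGG reports ACCC but misses AGGG. Stopping the scan for the current position as soon as a match is found gives the same saving without modifying next.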

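Time complexity

Building the next array takes at most O(n^2), and the traversal takes O(n^2) in the worst case, so the total time complexity is O(n^2).

Implementation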

 #include <string>
 #include <vector>
 #include <set>

 class Solution {
 public:
     std::vector<std::string> findRepeatedDnaSequences(std::string s);
 };

 std::vector<std::string> Solution::findRepeatedDnaSequences(std::string s) {
     std::vector<std::string> rel;
     // a repeat needs at least two distinct windows, so at least 11 characters
     if (s.length() <= 10) {
         return rel;
     }
     // next[pos]: position of the next occurrence of s[pos], or npos if none.
     // A local vector replaces the raw member pointer, so no destructor is
     // needed and repeated calls do not leak.
     std::vector<std::size_t> next(s.length());
     for (std::size_t pos = 0; pos < s.length(); ++pos) {
         next[pos] = s.find(s[pos], pos + 1);
     }
     // the set removes duplicates among the matched substrings
     std::set<std::string> tmpRel;
     for (std::size_t pos = 0; pos < s.length(); ++pos) {
         std::size_t nextPos = next[pos];
         while (nextPos != std::string::npos) {
             std::size_t ic = pos;
             std::size_t in = nextPos;
             int count = 0;
             // s[pos] == s[nextPos] already holds; compare the 9 characters after them
             while (in + 1 < s.length() && count < 9 && s[++ic] == s[++in]) {
                 ++count;
             }
             if (count == 9) {
                 tmpRel.insert(s.substr(pos, 10));
                 // Stop scanning for this pos. The chain itself is left intact
                 // (no next[nextPos] = npos), because other start positions
                 // with the same first character still need to follow it.
                 break;
             }
             nextPos = next[nextPos];
         }
     }
     for (auto itr = tmpRel.begin(); itr != tmpRel.end(); ++itr) {
         rel.push_back(*itr);
     }
     return rel;
 }

Hash table plus bit manipulation method

(suggested by the problem's Show Tags; runtime 10 ms)

Algorithm analysis

First, encode A, C, G and T in binary:

A -> 00

C -> 01

G -> 10

T -> 11

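With this encoding, every 10-letter string corresponds to one number, and a 10-letter string takes 20 bits. An int is typically 4 bytes (32 bits), so a single int can represent one 10-letter string. For example: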

ACGTACGTAC -> 00011011000110110001

AAAAAAAAAA -> 00000000000000000000

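A 20-bit binary number has at most 2^20 combinations, so the hash table needs 2^20 entries, that is 1024 * 1024. The hash table is declared as bool hashTable[1024 * 1024]; (it is named hashMap in the implementation).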
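The encoding of one 10-letter window can be sketched as below. The names baseCode and encodeWindow are hypothetical, and the sketch assumes the input contains only the letters A, C, G and T and that at least 10 characters are available from start.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: 2-bit code of one nucleotide (A=00, C=01, G=10, T=11).
inline std::uint32_t baseCode(char c) {
    switch (c) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        default:  return 3;  // 'T'; assumes the input holds only A, C, G, T
    }
}

// Packs the 10 letters starting at `start` into the low 20 bits of an integer,
// with the first letter in the most significant pair of bits.
std::uint32_t encodeWindow(const std::string& s, std::size_t start) {
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < 10; ++i) {
        value = (value << 2) | baseCode(s[start + i]);
    }
    return value;
}
```

With this sketch, "ACGTACGTAC" encodes to 0x1B1B1 (binary 00011011000110110001), and the largest possible value, for "TTTTTTTTTT", is 0xFFFFF, which is why 2^20 table entries are enough.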

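Traversing the string

Each move one character to the right corresponds to shifting the window's int value left by 2 bits, setting its lowest 2 bits to the code of the new character, and finally clearing the 2 bits that overflowed past the 20-bit window (bits 20 and 21). For example: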

src CAAAAAAAAAC

subStr CAAAAAAAAA

int 01000000000000000000

subStr AAAAAAAAAC

int 00000000000000000001
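The window update in this example can be written as a one-line function. The name slide is hypothetical; the sketch keeps only the low 20 bits with a single mask, which has the same effect as clearing bits 20 and 21 whenever the previous value already fits in 20 bits.

```cpp
#include <cstdint>

// Hypothetical helper: advance a 20-bit window value by one nucleotide.
// `code` is the 2-bit code of the incoming letter (A=0, C=1, G=2, T=3).
std::uint32_t slide(std::uint32_t value, std::uint32_t code) {
    // shift out the oldest letter, append the new one, keep 20 bits
    return ((value << 2) | code) & 0xFFFFFu;
}
```

For the example above, the value of CAAAAAAAAA is 0x40000, and slide(0x40000, 1) gives 1, the value of AAAAAAAAAC.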

Time complexity

Traversing the string is O(n) and each hash table access is O(1), so the total time complexity is O(n).

Implementation

 #include <string>
 #include <vector>
 #include <unordered_set>
 #include <cstring>

 // one flag per possible 20-bit code: 2^20 = 1024 * 1024 entries
 bool hashMap[1024 * 1024];

 class Solution {
 public:
     std::vector<std::string> findRepeatedDnaSequences(std::string s);
 };

 std::vector<std::string> Solution::findRepeatedDnaSequences(std::string s) {
     std::vector<std::string> rel;
     if (s.length() <= 10) {
         return rel;
     }
     // map char to code, indexed by (letter - 'A')
     unsigned char convert[26];
     convert[0] = 0;  // 'A' - 'A'  00
     convert[2] = 1;  // 'C' - 'A'  01
     convert[6] = 2;  // 'G' - 'A'  10
     convert[19] = 3; // 'T' - 'A'  11
     // initial process: hash the first ten characters
     std::memset(hashMap, false, sizeof(hashMap));
     int hashValue = 0;
     for (int pos = 0; pos < 10; ++pos) {
         hashValue <<= 2;
         hashValue |= convert[s[pos] - 'A'];
     }
     hashMap[hashValue] = true;
     // codes that have already been added to the result
     std::unordered_set<int> strHashValue;
     // slide the window one character at a time; pos is the window's last index
     for (std::size_t pos = 10; pos < s.length(); ++pos) {
         hashValue <<= 2;
         hashValue |= convert[s[pos] - 'A'];
         hashValue &= ~(0x300000); // clear bits 20 and 21
         if (hashMap[hashValue]) {
             if (strHashValue.find(hashValue) == strHashValue.end()) {
                 rel.push_back(s.substr(pos - 9, 10));
                 strHashValue.insert(hashValue);
             }
         } else {
             hashMap[hashValue] = true;
         }
     }
     return rel;
 }
