All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:
["AAAAACCCCC", "CCCCCAAAAA"].

非常好的思路: 转换成位操作。

算法分析

首先考虑将ACGT进行二进制编码

A -> 00

C -> 01

G -> 10

T -> 11

在编码的情况下,每10位字符串的组合即为一个数字,且10位的字符串有20位;一般来说int有4个字节,32位,即可以用于对应一个10位的字符串。例如

ACGTACGTAC -> 00011011000110110001

AAAAAAAAAA -> 00000000000000000000

20位的二进制数,至多有2^20种组合,因此hash table的大小为2^20,即1024 * 1024,将hash table设计为bool hashTable[1024 * 1024];

vector<string> findRepeatedDnaSequences(string s) {
int hashMap[1048576] = {0};
vector<string> ans;
int len = s.size(),hashNum = 0;
if (len < 11) return ans;
for (int i = 0;i < 9;++i)
hashNum = hashNum << 2 | (s[i] - 'A' + 1) % 5;
for (int i = 9;i < len;++i)
if (hashMap[hashNum = (hashNum << 2 | (s[i] - 'A' + 1) % 5) & 0xfffff]++ == 1)
ans.push_back(s.substr(i-9,10));
return ans;
}

  

LeetCode() Repeated DNA Sequences 看的非常的过瘾!的更多相关文章

  1. [LeetCode] Repeated DNA Sequences 求重复的DNA序列

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...

  2. [Leetcode] Repeated DNA Sequences

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...

  3. [LeetCode] Repeated DNA Sequences hash map

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...

  4. LeetCode 187. 重复的DNA序列(Repeated DNA Sequences)

    187. 重复的DNA序列 187. Repeated DNA Sequences 题目描述 All DNA is composed of a series of nucleotides abbrev ...

  5. lc面试准备:Repeated DNA Sequences

    1 题目 All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: &quo ...

  6. 【LeetCode】Repeated DNA Sequences 解题报告

    [题目] All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: &quo ...

  7. 【leetcode】Repeated DNA Sequences(middle)★

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...

  8. Leetcode:Repeated DNA Sequences详细题解

    题目 All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: " ...

  9. 【LeetCode】187. Repeated DNA Sequences

    题目: All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: " ...

随机推荐

  1. 将php网站移到CentOS 6.7上[一]:yum安装lamp环境

    最近应老师要求,将一个网站从51php上转移到学校提供的服务器上,之前对Linux没有了解,一切都在百度百度百度.于是发现很多步骤自己做过后就忘了,现将有效步骤记录下来,以供下次参考. 原51php上 ...

  2. 和Java相关的一些好文章(不定期更新)

    1.Java 集合类详解 (包括arraylist,linkedlist,vector,stack,hashmap,hashtable,treemap,collection等). 2.Java 理论与 ...

  3. js 获取iframe中的元素

    今天要修改编辑器插件中的元素遇到的问题 jquery 在父窗口中获取iframe中的元素 1.Js代码 格式:$("#iframe的ID").contents().find(&qu ...

  4. Centos 7环境下编译mysql 5.7

    首先在编译之前,我们要了解相关mysql 5.7的编译选项,官网编译选项地址:http://dev.mysql.com/doc/refman/5.7/en/source-configuration-o ...

  5. Deep Learning 16:用自编码器对数据进行降维_读论文“Reducing the Dimensionality of Data with Neural Networks”的笔记

    前言 论文“Reducing the Dimensionality of Data with Neural Networks”是深度学习鼻祖hinton于2006年发表于<SCIENCE > ...

  6. pfile 与 spfile

    启动方式与顺序: 启动顺序:dbs 下的 init --> dbs 下的 spfile 如果 pfile 中没有指定 spfile 参数,那么数据库以 pfile 方式启动 如果 pfile 中 ...

  7. JDBC连接执行MySQL存储过程报权限错误

    今天在测试项目的时候  突然就报了一个错出来. User does not have access to metadata required to determine stored procedure ...

  8. JMS消息中间件系列[ActiveMQ](一)

    版本5.13.3的特性: 1.Supports a variety of Cross Language Clients and Protocols from Java, C, C++, C#, Rub ...

  9. matlab实现感知机算法--统计学习小灶

    clear all; clc; %% %算法 %输入:训练数据集T = {(x1,y1),(x2,y2),...,(xn,yn)};学习率η %输出:w,b;感知机模型f(x) = sign(w*x+ ...

  10. Objective-C(内存管理)

    引用计数器 每个OC对象都有一个占4个字节存储空间的引用计数器 当使用或创建一个对象时,新对象的引用计数器默认是1 retain:可以使引用计数器+1 release:可以是引用计数器-1 retai ...