All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:
["AAAAACCCCC", "CCCCCAAAAA"].

非常好的思路: 转换成位操作。

算法分析

首先考虑将ACGT进行二进制编码

A -> 00

C -> 01

G -> 10

T -> 11

在编码的情况下,每10位字符串的组合即为一个数字,且10位的字符串有20位;一般来说int有4个字节,32位,即可以用于对应一个10位的字符串。例如

ACGTACGTAC -> 00011011000110110001

AAAAAAAAAA -> 00000000000000000000

20位的二进制数,至多有2^20种组合,因此hash table的大小为2^20,即1024 * 1024,将hash table设计为bool hashTable[1024 * 1024];

vector<string> findRepeatedDnaSequences(string s) {
int hashMap[1048576] = {0};
vector<string> ans;
int len = s.size(),hashNum = 0;
if (len < 11) return ans;
for (int i = 0;i < 9;++i)
hashNum = hashNum << 2 | (s[i] - 'A' + 1) % 5;
for (int i = 9;i < len;++i)
if (hashMap[hashNum = (hashNum << 2 | (s[i] - 'A' + 1) % 5) & 0xfffff]++ == 1)
ans.push_back(s.substr(i-9,10));
return ans;
}

  

LeetCode() Repeated DNA Sequences 看的非常的过瘾!的更多相关文章

  1. [LeetCode] Repeated DNA Sequences 求重复的DNA序列

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...

  2. [Leetcode] Repeated DNA Sequences

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...

  3. [LeetCode] Repeated DNA Sequences hash map

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...

  4. LeetCode 187. 重复的DNA序列(Repeated DNA Sequences)

    187. 重复的DNA序列 187. Repeated DNA Sequences 题目描述 All DNA is composed of a series of nucleotides abbrev ...

  5. lc面试准备:Repeated DNA Sequences

    1 题目 All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: &quo ...

  6. 【LeetCode】Repeated DNA Sequences 解题报告

    [题目] All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: &quo ...

  7. 【leetcode】Repeated DNA Sequences(middle)★

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...

  8. Leetcode:Repeated DNA Sequences详细题解

    题目 All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: " ...

  9. 【LeetCode】187. Repeated DNA Sequences

    题目: All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: " ...

随机推荐

  1. C#综合笔记

    AspNetPager分页控件 UrlPaging="true" 利用get方式page?=1进行分页. UrlPaging="false"利用post方式进行 ...

  2. 【CSU1808】地铁

    ICPCCamp 有 n 个地铁站,用 1,2,-,n 编号. m 段双向的地铁线路连接 n 个地铁站,其中第 i 段地铁属于 ci 号线,位于站 ai,bi 之间,往返均需要花费 ti 分钟(即从 ...

  3. [hadoop] 一些基础概念

    一.云的概念 1.云计算的概念 随时 随地 使用任何设备 获得任何服务 2.趋势 )资料开始回归集中处理(存储大量资料) 随时存取 降低遗失风险 减少传输成本 促进团队协作 )网页变为预设开发平台(网 ...

  4. absolute和fixed

    共同点: 改变行内元素的呈现方式,display设置为block:让元素脱离文档流,不占据空间:默认会覆盖到非定位元素上. 不同点: absolute的根元素是相对于static定位以外的第一个父元素 ...

  5. wifidog 配置中文说明

    #网关IDGatewayID default#外部网卡ExternalInterface eth0#无线网卡GatewayInterface eth0#无线IPGatewayAddress 192.1 ...

  6. Js获取后台集合List的值并操作html

    功能:将后台传到前端JSP的List中的float型数值转换为百分比显示 HTML代码: <s:iterator value="colorConfigList" status ...

  7. Keeplived配置Nginx双机高可用

    一.简介不管是Keepalived还是Heartbeat做高可用,其高可用,都是站在服务器脚本去说的高可用,而不是服务的角度.也就是说,如果服务器DOWN机或者网络出现故障,高可用是可以实现自动切换的 ...

  8. Android开发--Layout元素

    1.TF类型 android:layout_alignParentStart="true/flase":贴紧父元素的左边缘 android:layout_alignParentEn ...

  9. BZOJ2480 Spoj3105 Mod

    乍一看题面:$$a^x \equiv b \ (mod \ m)$$ 是一道BSGS,但是很可惜$m$不是质数,而且$(m, a) \not= 1$,这个叫扩展BSGS[额...... 于是我们需要通 ...

  10. docker初学笔记

    什么是docker 不准确的说,docker是一种轻量级的虚拟机,它把可执行文件和运行环境打包成一个image,放在容器里运行,但是启动速度比虚拟机快很多,资源消耗小.这种技术主要是为了解决部署环境的 ...