All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:
["AAAAACCCCC", "CCCCCAAAAA"].

问题:给定一个字符串序列,代表 DNA 序列,求其中有重复出现的长度为 10 的子序列。

题目中的例子都是不重叠的重复字串,实际上相互重叠的字串也是要统计进去,例如11位的 "AAAAAAAAAA" 就包含两个长度为 10 的"AAAAAAAAAA" 的重复子序列。这一点是题目没有说清楚的。

明确题目后,实现思路也比较简单:

  • 将 s 中所有长度为 10 的连续子字符串放入 map<string, int> ss_cnt 中,数各个连续字符串出现的的次数
  • 将 [0, 9] 视为窗口,将 ss_cnt 中窗口字符串对于的 value 减 1 ,然后判断 ss_cnt 中是否还存在一个 窗口字符串, 若存在则表示窗口字符串是重复的。
  • 将窗口向右移动一个,继续重复第二步,直至窗口移至最右端
     /**
* 重复子字符串 可以重叠。
*/
vector<string> findRepeatedDnaSequences(string s) {
unordered_set<string> res; unordered_map<string, int> ss_cnt; int len = ; for (int i = ; i + len - < s.size(); i++) {
string str = s.substr(i, len);
ss_cnt[str]++;
} int i = ;
while (i + len - < s.size()) { string cur = s.substr(i, len);
ss_cnt[cur]--; if (ss_cnt[cur] > ) {
res.insert(cur);
} ss_cnt[cur]++;
i++;
} vector<string> result; unordered_set<string>::iterator s_iter;
for (s_iter = res.begin(); s_iter != res.end(); s_iter++) {
result.push_back(*s_iter);
} return result;
}

[LeetCode] 187. Repeated DNA Sequences 解题思路的更多相关文章

  1. Java for LeetCode 187 Repeated DNA Sequences

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...

  2. 【LeetCode】187. Repeated DNA Sequences 解题报告(Python)

    作者: 负雪明烛 id: fuxuemingzhu 个人博客: http://fuxuemingzhu.cn/ 题目地址: https://leetcode.com/problems/repeated ...

  3. [LeetCode] 187. Repeated DNA Sequences 求重复的DNA序列

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...

  4. 【LeetCode】Repeated DNA Sequences 解题报告

    [题目] All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: &quo ...

  5. leetcode 187. Repeated DNA Sequences 求重复的DNA串 ---------- java

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...

  6. [LeetCode#187]Repeated DNA Sequences

    Problem: All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: ...

  7. [leetcode]187. Repeated DNA Sequences寻找DNA中重复出现的子串

    很重要的一道题 题型适合在面试的时候考 位操作和哈希表结合 public List<String> findRepeatedDnaSequences(String s) { /* 寻找出现 ...

  8. 【LeetCode】187. Repeated DNA Sequences

    题目: All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: " ...

  9. 【leetcode】Repeated DNA Sequences(middle)★

    All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACG ...

随机推荐

  1. Linux使用SSH远程连接方式和更改密码 ,自己空间转移过来的。

    一. 远程连接Linux系统需要一个方便的SSH连接工具putty就不错!工具在本日志的附件下载,解压密码是QQ号,云盘访问密码 90891.把远程主机ip和端口号填写上然后选择连接方式为“SSH”. ...

  2. 码表 ASCII Unicode GBK UTF-8

    2017-1-3 [ASCII]一个字节(7位,128个字符,2个16进制) 不包含中文 ASCII(American Standard Code for Information Interchang ...

  3. java 跳转地址栏地址改变

    在strtus1 中,很多都是直接的action 配置后进行跳转的 这样地址栏是不会改变的 如果需要进行浏览器跳转 ActionForward actionForward = new ActionFo ...

  4. 阿里云OSS存储开发(一)

    Step 1. 初始化一个OSSClient OSSClient是与OSS服务交互的客户端,SDK的OSS操作都是通过OSSClient完成的. 下面代码新建了一个OSSClient: using A ...

  5. 程序里面的system.out.println()输出到其他位置,不输出到tomcat控制台。

    设置startup.bat: call "%EXECUTABLE%" run %CMD_LINE_ARGS% >> ..\logs\kongzitai.txt 将sys ...

  6. ORACLE_DBA管理脚本

    SYS @ prod >col index_name for a10 SYS @ prod >col table_name for a10 SYS @ prod >col start ...

  7. python文件处理

    python中对文件处理需要涉及到os模块和shutil模块得到当前工作目录路径:os.getcwd()获取指定目录下的所有文件和目录名:os.listdir(dir)删除文件:os.remove(f ...

  8. jquery如何判断滚动条滚到页面底部并执行事件

    首先理解三个dom元素,分别是:clientHeight.offsetHeight.scrollTop. clientHeight:这个元素的高度,占用整个空间的高度,所以,如果一个div有滚动条,那 ...

  9. Sublime Text 3 中文汉化绿色破解特别版下载

    Sublime Text是一款代码编辑器,几乎支持所有语言的编写.sublime给人们的印象不外乎小巧.速度快.并且快捷键丰富而强大.不知繁多的插件. sublime一般被应用到前端的开发.Subli ...

  10. java 双重检查模式

    java 双重检查模式 在并发环境下 兼顾安全和效率 成例(Idiom)是一种代码层次上的模式,是在比设计模式的层次更具体的层次上的代码技巧.成例往往与编程语言密切相关.双重检查成例(Double C ...