Problem

Given two strings ss and tt, tt is a substring of ss if tt is contained as a contiguous collection of symbols in ss (as a result, tt must be no longer than ss).

The position of a symbol in a string is the total number of symbols found to its left, including itself (e.g., the positions of all occurrences of 'U' in "AUGCUUCAGAAAGGUCUUACG" are 2, 5, 6, 15, 17, and 18). The symbol at position ii of ss is denoted by s[i]s[i].

A substring of ss can be represented as s[j:k]s[j:k], where jj and kk represent the starting and ending positions of the substring in ss; for example, if ss = "AUGCUUCAGAAAGGUCUUACG", then s[2:5]s[2:5] = "UGCU".

The location of a substring s[j:k]s[j:k] is its beginning position jj; note that tt will have multiple locations in ss if it occurs more than once as a substring of ss (see the Sample below).

Given: Two DNA strings ss and tt (each of length at most 1 kbp).

Return: All locations of tt as a substring of ss.

Sample Dataset

GATATATGCATATACTT
ATAT

Sample Output

2 4 10

#-*-coding:UTF-8-*-
### 9. Finding a Motif in DNA ### # Method 1: Use Module regex.finditer
import regex
# 比re更强大的模块 matches = regex.finditer('ATAT', 'GATATATGCATATACTT', overlapped=True)
# 返回所有匹配项,
for match in matches:
print (match.start() + 1) # Method 2: Brute Force Search
seq = 'GATATATGCATATACTT'
pattern = 'ATAT' def find_motif(seq, pattern):
position = []
for i in range(len(seq) - len(pattern)):
if seq[i:i + len(pattern)] == pattern:
position.append(str(i + 1)) print ('\t'.join(position)) find_motif(seq, pattern) # methond 3
import re
seq='GATATATGCATATACTT'
print [i.start()+1 for i in re.finditer('(?=ATAT)',seq)]
# ?= 之后字符串内容需要匹配表达式才能成功匹配。

  


09 Finding a Motif in DNA的更多相关文章

  1. Hash function

    Hash function From Wikipedia, the free encyclopedia   A hash function that maps names to integers fr ...

  2. Windows7WithSP1/TeamFoundationServer2012update4/SQLServer2012

    [Info   @09:03:33.737] ====================================================================[Info   @ ...

  3. DNA motif 搜索算法总结

    DNA motif 搜索算法总结 2011-09-15 ~ ADMIN 翻译自:A survey of DNA motif finding algorithms, Modan K Das et. al ...

  4. 14 Finding a Shared Motif

    Problem A common substring of a collection of strings is a substring of every member of the collecti ...

  5. DNA binding motif比对算法

    DNA binding motif比对算法 2012-08-31 ~ ADMIN 之前介绍了序列比对的一些算法.本节主要讲述motif(有人翻译成结构模式,但本文一律使用基模)的比对算法. 那么什么是 ...

  6. 16 Finding a Protein Motif

    Problem To allow for the presence of its varying forms, a protein motif is represented by a shorthan ...

  7. hdu 1560 DNA sequence(搜索)

    http://acm.hdu.edu.cn/showproblem.php?pid=1560 DNA sequence Time Limit: 15000/5000 MS (Java/Others)  ...

  8. 查找EBS中各种文件版本(Finding File Versions in the Oracle Applications EBusiness Suite - Checking the $HEADER)

    Finding File Versions in the Oracle Applications EBusiness Suite - Checking the $HEADER (文档 ID 85895 ...

  9. hdu 1560 DNA sequence(迭代加深搜索)

    DNA sequence Time Limit : 15000/5000ms (Java/Other)   Memory Limit : 32768/32768K (Java/Other) Total ...

随机推荐

  1. 转 GraphQL Schema Stitching explained: Schema Delegation

    转自官方文档 In the last article, we discussed the ins and outs of remote (executable) schemas. These remo ...

  2. 结构化日志类库 ---- Serilog库

    在过去的几年中,结构化日志已经大受欢迎.而Serilog是 .NET 中最著名的结构化日志类库 ,我们提供了这份的精简指南来帮助你快速了解并运用它. 0. 内容 设定目标 认识Serilog 事件和级 ...

  3. elasticsearch安装入门

    简介Elasticsearch是一个高度可扩展的开源的分布式Restful全文搜索和分析引擎. 它允许用户快速的( 近实时的) 存储. 搜索和分析海量数据. 它通常用作底层引擎技术, 为具有复杂搜索功 ...

  4. centos7 安装配置rsyslog + LogAnalyzer + mysql

    https://www.cnblogs.com/mchina/p/linux-centos-rsyslog-loganalyzer-mysql-log-server.html 安装LNMP 一键安装包 ...

  5. java wab----遇到经常用到集合list/map/

    List与Vector的区别: vector适用:对象数量变化少,简单对象,随机访问元素频繁 list适用:对象数量变化大,对象复杂,插入和删除频] List首先是链表,它的元素不是连续的.vecto ...

  6. 浏览器禁用Cookie

    做JavaWeb的都知道Session的底层是使用Cookie来实现的,服务器端会在本地文件中保存session信息,并将sessionID发给客户端(浏览器),浏览器就会把这个sessionID(准 ...

  7. SqlServer快速获得表总记录数(大数据量)

    --第1种 执行全表扫描才能获得行数 SELECT count(*) FROM BUS_tb_UserGradePrice --第2种 执行扫描全表id不为空的,获得行数 select count(u ...

  8. 像素(PX)转其它长度单位(mm、cm...)

    今天想把px转成mm为单位,因像素跟其它单位比值的大小会跟屏幕设置的分辨率大小而不定,因此不能以固定的数值去计算. 解决方法是 页面上放一个高度为1mm的隐藏块 <div id="di ...

  9. 安装MySQL时出现黄色感叹号,提示3306已被占用

    windows系统如何查看现在某个端口的应用进程id呢,命令是: 1.netstat  -aon|findstr 3306 2.最后的那个数值就是进程id号,此时需要查看该id号对应的应用是哪一个,可 ...

  10. 基于Windows 配置 nginx 集群 & 反向代理

    1.下载 nginx 下载页面 : http://nginx.org/en/download.html 具体文件: http://nginx.org/download/nginx-1.7.0.zip ...