14 Finding a Shared Motif
Problem
A common substring of a collection of strings is a substring of every member of the collection. We say that a common substring is a longest common substring if there does not exist a longer common substring. For example, "CG" is a common substring of "ACGTACGT" and "AACCGTATA", but it is not as long as possible; in this case, "CGTA" is a longest common substring of "ACGTACGT" and "AACCGTATA".
Note that the longest common substring is not necessarily unique; for a simple example, "AA" and "CC" are both longest common substrings of "AACC" and "CCAA".
Given: A collection of kk (k≤100k≤100) DNA strings of length at most 1 kbp each in FASTA format.
Return: A longest common substring of the collection. (If multiple solutions exist, you may return any single solution.)
Sample Dataset
>Rosalind_1
GATTACA
>Rosalind_2
TAGACCA
>Rosalind_3
ATACA
Sample Output
AC # 方法一
# coding=utf-8
'''
>Rosalind_1
GATTACA
>Rosalind_2
TAGACCA
>Rosalind_3
ATACA
''' def readfasta(filename, sample):
fa = open(filename, 'r')
fo = open(sample, 'w')
res = {}
rres = []
ID = ''
for line in fa:
if line.startswith('>'):
ID = line.strip('\n')
res[ID] = ''
else:
res[ID] += line.strip('\n') for key in res.values():
rres.append(key)
fo.write(key + '\n')
return rres def fragement(seq_list):
res = []
seq = seq_list[0]
for i in range(len(seq)):
s_seq = seq[i:]
#print s_seq
for j in range(len(s_seq)):
res.append(s_seq[:(len(s_seq) - j)])
#print res return res def main(infile, sample):
seq_list = readfasta(infile, sample) #['TAGACCA','ATACA','GATTACA']
frags = fragement(seq_list)
frags.sort(key=len, reverse=True) # 从长到短排列
for i in range(len(frags)):
ans = []
# s = 0
# m+=1
# print(m)
# res[frags[i]] = 0
for j in seq_list:
r = j.count(frags[i])
if r != 0:
ans.append(r)
if len(ans) >= len(seq_list):
print(frags[i])
break main('14.txt', 'sample.txt')
方法二:(没看懂)
# coding=utf-8
'''
A solution to a ROSALIND bioinformatics problem.
Problem Title: Finding a Shared Motif
Rosalind ID: LCSM
Rosalind #: 014
URL: [url]http://rosalind.info/problems/lcsm/[/url]
''' def LongestSubstring(string_list):
'''Extracts all substrings from the first string in a list, and sends longest substring candidates to be checked.'''
longest = ''
for start_index in range(len(string_list[0])):
for end_index in range(len(string_list[0]), start_index, -1):
# Break if the length becomes too small, as it will only get smaller.
if end_index - start_index <= len(longest):
break
elif CheckSubstring(string_list[0][start_index:end_index], string_list):
longest = string_list[0][start_index:end_index] return longest def CheckSubstring(find_string, string_list):
'Checks if a given substring appears in all members of a given collection of strings and returns True/False.'
for string in string_list:
if (len(string) < len(find_string)) or (find_string not in string):
return False
return True seq = {}
seq_name = ''
with open('14.txt') as f:
for line in f:
if line[0] == '>':
seq_name = line.rstrip()
seq[seq_name] = ''
continue
else:
seq[seq_name] += (line.rstrip()).upper() print(seq) if __name__ == '__main__':
dna = []
for seq_name in seq:
dna.append(seq[seq_name]) lcsm = LongestSubstring(dna)
print(lcsm)
with open('014_LCSM.txt', 'w') as output_data:
output_data.write(lcsm)
14 Finding a Shared Motif的更多相关文章
- 16 Finding a Protein Motif
Problem To allow for the presence of its varying forms, a protein motif is represented by a shorthan ...
- Oracle-buffer cache、shared pool
http://blog.csdn.net/panfelix/article/details/38347059 buffer pool 和shared pool 详解 http://blog.csd ...
- Selenium Xpath Tutorials - Identifying xpath for element with examples to use in selenium
Xpath in selenium is close to must required. XPath is element locator and you need to provide xpath ...
- VirtualBox中安装Ubuntu12.04/Ubuntu14.04虚拟机
NOTE: 一开始安装的Ubuntu12.04,后来又重新安装了14.04.截图基本使用了安装12.04时的截图,后来安装14.04时又补充了几张.该安装过程对Ubuntu12.04和14.04都是适 ...
- 深入理解Windows X64调试
随着64位操作系统的普及,都开始大力进军x64,X64下的调试机制也发生了改变,与x86相比,添加了许多自己的新特性,之前学习了Windows x64的调试机制,这里本着“拿来主义”的原则与大家分享. ...
- Vbox中Ubuntu的安装和共享文件夹设置
1. 选择版本 1.1 Ubuntu桌面版与服务器版的区别 桌面版与服务器版,只要发布版本号一致,这两者从核心来说也就是相同的,唯一的差别在于它们的预期用途.桌面版面向个人电脑使用者,可以进行文字处理 ...
- http://wiki.apache.org/tomcat/HowTo
http://wiki.apache.org/tomcat/HowTo Contents Meta How do I add a question to this page? How do I con ...
- linux下so动态库一些不为人知的秘密(中)
上一篇(linux下so动态库一些不为人知的秘密(上))介绍了linux下so一些依赖问题,本篇将介绍linux的so路径搜索问题. 我们知道linux链接so有两种途径:显示和隐式.所谓显示就是程序 ...
- linux c++ 加载动态库常用的三种方法
链接库时的搜索路径顺序:LD_LIBRARY_PATH --> /etc/ld.so.conf --> /lib,/usr/lib 方法1. vi .bash_profile 设置环 ...
随机推荐
- TCP建立连接的三次握手和TCP连接断开的四次挥手
1. TCP建立连接的3次握手 2. TCP断开连接的四次挥手 [注意]中断连接端可以是Client端,也可以是Server端. 图3—Client端主动发起关闭连接请求 1. 假设Client端主动 ...
- POJ3259 Wormholes 【Bellmanford推断是否存在负回路】
版权声明:本文为博主原创文章,未经博主同意不得转载. https://blog.csdn.net/u011775691/article/details/27612757 非常easy的bellmanf ...
- RabbitMq + Spring 实现ACK机制
概念性解读(Ack的灵活) 首先啊,有的人不是太理解这个Ack是什么,讲的接地气一点,其实就是一个通知,怎么说呢,当我监听消费者,正常情况下,不会出异常,但是如果是出现了异常,甚至是没有获取的异常,那 ...
- phpstorm 光标设置
1.是否允许光标定位在行尾之后的任意位置Allow placement of caret after end of line 这个设置是上下换行的时候,光标可以定位在任意位置,只能通过方向键移动光标, ...
- CentOS+Apache虚拟主机域名设置
首先注释掉 DocumentRoot /var/www/html <virtualhost 192.168.1.105> DocumentRoot /home/wxwb ...
- iPhone应用提交流程:如何将App程序发布到App Store
http://www.techolics.com/apple/20120401_197.html 对于刚加入iOS应用开发行列的开发者来说,终于经过艰苦的Coding后完成了第一个应用后最重要的历史时 ...
- How to Configure Eclipse for Python --- 在eclipse中如何配置pydev
From: http://www.rose-hulman.edu/class/csse/resources/Eclipse/eclipse-python-configuration.htm Pytho ...
- c# 设置自动隐藏任务栏、获取状态
from: http://stackoverflow.com/questions/1381821/how-to-toggle-switch-windows-taskbar-from-show-to-a ...
- NOIP2013 Day2
1.积木大赛 https://www.luogu.org/problemnew/show/1969 这道题在考试时暴力得比较麻烦,导致只得了80分,t了两个点. 思路为寻找一个区间内高度大于0的最低点 ...
- postman-2get发送请求
文档地址:https://www.v2ex.com/p/7v9TEc53 第一个API请求 最热主题 相当于首页右侧的 10 大每天的内容. https://www.v2ex.com/api/topi ...