14 Finding a Shared Motif
Problem
A common substring of a collection of strings is a substring of every member of the collection. We say that a common substring is a longest common substring if there does not exist a longer common substring. For example, "CG" is a common substring of "ACGTACGT" and "AACCGTATA", but it is not as long as possible; in this case, "CGTA" is a longest common substring of "ACGTACGT" and "AACCGTATA".
Note that the longest common substring is not necessarily unique; for a simple example, "AA" and "CC" are both longest common substrings of "AACC" and "CCAA".
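Both definitions can be checked mechanically. The brute-force sketch below is for illustration only and is not either method from this post. It tries lengths from longest to shortest and returns *every* common substring of the first maximal length found. The function name `longest_common_substrings` is a hypothetical helper.

```python
def longest_common_substrings(strings):
    """Return the set of all longest common substrings of `strings` (brute force)."""
    first = strings[0]
    # Try lengths from longest to shortest; the first length with any hit is maximal.
    for length in range(len(first), 0, -1):
        hits = {
            first[i:i + length]
            for i in range(len(first) - length + 1)
            if all(first[i:i + length] in s for s in strings[1:])
        }
        if hits:
            return hits
    return set()  # no common substring at all, e.g. disjoint alphabets


print(sorted(longest_common_substrings(["ACGTACGT", "AACCGTATA"])))  # ['CGTA']
print(sorted(longest_common_substrings(["AACC", "CCAA"])))           # ['AA', 'CC']
```

On the sample dataset below it returns AC, CA and TA. The listed output "AC" is therefore only one of three accepted answers.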
Given: A collection of k (k ≤ 100) DNA strings of length at most 1 kbp each in FASTA format.
Return: A longest common substring of the collection. (If multiple solutions exist, you may return any single solution.)
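Both methods below begin by parsing FASTA. Here is a minimal standalone parser sketch, assuming the whole file fits in memory as one string. It concatenates records that wrap over several lines, skips blank lines, and strips `\r` from Windows line endings. The name `parse_fasta` is hypothetical.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of sequences, in file order.

    Sequence lines that belong to one record are concatenated, so wrapped
    records are handled. Blank lines are ignored. Header text is discarded.
    """
    sequences = []
    for raw_line in text.splitlines():
        line = raw_line.strip()  # also drops '\r' from Windows line endings
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            sequences.append("")  # start a new record
        elif sequences:
            sequences[-1] += line.upper()
        else:
            raise ValueError("sequence data found before the first '>' header")
    return sequences


sample = """>Rosalind_1
GATTACA
>Rosalind_2
TAGACCA
>Rosalind_3
ATACA
"""
print(parse_fasta(sample))  # ['GATTACA', 'TAGACCA', 'ATACA']
```

To read from disk, pass it the file contents, e.g. `with open('14.txt') as f: seqs = parse_fasta(f.read())`.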
Sample Dataset
>Rosalind_1
GATTACA
>Rosalind_2
TAGACCA
>Rosalind_3
ATACA
Sample Output
AC

Method 1
```python
# coding=utf-8
'''
>Rosalind_1
GATTACA
>Rosalind_2
TAGACCA
>Rosalind_3
ATACA
'''


def readfasta(filename, sample):
    """Read a FASTA file and return its sequences as a list, in file order.

    As a side effect, every sequence is also written to `sample`, one per line.
    """
    res = {}
    ID = ''
    with open(filename, 'r') as fa:
        for line in fa:
            line = line.strip()  # also removes '\r' from Windows line endings
            if not line:
                continue
            if line.startswith('>'):
                ID = line
                res[ID] = ''
            else:
                res[ID] += line
    rres = []
    with open(sample, 'w') as fo:
        for key in res.values():
            rres.append(key)
            fo.write(key + '\n')
    return rres


def fragement(seq_list):
    """Return every substring of the first sequence in seq_list."""
    res = []
    seq = seq_list[0]
    for i in range(len(seq)):
        s_seq = seq[i:]  # suffix starting at position i
        for j in range(len(s_seq)):
            res.append(s_seq[:(len(s_seq) - j)])  # every prefix of that suffix
    return res


def main(infile, sample):
    seq_list = readfasta(infile, sample)  # e.g. ['GATTACA', 'TAGACCA', 'ATACA']
    frags = fragement(seq_list)
    frags.sort(key=len, reverse=True)  # longest candidates first
    for frag in frags:
        # The first candidate found in every sequence is a longest common substring.
        if all(frag in seq for seq in seq_list):
            print(frag)
            break


main('14.txt', 'sample.txt')
```
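Method 1 stores all n(n+1)/2 substrings of the first sequence before sorting them. For a 1 kbp sequence that is 500,500 strings totalling about 167 million characters. The sketch below uses the same brute-force idea (candidates tried longest first, first hit wins) but generates them lazily, so memory stays small. It also takes candidates from the shortest sequence, which is safe because every common substring is a substring of the shortest string. The function name `longest_common_substring` is hypothetical.

```python
def longest_common_substring(seqs):
    """Return one longest common substring of seqs ('' if there is none).

    Same brute force as Method 1, but candidates are generated on the fly
    instead of being collected into a list and sorted.
    """
    # Any common substring must fit inside the shortest sequence.
    source = min(seqs, key=len)
    for length in range(len(source), 0, -1):
        for start in range(len(source) - length + 1):
            candidate = source[start:start + length]
            if all(candidate in s for s in seqs):
                return candidate  # first hit at the largest length is a longest one
    return ""


print(longest_common_substring(["GATTACA", "TAGACCA", "ATACA"]))  # prints "TA"
```

On the sample this prints TA rather than AC. Both are length-2 common substrings (as is CA), and the problem accepts any one of them.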
Method 2 (I didn't understand this one):
```python
# coding=utf-8
'''
A solution to a ROSALIND bioinformatics problem.
Problem Title: Finding a Shared Motif
Rosalind ID: LCSM
Rosalind #: 014
URL: http://rosalind.info/problems/lcsm/
'''


def LongestSubstring(string_list):
    '''Extracts all substrings from the first string in a list, and sends longest substring candidates to be checked.'''
    longest = ''
    first = string_list[0]
    for start_index in range(len(first)):
        # Try the longest candidate starting here first, then shrink it from the right.
        for end_index in range(len(first), start_index, -1):
            # Break if the length becomes too small, as it will only get smaller.
            if end_index - start_index <= len(longest):
                break
            elif CheckSubstring(first[start_index:end_index], string_list):
                # New best; the next (shorter) candidate from this start triggers the break above.
                longest = first[start_index:end_index]
    return longest


def CheckSubstring(find_string, string_list):
    'Checks if a given substring appears in all members of a given collection of strings and returns True/False.'
    for string in string_list:
        if (len(string) < len(find_string)) or (find_string not in string):
            return False
    return True


if __name__ == '__main__':
    # Parse the FASTA file into {header: sequence}.
    seq = {}
    seq_name = ''
    with open('14.txt') as f:
        for line in f:
            if line.startswith('>'):
                seq_name = line.rstrip()
                seq[seq_name] = ''
            elif line.strip():  # skip blank lines
                seq[seq_name] += line.strip().upper()
    print(seq)

    dna = list(seq.values())
    lcsm = LongestSubstring(dna)
    print(lcsm)
    with open('014_LCSM.txt', 'w') as output_data:
        output_data.write(lcsm)
```
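In short, Method 2 keeps the best answer found so far. For each start position it only tries end positions that would give a longer string, so most candidates are never checked. It is still brute force, though. A different technique, not used in either method above, is binary search on the answer length. It works because if some length-L substring is common to all sequences, so is its length-(L-1) prefix, so feasibility is monotone in L. The sketch below checks each length by intersecting the sets of length-L substrings of every sequence. The function names are hypothetical.

```python
def common_substrings_of_length(seqs, length):
    """Return the set of length-`length` substrings shared by every sequence."""
    shared = {seqs[0][i:i + length] for i in range(len(seqs[0]) - length + 1)}
    for s in seqs[1:]:
        shared &= {s[i:i + length] for i in range(len(s) - length + 1)}
        if not shared:
            break  # nothing left to intersect
    return shared


def longest_common_substring_bs(seqs):
    """Binary-search the largest length L that has a common substring."""
    lo, hi = 0, min(len(s) for s in seqs)  # invariant: length lo is always feasible
    best = ""
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        shared = common_substrings_of_length(seqs, mid)
        if shared:
            lo, best = mid, min(shared)  # min() only picks a deterministic answer
        else:
            hi = mid - 1
    return best


print(longest_common_substring_bs(["GATTACA", "TAGACCA", "ATACA"]))  # prints "AC"
```

This needs only O(log n) length checks instead of trying every start and end position. Each check hashes at most n substrings per sequence.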