14 Finding a Shared Motif
Problem
A common substring of a collection of strings is a substring of every member of the collection. We say that a common substring is a longest common substring if there does not exist a longer common substring. For example, "CG" is a common substring of "ACGTACGT" and "AACCGTATA", but it is not as long as possible; in this case, "CGTA" is a longest common substring of "ACGTACGT" and "AACCGTATA".
Note that the longest common substring is not necessarily unique; for a simple example, "AA" and "CC" are both longest common substrings of "AACC" and "CCAA".
Given: A collection of kk (k≤100k≤100) DNA strings of length at most 1 kbp each in FASTA format.
Return: A longest common substring of the collection. (If multiple solutions exist, you may return any single solution.)
Sample Dataset
>Rosalind_1
GATTACA
>Rosalind_2
TAGACCA
>Rosalind_3
ATACA
Sample Output
AC # 方法一
# coding=utf-8
'''
>Rosalind_1
GATTACA
>Rosalind_2
TAGACCA
>Rosalind_3
ATACA
''' def readfasta(filename, sample):
fa = open(filename, 'r')
fo = open(sample, 'w')
res = {}
rres = []
ID = ''
for line in fa:
if line.startswith('>'):
ID = line.strip('\n')
res[ID] = ''
else:
res[ID] += line.strip('\n') for key in res.values():
rres.append(key)
fo.write(key + '\n')
return rres def fragement(seq_list):
res = []
seq = seq_list[0]
for i in range(len(seq)):
s_seq = seq[i:]
#print s_seq
for j in range(len(s_seq)):
res.append(s_seq[:(len(s_seq) - j)])
#print res return res def main(infile, sample):
seq_list = readfasta(infile, sample) #['TAGACCA','ATACA','GATTACA']
frags = fragement(seq_list)
frags.sort(key=len, reverse=True) # 从长到短排列
for i in range(len(frags)):
ans = []
# s = 0
# m+=1
# print(m)
# res[frags[i]] = 0
for j in seq_list:
r = j.count(frags[i])
if r != 0:
ans.append(r)
if len(ans) >= len(seq_list):
print(frags[i])
break main('14.txt', 'sample.txt')
方法二:(没看懂)
# coding=utf-8
'''
A solution to a ROSALIND bioinformatics problem.
Problem Title: Finding a Shared Motif
Rosalind ID: LCSM
Rosalind #: 014
URL: [url]http://rosalind.info/problems/lcsm/[/url]
''' def LongestSubstring(string_list):
'''Extracts all substrings from the first string in a list, and sends longest substring candidates to be checked.'''
longest = ''
for start_index in range(len(string_list[0])):
for end_index in range(len(string_list[0]), start_index, -1):
# Break if the length becomes too small, as it will only get smaller.
if end_index - start_index <= len(longest):
break
elif CheckSubstring(string_list[0][start_index:end_index], string_list):
longest = string_list[0][start_index:end_index] return longest def CheckSubstring(find_string, string_list):
'Checks if a given substring appears in all members of a given collection of strings and returns True/False.'
for string in string_list:
if (len(string) < len(find_string)) or (find_string not in string):
return False
return True seq = {}
seq_name = ''
with open('14.txt') as f:
for line in f:
if line[0] == '>':
seq_name = line.rstrip()
seq[seq_name] = ''
continue
else:
seq[seq_name] += (line.rstrip()).upper() print(seq) if __name__ == '__main__':
dna = []
for seq_name in seq:
dna.append(seq[seq_name]) lcsm = LongestSubstring(dna)
print(lcsm)
with open('014_LCSM.txt', 'w') as output_data:
output_data.write(lcsm)
14 Finding a Shared Motif的更多相关文章
- 16 Finding a Protein Motif
Problem To allow for the presence of its varying forms, a protein motif is represented by a shorthan ...
- Oracle-buffer cache、shared pool
http://blog.csdn.net/panfelix/article/details/38347059 buffer pool 和shared pool 详解 http://blog.csd ...
- Selenium Xpath Tutorials - Identifying xpath for element with examples to use in selenium
Xpath in selenium is close to must required. XPath is element locator and you need to provide xpath ...
- VirtualBox中安装Ubuntu12.04/Ubuntu14.04虚拟机
NOTE: 一开始安装的Ubuntu12.04,后来又重新安装了14.04.截图基本使用了安装12.04时的截图,后来安装14.04时又补充了几张.该安装过程对Ubuntu12.04和14.04都是适 ...
- 深入理解Windows X64调试
随着64位操作系统的普及,都开始大力进军x64,X64下的调试机制也发生了改变,与x86相比,添加了许多自己的新特性,之前学习了Windows x64的调试机制,这里本着“拿来主义”的原则与大家分享. ...
- Vbox中Ubuntu的安装和共享文件夹设置
1. 选择版本 1.1 Ubuntu桌面版与服务器版的区别 桌面版与服务器版,只要发布版本号一致,这两者从核心来说也就是相同的,唯一的差别在于它们的预期用途.桌面版面向个人电脑使用者,可以进行文字处理 ...
- http://wiki.apache.org/tomcat/HowTo
http://wiki.apache.org/tomcat/HowTo Contents Meta How do I add a question to this page? How do I con ...
- linux下so动态库一些不为人知的秘密(中)
上一篇(linux下so动态库一些不为人知的秘密(上))介绍了linux下so一些依赖问题,本篇将介绍linux的so路径搜索问题. 我们知道linux链接so有两种途径:显示和隐式.所谓显示就是程序 ...
- linux c++ 加载动态库常用的三种方法
链接库时的搜索路径顺序:LD_LIBRARY_PATH --> /etc/ld.so.conf --> /lib,/usr/lib 方法1. vi .bash_profile 设置环 ...
随机推荐
- Android 查看Android版本的方法
1.通过源码查看 Android 版本 路径:build/core/version_defaults.mk PLATFORM_VERSION := 2.通过编译时终端输出查看 ============ ...
- nginx 自签名证书 配置 https
最近在研究nginx,整好遇到一个需求就是希望服务器与客户端之间传输内容是加密的,防止中间监听泄露信息,但是去证书服务商那边申请证书又不合算,因为访问服务器的都是内部人士,所以自己给自己颁发证书,忽略 ...
- mamp下安装ruby的mysql库
mysql2库死活不行,用ruby-mysql得了,纯ruby的库 gem "ruby-mysql" require 'mysql'
- 【升级至sql 2012】sqlserver mdf向上兼容附加数据库(无法打开数据库 'xxxxx' 版本 611。请将该数据库升级为最新版本。)
sqlserver mdf向上兼容附加数据库(无法打开数据库 'xxxxx' 版本 611.请将该数据库升级为最新版本.) 最近工作中有一个sqlserver2005版本的mdf文件,还没有log文件 ...
- IDEA试用期结束以后继续试用(全部失效就更新),IDEA 2018 LICENSE SERVER
IDEA是一款收费的IDE,但是新用户可以免费试用一段时间,试用期结束可以购买,也可以通过填写License server address来继续使用. 打开IDEA以后,通过Help ----- Re ...
- 5月31日上课笔记-Mysql简介
一.mysql 配置mysql环境变量 path中添加 D:\Program Files\MySQL\MySQL Server 5.7\bin cmd命令: 登录:mysql -uroot -p 退出 ...
- 第5课 Qt Creator工程介绍
1. QT Creator工程管理(一个工程包含不同类型的文件) (1).pro项目文件 (2).pro.user用户配置描述文件 (3).h头文件 (4).cpp源文件 (5).ui界面描述文件 ( ...
- 精《Linux内核精髓:精通Linux内核必会的75个绝技》一HACK #4 如何使用Git
HACK #4 如何使用Git 本节介绍Git的使用方法.Git是Linux内核等众多OSS(Open Source Software,开源软件)开发中所使用的SCM(Source Code Mana ...
- Vue基础知识之过滤器(四)
过滤器 1.过滤器的用法,用 '|' 分割表达式和过滤器. 例如:{{ msg | filter}} {{msg | filter(a)}} a就标识filter的一个参数. 用两个过滤器:{{msg ...
- Python web框架——Tornado
Tornado是一个Python Web框架和异步网络库,最初由FriendFeed开发.通过使用非阻塞网络I / O,Tornado可以扩展到数万个开放连接,使其成为需要长时间连接每个用户的长轮询, ...