Problem

common substring of a collection of strings is a substring of every member of the collection. We say that a common substring is a longest common substring if there does not exist a longer common substring. For example, "CG" is a common substring of "ACGTACGT" and "AACCGTATA", but it is not as long as possible; in this case, "CGTA" is a longest common substring of "ACGTACGT" and "AACCGTATA".

Note that the longest common substring is not necessarily unique; for a simple example, "AA" and "CC" are both longest common substrings of "AACC" and "CCAA".

Given: A collection of kk (k≤100k≤100) DNA strings of length at most 1 kbp each in FASTA format.

Return: A longest common substring of the collection. (If multiple solutions exist, you may return any single solution.)

Sample Dataset

>Rosalind_1
GATTACA
>Rosalind_2
TAGACCA
>Rosalind_3
ATACA

Sample Output

AC

# 方法一
# coding=utf-8
'''
>Rosalind_1
GATTACA
>Rosalind_2
TAGACCA
>Rosalind_3
ATACA
''' def readfasta(filename, sample):
fa = open(filename, 'r')
fo = open(sample, 'w')
res = {}
rres = []
ID = ''
for line in fa:
if line.startswith('>'):
ID = line.strip('\n')
res[ID] = ''
else:
res[ID] += line.strip('\n') for key in res.values():
rres.append(key)
fo.write(key + '\n')
return rres def fragement(seq_list):
res = []
seq = seq_list[0]
for i in range(len(seq)):
s_seq = seq[i:]
#print s_seq
for j in range(len(s_seq)):
res.append(s_seq[:(len(s_seq) - j)])
#print res return res def main(infile, sample):
seq_list = readfasta(infile, sample) #['TAGACCA','ATACA','GATTACA']
frags = fragement(seq_list)
frags.sort(key=len, reverse=True) # 从长到短排列
for i in range(len(frags)):
ans = []
# s = 0
# m+=1
# print(m)
# res[frags[i]] = 0
for j in seq_list:
r = j.count(frags[i])
if r != 0:
ans.append(r)
if len(ans) >= len(seq_list):
print(frags[i])
break main('14.txt', 'sample.txt')

  方法二:(没看懂)

# coding=utf-8
'''
A solution to a ROSALIND bioinformatics problem.
Problem Title: Finding a Shared Motif
Rosalind ID: LCSM
Rosalind #: 014
URL: [url]http://rosalind.info/problems/lcsm/[/url]
''' def LongestSubstring(string_list):
'''Extracts all substrings from the first string in a list, and sends longest substring candidates to be checked.'''
longest = ''
for start_index in range(len(string_list[0])):
for end_index in range(len(string_list[0]), start_index, -1):
# Break if the length becomes too small, as it will only get smaller.
if end_index - start_index <= len(longest):
break
elif CheckSubstring(string_list[0][start_index:end_index], string_list):
longest = string_list[0][start_index:end_index] return longest def CheckSubstring(find_string, string_list):
'Checks if a given substring appears in all members of a given collection of strings and returns True/False.'
for string in string_list:
if (len(string) < len(find_string)) or (find_string not in string):
return False
return True seq = {}
seq_name = ''
with open('14.txt') as f:
for line in f:
if line[0] == '>':
seq_name = line.rstrip()
seq[seq_name] = ''
continue
else:
seq[seq_name] += (line.rstrip()).upper() print(seq) if __name__ == '__main__':
dna = []
for seq_name in seq:
dna.append(seq[seq_name]) lcsm = LongestSubstring(dna)
print(lcsm)
with open('014_LCSM.txt', 'w') as output_data:
output_data.write(lcsm)

  

14 Finding a Shared Motif的更多相关文章

  1. 16 Finding a Protein Motif

    Problem To allow for the presence of its varying forms, a protein motif is represented by a shorthan ...

  2. Oracle-buffer cache、shared pool

    http://blog.csdn.net/panfelix/article/details/38347059   buffer pool 和shared pool 详解 http://blog.csd ...

  3. Selenium Xpath Tutorials - Identifying xpath for element with examples to use in selenium

    Xpath in selenium is close to must required. XPath is element locator and you need to provide xpath ...

  4. VirtualBox中安装Ubuntu12.04/Ubuntu14.04虚拟机

    NOTE: 一开始安装的Ubuntu12.04,后来又重新安装了14.04.截图基本使用了安装12.04时的截图,后来安装14.04时又补充了几张.该安装过程对Ubuntu12.04和14.04都是适 ...

  5. 深入理解Windows X64调试

    随着64位操作系统的普及,都开始大力进军x64,X64下的调试机制也发生了改变,与x86相比,添加了许多自己的新特性,之前学习了Windows x64的调试机制,这里本着“拿来主义”的原则与大家分享. ...

  6. Vbox中Ubuntu的安装和共享文件夹设置

    1. 选择版本 1.1 Ubuntu桌面版与服务器版的区别 桌面版与服务器版,只要发布版本号一致,这两者从核心来说也就是相同的,唯一的差别在于它们的预期用途.桌面版面向个人电脑使用者,可以进行文字处理 ...

  7. http://wiki.apache.org/tomcat/HowTo

    http://wiki.apache.org/tomcat/HowTo Contents Meta How do I add a question to this page? How do I con ...

  8. linux下so动态库一些不为人知的秘密(中)

    上一篇(linux下so动态库一些不为人知的秘密(上))介绍了linux下so一些依赖问题,本篇将介绍linux的so路径搜索问题. 我们知道linux链接so有两种途径:显示和隐式.所谓显示就是程序 ...

  9. linux c++ 加载动态库常用的三种方法

    链接库时的搜索路径顺序:LD_LIBRARY_PATH --> /etc/ld.so.conf --> /lib,/usr/lib 方法1. vi .bash_profile    设置环 ...

随机推荐

  1. Android 查看Android版本的方法

    1.通过源码查看 Android 版本 路径:build/core/version_defaults.mk PLATFORM_VERSION := 2.通过编译时终端输出查看 ============ ...

  2. nginx 自签名证书 配置 https

    最近在研究nginx,整好遇到一个需求就是希望服务器与客户端之间传输内容是加密的,防止中间监听泄露信息,但是去证书服务商那边申请证书又不合算,因为访问服务器的都是内部人士,所以自己给自己颁发证书,忽略 ...

  3. mamp下安装ruby的mysql库

    mysql2库死活不行,用ruby-mysql得了,纯ruby的库 gem "ruby-mysql" require 'mysql'

  4. 【升级至sql 2012】sqlserver mdf向上兼容附加数据库(无法打开数据库 'xxxxx' 版本 611。请将该数据库升级为最新版本。)

    sqlserver mdf向上兼容附加数据库(无法打开数据库 'xxxxx' 版本 611.请将该数据库升级为最新版本.) 最近工作中有一个sqlserver2005版本的mdf文件,还没有log文件 ...

  5. IDEA试用期结束以后继续试用(全部失效就更新),IDEA 2018 LICENSE SERVER

    IDEA是一款收费的IDE,但是新用户可以免费试用一段时间,试用期结束可以购买,也可以通过填写License server address来继续使用. 打开IDEA以后,通过Help ----- Re ...

  6. 5月31日上课笔记-Mysql简介

    一.mysql 配置mysql环境变量 path中添加 D:\Program Files\MySQL\MySQL Server 5.7\bin cmd命令: 登录:mysql -uroot -p 退出 ...

  7. 第5课 Qt Creator工程介绍

    1. QT Creator工程管理(一个工程包含不同类型的文件) (1).pro项目文件 (2).pro.user用户配置描述文件 (3).h头文件 (4).cpp源文件 (5).ui界面描述文件 ( ...

  8. 精《Linux内核精髓:精通Linux内核必会的75个绝技》一HACK #4 如何使用Git

    HACK #4 如何使用Git 本节介绍Git的使用方法.Git是Linux内核等众多OSS(Open Source Software,开源软件)开发中所使用的SCM(Source Code Mana ...

  9. Vue基础知识之过滤器(四)

    过滤器 1.过滤器的用法,用 '|' 分割表达式和过滤器. 例如:{{ msg | filter}} {{msg | filter(a)}} a就标识filter的一个参数. 用两个过滤器:{{msg ...

  10. Python web框架——Tornado

    Tornado是一个Python Web框架和异步网络库,最初由FriendFeed开发.通过使用非阻塞网络I / O,Tornado可以扩展到数万个开放连接,使其成为需要长时间连接每个用户的长轮询, ...