Problem

common substring of a collection of strings is a substring of every member of the collection. We say that a common substring is a longest common substring if there does not exist a longer common substring. For example, "CG" is a common substring of "ACGTACGT" and "AACCGTATA", but it is not as long as possible; in this case, "CGTA" is a longest common substring of "ACGTACGT" and "AACCGTATA".

Note that the longest common substring is not necessarily unique; for a simple example, "AA" and "CC" are both longest common substrings of "AACC" and "CCAA".

Given: A collection of kk (k≤100k≤100) DNA strings of length at most 1 kbp each in FASTA format.

Return: A longest common substring of the collection. (If multiple solutions exist, you may return any single solution.)

Sample Dataset

>Rosalind_1
GATTACA
>Rosalind_2
TAGACCA
>Rosalind_3
ATACA

Sample Output

AC

# 方法一
# coding=utf-8
'''
>Rosalind_1
GATTACA
>Rosalind_2
TAGACCA
>Rosalind_3
ATACA
''' def readfasta(filename, sample):
fa = open(filename, 'r')
fo = open(sample, 'w')
res = {}
rres = []
ID = ''
for line in fa:
if line.startswith('>'):
ID = line.strip('\n')
res[ID] = ''
else:
res[ID] += line.strip('\n') for key in res.values():
rres.append(key)
fo.write(key + '\n')
return rres def fragement(seq_list):
res = []
seq = seq_list[0]
for i in range(len(seq)):
s_seq = seq[i:]
#print s_seq
for j in range(len(s_seq)):
res.append(s_seq[:(len(s_seq) - j)])
#print res return res def main(infile, sample):
seq_list = readfasta(infile, sample) #['TAGACCA','ATACA','GATTACA']
frags = fragement(seq_list)
frags.sort(key=len, reverse=True) # 从长到短排列
for i in range(len(frags)):
ans = []
# s = 0
# m+=1
# print(m)
# res[frags[i]] = 0
for j in seq_list:
r = j.count(frags[i])
if r != 0:
ans.append(r)
if len(ans) >= len(seq_list):
print(frags[i])
break main('14.txt', 'sample.txt')

  方法二:(没看懂)

# coding=utf-8
'''
A solution to a ROSALIND bioinformatics problem.
Problem Title: Finding a Shared Motif
Rosalind ID: LCSM
Rosalind #: 014
URL: [url]http://rosalind.info/problems/lcsm/[/url]
''' def LongestSubstring(string_list):
'''Extracts all substrings from the first string in a list, and sends longest substring candidates to be checked.'''
longest = ''
for start_index in range(len(string_list[0])):
for end_index in range(len(string_list[0]), start_index, -1):
# Break if the length becomes too small, as it will only get smaller.
if end_index - start_index <= len(longest):
break
elif CheckSubstring(string_list[0][start_index:end_index], string_list):
longest = string_list[0][start_index:end_index] return longest def CheckSubstring(find_string, string_list):
'Checks if a given substring appears in all members of a given collection of strings and returns True/False.'
for string in string_list:
if (len(string) < len(find_string)) or (find_string not in string):
return False
return True seq = {}
seq_name = ''
with open('14.txt') as f:
for line in f:
if line[0] == '>':
seq_name = line.rstrip()
seq[seq_name] = ''
continue
else:
seq[seq_name] += (line.rstrip()).upper() print(seq) if __name__ == '__main__':
dna = []
for seq_name in seq:
dna.append(seq[seq_name]) lcsm = LongestSubstring(dna)
print(lcsm)
with open('014_LCSM.txt', 'w') as output_data:
output_data.write(lcsm)

  

14 Finding a Shared Motif的更多相关文章

  1. 16 Finding a Protein Motif

    Problem To allow for the presence of its varying forms, a protein motif is represented by a shorthan ...

  2. Oracle-buffer cache、shared pool

    http://blog.csdn.net/panfelix/article/details/38347059   buffer pool 和shared pool 详解 http://blog.csd ...

  3. Selenium Xpath Tutorials - Identifying xpath for element with examples to use in selenium

    Xpath in selenium is close to must required. XPath is element locator and you need to provide xpath ...

  4. VirtualBox中安装Ubuntu12.04/Ubuntu14.04虚拟机

    NOTE: 一开始安装的Ubuntu12.04,后来又重新安装了14.04.截图基本使用了安装12.04时的截图,后来安装14.04时又补充了几张.该安装过程对Ubuntu12.04和14.04都是适 ...

  5. 深入理解Windows X64调试

    随着64位操作系统的普及,都开始大力进军x64,X64下的调试机制也发生了改变,与x86相比,添加了许多自己的新特性,之前学习了Windows x64的调试机制,这里本着“拿来主义”的原则与大家分享. ...

  6. Vbox中Ubuntu的安装和共享文件夹设置

    1. 选择版本 1.1 Ubuntu桌面版与服务器版的区别 桌面版与服务器版,只要发布版本号一致,这两者从核心来说也就是相同的,唯一的差别在于它们的预期用途.桌面版面向个人电脑使用者,可以进行文字处理 ...

  7. http://wiki.apache.org/tomcat/HowTo

    http://wiki.apache.org/tomcat/HowTo Contents Meta How do I add a question to this page? How do I con ...

  8. linux下so动态库一些不为人知的秘密(中)

    上一篇(linux下so动态库一些不为人知的秘密(上))介绍了linux下so一些依赖问题,本篇将介绍linux的so路径搜索问题. 我们知道linux链接so有两种途径:显示和隐式.所谓显示就是程序 ...

  9. linux c++ 加载动态库常用的三种方法

    链接库时的搜索路径顺序:LD_LIBRARY_PATH --> /etc/ld.so.conf --> /lib,/usr/lib 方法1. vi .bash_profile    设置环 ...

随机推荐

  1. C#细说多线程(下)

    本文主要从线程的基础用法,CLR线程池当中工作者线程与I/O线程的开发,并行操作PLINQ等多个方面介绍多线程的开发. 其中委托的BeginInvoke方法以及回调函数最为常用.而 I/O线程可能容易 ...

  2. 【python】python实例集<二>

    ##扫描某个ip的端口号 # #-*- coding: utf-8 -*- # import socket # def main(): # sk = socket.socket(socket.AF_I ...

  3. vim配置之tagbar

    vimConfig/plugin/tagbar-setting.vim let g:tagbar_width=4 map <F12> :TagbarToggle<CR> map ...

  4. python + docker, 实现天气数据 从FTP获取以及持久化(二)-- python操作MySQL数据库

    前言 在这一节中,我们主要介绍如何使用python操作MySQL数据库. 准备 MySQL数据库使用的是上一节中的docker容器 “test-mysql”. Python 操作 MySQL 我们使用 ...

  5. javascript如何判断是手机还是电脑访问本网页

    var system ={}; var p = navigator.platform; system.win = p.indexOf("Win") == 0; system.mac ...

  6. 第10课 初探 Qt 中的消息处理

    1. Qt消息模型 (1)Qt封装了具体操作系统的消息机制 (2)Qt遵循经典的GUI消息驱动事件模型 2. 信号与槽 (1)Qt中定义了与系统消息相关的概念 ①信号(Signal):由操作系统产生的 ...

  7. python re示例

    #!/usr/bin/env python # encoding: utf-8 # Date: 2018/5/25import re s = '124311200111155214'ss = re.s ...

  8. django rest_framework 框架的使用

    django 的中间件 csrf Require a present and correct csrfmiddlewaretoken for POST requests that have a CSR ...

  9. requests and BeautifulSoup

    requests Python标准库中提供了:urllib.urllib2.httplib等模块以供Http请求,但是,它的 API 太渣了.它是为另一个时代.另一个互联网所创建的.它需要巨量的工作, ...

  10. Ubuntu无法sudo提权,报当前用户不在sudoers文件中错误

    Ubuntu安装后默认root不能登陆系统,密码也是随机生成,其他用户使用root权限,可以使用sudo提权,前提是该用户在/etc/sudoers配置列表中. 但是有时用户名从/etc/sudoer ...