Problem

common substring of a collection of strings is a substring of every member of the collection. We say that a common substring is a longest common substring if there does not exist a longer common substring. For example, "CG" is a common substring of "ACGTACGT" and "AACCGTATA", but it is not as long as possible; in this case, "CGTA" is a longest common substring of "ACGTACGT" and "AACCGTATA".

Note that the longest common substring is not necessarily unique; for a simple example, "AA" and "CC" are both longest common substrings of "AACC" and "CCAA".

Given: A collection of kk (k≤100k≤100) DNA strings of length at most 1 kbp each in FASTA format.

Return: A longest common substring of the collection. (If multiple solutions exist, you may return any single solution.)

Sample Dataset

>Rosalind_1
GATTACA
>Rosalind_2
TAGACCA
>Rosalind_3
ATACA

Sample Output

AC

# 方法一
# coding=utf-8
'''
>Rosalind_1
GATTACA
>Rosalind_2
TAGACCA
>Rosalind_3
ATACA
''' def readfasta(filename, sample):
fa = open(filename, 'r')
fo = open(sample, 'w')
res = {}
rres = []
ID = ''
for line in fa:
if line.startswith('>'):
ID = line.strip('\n')
res[ID] = ''
else:
res[ID] += line.strip('\n') for key in res.values():
rres.append(key)
fo.write(key + '\n')
return rres def fragement(seq_list):
res = []
seq = seq_list[0]
for i in range(len(seq)):
s_seq = seq[i:]
#print s_seq
for j in range(len(s_seq)):
res.append(s_seq[:(len(s_seq) - j)])
#print res return res def main(infile, sample):
seq_list = readfasta(infile, sample) #['TAGACCA','ATACA','GATTACA']
frags = fragement(seq_list)
frags.sort(key=len, reverse=True) # 从长到短排列
for i in range(len(frags)):
ans = []
# s = 0
# m+=1
# print(m)
# res[frags[i]] = 0
for j in seq_list:
r = j.count(frags[i])
if r != 0:
ans.append(r)
if len(ans) >= len(seq_list):
print(frags[i])
break main('14.txt', 'sample.txt')

  方法二:(没看懂)

# coding=utf-8
'''
A solution to a ROSALIND bioinformatics problem.
Problem Title: Finding a Shared Motif
Rosalind ID: LCSM
Rosalind #: 014
URL: [url]http://rosalind.info/problems/lcsm/[/url]
''' def LongestSubstring(string_list):
'''Extracts all substrings from the first string in a list, and sends longest substring candidates to be checked.'''
longest = ''
for start_index in range(len(string_list[0])):
for end_index in range(len(string_list[0]), start_index, -1):
# Break if the length becomes too small, as it will only get smaller.
if end_index - start_index <= len(longest):
break
elif CheckSubstring(string_list[0][start_index:end_index], string_list):
longest = string_list[0][start_index:end_index] return longest def CheckSubstring(find_string, string_list):
'Checks if a given substring appears in all members of a given collection of strings and returns True/False.'
for string in string_list:
if (len(string) < len(find_string)) or (find_string not in string):
return False
return True seq = {}
seq_name = ''
with open('14.txt') as f:
for line in f:
if line[0] == '>':
seq_name = line.rstrip()
seq[seq_name] = ''
continue
else:
seq[seq_name] += (line.rstrip()).upper() print(seq) if __name__ == '__main__':
dna = []
for seq_name in seq:
dna.append(seq[seq_name]) lcsm = LongestSubstring(dna)
print(lcsm)
with open('014_LCSM.txt', 'w') as output_data:
output_data.write(lcsm)

  

14 Finding a Shared Motif的更多相关文章

  1. 16 Finding a Protein Motif

    Problem To allow for the presence of its varying forms, a protein motif is represented by a shorthan ...

  2. Oracle-buffer cache、shared pool

    http://blog.csdn.net/panfelix/article/details/38347059   buffer pool 和shared pool 详解 http://blog.csd ...

  3. Selenium Xpath Tutorials - Identifying xpath for element with examples to use in selenium

    Xpath in selenium is close to must required. XPath is element locator and you need to provide xpath ...

  4. VirtualBox中安装Ubuntu12.04/Ubuntu14.04虚拟机

    NOTE: 一开始安装的Ubuntu12.04,后来又重新安装了14.04.截图基本使用了安装12.04时的截图,后来安装14.04时又补充了几张.该安装过程对Ubuntu12.04和14.04都是适 ...

  5. 深入理解Windows X64调试

    随着64位操作系统的普及,都开始大力进军x64,X64下的调试机制也发生了改变,与x86相比,添加了许多自己的新特性,之前学习了Windows x64的调试机制,这里本着“拿来主义”的原则与大家分享. ...

  6. Vbox中Ubuntu的安装和共享文件夹设置

    1. 选择版本 1.1 Ubuntu桌面版与服务器版的区别 桌面版与服务器版,只要发布版本号一致,这两者从核心来说也就是相同的,唯一的差别在于它们的预期用途.桌面版面向个人电脑使用者,可以进行文字处理 ...

  7. http://wiki.apache.org/tomcat/HowTo

    http://wiki.apache.org/tomcat/HowTo Contents Meta How do I add a question to this page? How do I con ...

  8. linux下so动态库一些不为人知的秘密(中)

    上一篇(linux下so动态库一些不为人知的秘密(上))介绍了linux下so一些依赖问题,本篇将介绍linux的so路径搜索问题. 我们知道linux链接so有两种途径:显示和隐式.所谓显示就是程序 ...

  9. linux c++ 加载动态库常用的三种方法

    链接库时的搜索路径顺序:LD_LIBRARY_PATH --> /etc/ld.so.conf --> /lib,/usr/lib 方法1. vi .bash_profile    设置环 ...

随机推荐

  1. macOS -- Mac系统如何编辑hosts文件

    Hosts是一个没有扩展名的系统文件,其作用就是将一些常用的网址域名与其对应的IP地址建立一个关联"数据库",当用户在浏览器中输入一个需要登录的网址时,系统会首先自动从Hosts文 ...

  2. FastAdmin Bootstrap-table 特定某行背景变红

    FastAdmin Bootstrap-table 特定某行背景变红 rowStyle: function (row, index) { var style = {css:{'background': ...

  3. 一步搞定私有Git服务器部署(Gogs)

    http://www.jianshu.com/p/424627516ef6 零.安装 Docker 和 Compsoe 首先安装 Docker: $ curl -sSL https://get.doc ...

  4. 黄聪:中国大陆的所有IP段,中国电信所有IP段、中国铁通所有IP段、中国网通所有IP段。

    中国大陆的所有IP段,中国电信所有IP段.中国铁通所有IP段.中国网通所有IP段. 中国大陆的所有IP段: 47.153.128.0 47.154.255.25558.14.0.0 58.25.255 ...

  5. 第十届蓝桥杯 试题 E: 迷宫

    试题 E: 迷宫 本题总分:15 分 [问题描述] 下图给出了一个迷宫的平面图,其中标记为 1 的为障碍,标记为 0 的为可 以通行的地方. 010000 000100 001001 110000 迷 ...

  6. linux centos 6.1 安装 redis

    1, yum install redis 检测是否有redis 2,没有的话就运行:yum install epel-release 3,再执行 yum install redis

  7. canvas绘制文本

    canvas绘制文本 属性和方法 font = value 设置字体 textAlign = value 设置字体对齐方式 start, end, left, right, center textBa ...

  8. vuex和vuejs

    前言:在最近学习 Vue.js 的时候,看到国外一篇讲述了如何使用 Vue.js 和 Vuex 来构建一个简单笔记的单页应用的文章.感觉收获挺多,自己在它的例子的基础上进行了一些优化和自定义功能,在这 ...

  9. GridView弹出对话框

    if (e.Row.RowState == DataControlRowState.Normal || e.Row.RowState == DataControlRowState.Alternate) ...

  10. leetcode654

    class Solution { public: TreeNode* constructMaximumBinaryTree(vector<int>& nums) { ) retur ...