Problem

common substring of a collection of strings is a substring of every member of the collection. We say that a common substring is a longest common substring if there does not exist a longer common substring. For example, "CG" is a common substring of "ACGTACGT" and "AACCGTATA", but it is not as long as possible; in this case, "CGTA" is a longest common substring of "ACGTACGT" and "AACCGTATA".

Note that the longest common substring is not necessarily unique; for a simple example, "AA" and "CC" are both longest common substrings of "AACC" and "CCAA".

Given: A collection of kk (k≤100k≤100) DNA strings of length at most 1 kbp each in FASTA format.

Return: A longest common substring of the collection. (If multiple solutions exist, you may return any single solution.)

Sample Dataset

>Rosalind_1
GATTACA
>Rosalind_2
TAGACCA
>Rosalind_3
ATACA

Sample Output

AC

# 方法一
# coding=utf-8
'''
>Rosalind_1
GATTACA
>Rosalind_2
TAGACCA
>Rosalind_3
ATACA
''' def readfasta(filename, sample):
fa = open(filename, 'r')
fo = open(sample, 'w')
res = {}
rres = []
ID = ''
for line in fa:
if line.startswith('>'):
ID = line.strip('\n')
res[ID] = ''
else:
res[ID] += line.strip('\n') for key in res.values():
rres.append(key)
fo.write(key + '\n')
return rres def fragement(seq_list):
res = []
seq = seq_list[0]
for i in range(len(seq)):
s_seq = seq[i:]
#print s_seq
for j in range(len(s_seq)):
res.append(s_seq[:(len(s_seq) - j)])
#print res return res def main(infile, sample):
seq_list = readfasta(infile, sample) #['TAGACCA','ATACA','GATTACA']
frags = fragement(seq_list)
frags.sort(key=len, reverse=True) # 从长到短排列
for i in range(len(frags)):
ans = []
# s = 0
# m+=1
# print(m)
# res[frags[i]] = 0
for j in seq_list:
r = j.count(frags[i])
if r != 0:
ans.append(r)
if len(ans) >= len(seq_list):
print(frags[i])
break main('14.txt', 'sample.txt')

  方法二:(没看懂)

# coding=utf-8
'''
A solution to a ROSALIND bioinformatics problem.
Problem Title: Finding a Shared Motif
Rosalind ID: LCSM
Rosalind #: 014
URL: [url]http://rosalind.info/problems/lcsm/[/url]
''' def LongestSubstring(string_list):
'''Extracts all substrings from the first string in a list, and sends longest substring candidates to be checked.'''
longest = ''
for start_index in range(len(string_list[0])):
for end_index in range(len(string_list[0]), start_index, -1):
# Break if the length becomes too small, as it will only get smaller.
if end_index - start_index <= len(longest):
break
elif CheckSubstring(string_list[0][start_index:end_index], string_list):
longest = string_list[0][start_index:end_index] return longest def CheckSubstring(find_string, string_list):
'Checks if a given substring appears in all members of a given collection of strings and returns True/False.'
for string in string_list:
if (len(string) < len(find_string)) or (find_string not in string):
return False
return True seq = {}
seq_name = ''
with open('14.txt') as f:
for line in f:
if line[0] == '>':
seq_name = line.rstrip()
seq[seq_name] = ''
continue
else:
seq[seq_name] += (line.rstrip()).upper() print(seq) if __name__ == '__main__':
dna = []
for seq_name in seq:
dna.append(seq[seq_name]) lcsm = LongestSubstring(dna)
print(lcsm)
with open('014_LCSM.txt', 'w') as output_data:
output_data.write(lcsm)

  

14 Finding a Shared Motif的更多相关文章

  1. 16 Finding a Protein Motif

    Problem To allow for the presence of its varying forms, a protein motif is represented by a shorthan ...

  2. Oracle-buffer cache、shared pool

    http://blog.csdn.net/panfelix/article/details/38347059   buffer pool 和shared pool 详解 http://blog.csd ...

  3. Selenium Xpath Tutorials - Identifying xpath for element with examples to use in selenium

    Xpath in selenium is close to must required. XPath is element locator and you need to provide xpath ...

  4. VirtualBox中安装Ubuntu12.04/Ubuntu14.04虚拟机

    NOTE: 一开始安装的Ubuntu12.04,后来又重新安装了14.04.截图基本使用了安装12.04时的截图,后来安装14.04时又补充了几张.该安装过程对Ubuntu12.04和14.04都是适 ...

  5. 深入理解Windows X64调试

    随着64位操作系统的普及,都开始大力进军x64,X64下的调试机制也发生了改变,与x86相比,添加了许多自己的新特性,之前学习了Windows x64的调试机制,这里本着“拿来主义”的原则与大家分享. ...

  6. Vbox中Ubuntu的安装和共享文件夹设置

    1. 选择版本 1.1 Ubuntu桌面版与服务器版的区别 桌面版与服务器版,只要发布版本号一致,这两者从核心来说也就是相同的,唯一的差别在于它们的预期用途.桌面版面向个人电脑使用者,可以进行文字处理 ...

  7. http://wiki.apache.org/tomcat/HowTo

    http://wiki.apache.org/tomcat/HowTo Contents Meta How do I add a question to this page? How do I con ...

  8. linux下so动态库一些不为人知的秘密(中)

    上一篇(linux下so动态库一些不为人知的秘密(上))介绍了linux下so一些依赖问题,本篇将介绍linux的so路径搜索问题. 我们知道linux链接so有两种途径:显示和隐式.所谓显示就是程序 ...

  9. linux c++ 加载动态库常用的三种方法

    链接库时的搜索路径顺序:LD_LIBRARY_PATH --> /etc/ld.so.conf --> /lib,/usr/lib 方法1. vi .bash_profile    设置环 ...

随机推荐

  1. 使用graphql-code-generator 生成graphql 代码

    类似的工具比较多,比如prisma .qloo.golang 的gqlgen.apollo-codegen graphql-code-generator 也是一个不错的工具(灵活.模版自定义...) ...

  2. 【Xamarin】Visual Studio 2013 Xamarin for Android开发环境搭建与配置&Genymotion

    Xamarin Xamarin是基于Mono的平台. Xamarin旨在让开发者可以用C#编写iOS, Android, Mac应用程序,也就是跨平台移动开发. 下载资源 1,进入Xamarin官方网 ...

  3. 【转】Linux 图形界面与命令行模式切换

    原文网址:http://blog.csdn.net/ldl22847/article/details/7600368 Tip:使用环境VMware Workstation    OS:CentOS 6 ...

  4. 玩转Panabit 2008 Live CD到U盘,Panabit随身行

    直接将Panabit 2008 Live CD的内容复制到U盘,即把Live CD转成U盘启动盘,U盘可写,方便保存配置.要能正常使用,必须机器支持USB启动,才能用上:但是熟悉此方法,同样可以灵活用 ...

  5. sql server常用日期格式化

    /*8 24 108 - hh:mm:ss */ Select CONVERT(varchar(), GETDATE(), )-- :: Select CONVERT(varchar(), GETDA ...

  6. [正经分析] DAG上dp两种做法的区别——拓扑序与SPFA

    在下最近刷了几道DAG图上dp的题目. 要提到的第一道是NOIP原题<最优贸易>.这是一个缩点后带点权的DAG上dp,它同时规定了起点和终点. 第二道是洛谷上的NOI导刊题目<最长路 ...

  7. MU-MIMO学习

    设备的认证查询 https://www.wi-fi.org/product-finder 1. MU-MIMO不会增大无线的最大速度 2. MU-MIMO需要路由器和客户端同时支持 3. 当你没有或者 ...

  8. php调用dll经验小结

    最近做一个网站,需要频繁使用远程数据,数据接口已经做好.在做转换的时候遇到了性能上的问题:开始打算用php来实现转换,苦苦查了数天,都没有找到直接操作字节的方法.虽然可以使用 pack() 方法将各个 ...

  9. TreeSet函数

    TreeSet类的排序问题   TreeSet支持两种排序方法:自然排序和定制排序.TreeSet默认采用自然排序. 1.自然排序 TreeSet会调用集合元素的compareTo(Object ob ...

  10. C++中如何强制inline函数(MSVC, GCC)

    #ifdef _MSC_VER_ // for MSVC #define forceinline __forceinline #elif defined __GNUC__ // for gcc on ...