Problem

A common substring of a collection of strings is a substring of every member of the collection. We say that a common substring is a longest common substring if there does not exist a longer common substring. For example, "CG" is a common substring of "ACGTACGT" and "AACCGTATA", but it is not as long as possible; in this case, "CGTA" is a longest common substring of "ACGTACGT" and "AACCGTATA".

Note that the longest common substring is not necessarily unique; for a simple example, "AA" and "CC" are both longest common substrings of "AACC" and "CCAA".
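The two examples above can be checked by brute force. The following is only a minimal sketch of the definition, not an efficient solution; the function name `longest_common_substrings` is my own and does not come from the problem statement. It takes candidates from the shortest string, longest first, and keeps every candidate that occurs in all strings.

```python
def longest_common_substrings(strings):
    """Return the set of all longest common substrings of `strings` (brute force)."""
    # A common substring cannot be longer than the shortest string.
    reference = min(strings, key=len)
    n = len(reference)
    for length in range(n, 0, -1):  # try the longest candidates first
        found = {
            reference[i:i + length]
            for i in range(n - length + 1)
            if all(reference[i:i + length] in s for s in strings)
        }
        if found:
            return found
    return set()  # the strings do not share even a single character


print(longest_common_substrings(["ACGTACGT", "AACCGTATA"]))  # {'CGTA'}
print(sorted(longest_common_substrings(["AACC", "CCAA"])))   # ['AA', 'CC']
```

The second call returns two answers, which is the non-uniqueness described in the note.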

Given: A collection of k (k ≤ 100) DNA strings of length at most 1 kbp each in FASTA format.

Return: A longest common substring of the collection. (If multiple solutions exist, you may return any single solution.)

Sample Dataset

>Rosalind_1
GATTACA
>Rosalind_2
TAGACCA
>Rosalind_3
ATACA

Sample Output

AC
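Because any longest common substring is accepted, "AC" is not the only valid answer for the sample. The sketch below parses the sample from an in-memory string and checks a few candidate answers. `parse_fasta` and `is_longest_common` are hypothetical helper names introduced here for illustration. The check relies on the fact that if no common substring one character longer exists, then no longer one exists at all.

```python
SAMPLE = """>Rosalind_1
GATTACA
>Rosalind_2
TAGACCA
>Rosalind_3
ATACA
"""


def parse_fasta(text):
    """Return the sequences in `text`, joining lines that belong to one record."""
    sequences = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            sequences.append("")      # a header line starts a new record
        elif line and sequences:
            sequences[-1] += line     # sequence lines may be wrapped
    return sequences


def is_longest_common(candidate, strings):
    """True if `candidate` is in every string and no longer common substring exists."""
    if not all(candidate in s for s in strings):
        return False
    longer = len(candidate) + 1
    first = strings[0]
    # Any common substring must occur in the first string, so scan its windows.
    return not any(
        all(first[i:i + longer] in s for s in strings)
        for i in range(len(first) - longer + 1)
    )


dna = parse_fasta(SAMPLE)
for answer in ("AC", "CA", "TA", "A", "TAC"):
    print(answer, is_longest_common(answer, dna))
# AC True
# CA True
# TA True
# A False
# TAC False
```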

# Method 1
# coding=utf-8
'''
>Rosalind_1
GATTACA
>Rosalind_2
TAGACCA
>Rosalind_3
ATACA
'''


def readfasta(filename, sample):
    # Parse the FASTA file into {header: sequence}, joining wrapped lines.
    res = {}
    rres = []
    ID = ''
    with open(filename, 'r') as fa:
        for line in fa:
            if line.startswith('>'):
                ID = line.strip('\n')
                res[ID] = ''
            else:
                res[ID] += line.strip('\n')
    # Write one sequence per line to the sample file and collect them in a list.
    with open(sample, 'w') as fo:
        for key in res.values():
            rres.append(key)
            fo.write(key + '\n')
    return rres


def fragement(seq_list):
    # Enumerate every substring of the first sequence.
    res = []
    seq = seq_list[0]
    for i in range(len(seq)):
        s_seq = seq[i:]
        for j in range(len(s_seq)):
            res.append(s_seq[:(len(s_seq) - j)])
    return res


def main(infile, sample):
    seq_list = readfasta(infile, sample)  # e.g. ['GATTACA', 'TAGACCA', 'ATACA']
    frags = fragement(seq_list)
    frags.sort(key=len, reverse=True)  # sort from longest to shortest
    for i in range(len(frags)):
        ans = []
        for j in seq_list:
            r = j.count(frags[i])
            if r != 0:
                ans.append(r)
        # The first fragment found in every sequence is a longest common substring.
        if len(ans) >= len(seq_list):
            print(frags[i])
            break


main('14.txt', 'sample.txt')
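The first method materialises every substring of the first sequence before testing any of them. For a 1 kbp sequence that is 1000 × 1001 / 2 = 500,500 substrings with a combined length of 1000 × 1001 × 1002 / 6 = 167,167,000 characters, so memory use becomes noticeable. One possible alternative, which is my own substitution and not the method used above, is a binary search on the answer length. It works because a common substring of length L always contains a common substring of length L - 1. The names `common_of_length` and `longest_common_substring` are hypothetical, and the sketch assumes a non-empty list of strings.

```python
def common_of_length(strings, length):
    """Return one common substring of exactly `length` characters, or None."""
    if length == 0:
        return ""
    shared = None
    for s in strings:
        # All windows of the requested length in this string.
        windows = {s[i:i + length] for i in range(len(s) - length + 1)}
        shared = windows if shared is None else shared & windows
        if not shared:
            return None  # nothing survives the intersection
    return min(shared)   # deterministic choice among equally long answers


def longest_common_substring(strings):
    """Binary search for the largest length that still has a common substring."""
    lo, hi = 0, min(len(s) for s in strings)
    best = ""
    while lo < hi:
        mid = (lo + hi + 1) // 2
        hit = common_of_length(strings, mid)
        if hit is not None:
            best, lo = hit, mid   # length mid is feasible, try longer
        else:
            hi = mid - 1          # length mid is infeasible, try shorter
    return best


print(longest_common_substring(["GATTACA", "TAGACCA", "ATACA"]))  # AC
```

Each feasibility check only holds the windows of one length at a time, and the number of checks grows with the logarithm of the shortest sequence length rather than with the number of substrings.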

  Method 2 (I did not fully understand this one):

# coding=utf-8
'''
A solution to a ROSALIND bioinformatics problem.
Problem Title: Finding a Shared Motif
Rosalind ID: LCSM
Rosalind #: 014
URL: http://rosalind.info/problems/lcsm/
'''


def LongestSubstring(string_list):
    '''Extracts all substrings from the first string in a list, and sends longest substring candidates to be checked.'''
    longest = ''
    for start_index in range(len(string_list[0])):
        for end_index in range(len(string_list[0]), start_index, -1):
            # Break if the length becomes too small, as it will only get smaller.
            if end_index - start_index <= len(longest):
                break
            elif CheckSubstring(string_list[0][start_index:end_index], string_list):
                longest = string_list[0][start_index:end_index]
    return longest


def CheckSubstring(find_string, string_list):
    'Checks if a given substring appears in all members of a given collection of strings and returns True/False.'
    for string in string_list:
        if (len(string) < len(find_string)) or (find_string not in string):
            return False
    return True


seq = {}
seq_name = ''
with open('14.txt') as f:
    for line in f:
        if line[0] == '>':
            seq_name = line.rstrip()
            seq[seq_name] = ''
            continue
        else:
            seq[seq_name] += (line.rstrip()).upper()
print(seq)

if __name__ == '__main__':
    dna = []
    for seq_name in seq:
        dna.append(seq[seq_name])
    lcsm = LongestSubstring(dna)
    print(lcsm)
    with open('014_LCSM.txt', 'w') as output_data:
        output_data.write(lcsm)
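To see what the second method is doing, it helps to record which candidates it actually tests. For each start position in the first string it tries the longest slice first and shrinks it; as soon as a slice is no longer than the best answer found so far, it stops and moves to the next start position, because shorter slices cannot improve the result. The sketch below follows the same search order as `LongestSubstring` but is my own instrumented rewrite (the name `longest_substring_traced` is hypothetical), run here on the sample dataset.

```python
def longest_substring_traced(string_list):
    """Same search order as LongestSubstring, but also records every candidate tested."""
    first = string_list[0]
    longest = ""
    tested = []
    for start in range(len(first)):
        for end in range(len(first), start, -1):
            # Slices from here on are no longer than the current best: skip them.
            if end - start <= len(longest):
                break
            candidate = first[start:end]
            tested.append(candidate)
            if all(candidate in s for s in string_list):
                longest = candidate  # strictly longer than the previous best
    return longest, tested


longest, tested = longest_substring_traced(["GATTACA", "TAGACCA", "ATACA"])
print(longest)       # TA
print(len(tested))   # 21 (GATTACA has 28 substrings in total)
```

On the sample this search settles on "TA" rather than "AC". Both have length 2, so either is an acceptable answer.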

  
