Problem

A graph whose nodes have all been labeled can be represented by an adjacency list, in which each row of the list contains the two node labels corresponding to a unique edge.

A directed graph (or digraph) is a graph containing directed edges, each of which has an orientation. That is, a directed edge is represented by an arrow instead of a line segment; the starting and ending nodes of an edge form its tail and head, respectively. The directed edge with tail v and head w is represented by (v, w) (but not by (w, v)). A directed loop is a directed edge of the form (v, v).
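As a small illustration of these definitions, a directed edge can be modelled as an ordered pair. The sketch below uses plain Python tuples; the helper names `tail`, `head`, and `is_loop` are chosen here for illustration and are not part of the problem statement.

```python
# A directed edge (v, w) stored as an ordered pair: tail first, head second.
def tail(edge):
    return edge[0]


def head(edge):
    return edge[1]


def is_loop(edge):
    """A directed loop has the same node as both tail and head."""
    return tail(edge) == head(edge)


edge = ("v", "w")
reverse = ("w", "v")

print(edge == reverse)       # orientation matters, so the two edges differ
print(is_loop(edge))         # tail and head are different nodes
print(is_loop(("v", "v")))   # same node at both ends
```

Because tuples compare element by element, (v, w) and (w, v) are distinct edges without any extra bookkeeping.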

For a collection of strings and a positive integer k, the overlap graph for the strings is a directed graph O_k in which each string is represented by a node, and string s is connected to string t with a directed edge when there is a length k suffix of s that matches a length k prefix of t, as long as s ≠ t; we demand s ≠ t to prevent directed loops in the overlap graph (although directed cycles may be present).
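The edge condition can be sketched directly from this definition. The three toy strings below are made up for illustration (they are not the sample dataset), and `has_overlap` and `overlap_edges` are hypothetical helper names. Here the strings themselves act as nodes, so the s ≠ t test compares the strings.

```python
def has_overlap(s, t, k):
    """True when the length-k suffix of s equals the length-k prefix of t."""
    return len(s) >= k and len(t) >= k and s[-k:] == t[:k]


def overlap_edges(strings, k):
    """Directed edges (s, t) of the overlap graph, requiring s != t."""
    return [(s, t) for s in strings for t in strings
            if s != t and has_overlap(s, t, k)]


edges = overlap_edges(["AAATTT", "TTTAAA", "CCCCCC"], 3)
for s, t in edges:
    print(s, t)
```

In this toy input, AAATTT and TTTAAA point at each other, which is a directed cycle of length two. CCCCCC overlaps only itself, and the s ≠ t requirement keeps that loop out of the graph.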

Given: A collection of DNA strings in FASTA format having total length at most 10 kbp.

Return: The adjacency list corresponding to O_3. You may return edges in any order.

Sample Dataset

>Rosalind_0498
AAATAAA
>Rosalind_2391
AAATTTT
>Rosalind_2323
TTTTCCC
>Rosalind_0442
AAATCCC
>Rosalind_5013
GGGTGGG

Sample Output

Rosalind_0498 Rosalind_2391
Rosalind_0498 Rosalind_0442
Rosalind_2391 Rosalind_2323

Method 1

# coding=utf-8
import itertools

# Method 1: sample data hard-coded as {label: sequence}
data = {'Rosalind_0442': 'AAATCCC',
        'Rosalind_0498': 'AAATAAA',
        'Rosalind_2323': 'TTTTCCC',
        'Rosalind_2391': 'AAATTTT',
        'Rosalind_5013': 'GGGTGGG'}


def is_k_overlap(s1, s2, k):
    # True when the length-k suffix of s1 equals the length-k prefix of s2
    return s1[-k:] == s2[:k]


def k_edges(data, k):
    edges = []
    # take every unordered pair of entries in data and test both directions
    for u, v in itertools.combinations(data, 2):
        u_dna, v_dna = data[u], data[v]
        if is_k_overlap(u_dna, v_dna, k):
            edges.append((u, v))
        if is_k_overlap(v_dna, u_dna, k):
            edges.append((v, u))
    return edges


print(k_edges(data, 3))

Method 2:

# coding=utf-8
### 12. Overlap Graphs ###
from collections import OrderedDict


def overlap_graph(dna, n):
    # dna: (name, sequence) pairs; test every ordered pair of distinct names
    edges = []
    for ke1, val1 in dna:
        for ke2, val2 in dna:
            if ke1 != ke2 and val1[-n:] == val2[:n]:
                edges.append(ke1 + '\t' + ke2)
    return edges


# parse the FASTA file into {name: sequence}, keeping the input order
dna = OrderedDict()
with open('12.txt') as f:
    for line in f:
        line = line.rstrip()
        if line.startswith('>'):
            seqName = line[1:]
            dna[seqName] = ''
            continue
        dna[seqName] += line.upper()

# write one tab-separated edge per line
with open('rosalind_grph_output.txt', 'wt') as fh:
    for x in overlap_graph(dna.items(), 3):
        fh.write(x + '\n')

Method 3

# coding=utf-8
# parse the FASTA file into a list of [name, sequence] pairs
seq_list = []
stname = ''
stseq = ''
with open('12.txt') as f:
    for line in f:
        if line[0] == '>':
            if stseq != '':
                seq_list.append([stname, stseq])
                stseq = ''
            stname = line[1:].strip()
        else:
            stseq = stseq + line.strip()
seq_list.append([stname, stseq])

# for every unordered pair (j < i), test the overlap in both directions
l = len(seq_list)
for i in range(0, l):
    for j in range(0, i):
        # records with identical sequences are skipped
        if seq_list[i][1] == seq_list[j][1]:
            continue
        if seq_list[i][1][0:3] == seq_list[j][1][-3:]:
            print(seq_list[j][0], seq_list[i][0])
        if seq_list[i][1][-3:] == seq_list[j][1][0:3]:
            print(seq_list[i][0], seq_list[j][0])
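
All three methods compare every pair of strings. As a possible alternative (not one of the methods above), the strings can first be grouped by their length-k prefix, so that each suffix needs only one dictionary lookup. The following is a minimal sketch under that assumption. The function name `overlap_graph_indexed` is hypothetical, the sample records are typed in by hand rather than read from a file, and distinct records are told apart by label, as in Method 2.

```python
from collections import defaultdict


def overlap_graph_indexed(records, k=3):
    """records: list of (name, sequence) pairs. Returns (tail, head) name pairs."""
    # Index every record name under the length-k prefix of its sequence.
    by_prefix = defaultdict(list)
    for name, seq in records:
        if len(seq) >= k:
            by_prefix[seq[:k]].append(name)

    # For each record, look up the names whose prefix equals its suffix.
    edges = []
    for name, seq in records:
        if len(seq) < k:
            continue
        for other in by_prefix.get(seq[-k:], []):
            if other != name:  # no directed loops
                edges.append((name, other))
    return edges


sample = [
    ("Rosalind_0498", "AAATAAA"),
    ("Rosalind_2391", "AAATTTT"),
    ("Rosalind_2323", "TTTTCCC"),
    ("Rosalind_0442", "AAATCCC"),
    ("Rosalind_5013", "GGGTGGG"),
]

for tail, head in overlap_graph_indexed(sample):
    print(tail, head)
```

On the sample dataset this yields the same three edges as the sample output. With the 10 kbp input limit the pairwise methods are fast enough, so the index mainly matters for larger collections.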

  
