12 Overlap Graphs
Problem
A graph whose nodes have all been labeled can be represented by an adjacency list, in which each row of the list contains the two node labels corresponding to a unique edge.
A directed graph (or digraph) is a graph containing directed edges, each of which has an orientation. That is, a directed edge is represented by an arrow instead of a line segment; the starting and ending nodes of an edge form its tail and head, respectively. The directed edge with tail $v$ and head $w$ is represented by $(v, w)$ (but not by $(w, v)$). A directed loop is a directed edge of the form $(v, v)$.
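As a small illustration of this notation, the sketch below stores a directed graph as a list of (tail, head) pairs, which is the adjacency list described above. The node labels are made up for the example and do not come from the problem.

```python
# Each edge is a (tail, head) tuple; the labels are illustrative only.
edges = [("v", "w"), ("w", "x"), ("x", "x")]

# Direction matters: (v, w) is in the graph, (w, v) is not.
has_vw = ("v", "w") in edges
has_wv = ("w", "v") in edges

# A directed loop is an edge whose tail and head are the same node.
loops = [(tail, head) for tail, head in edges if tail == head]

# Print the adjacency list, one edge per row.
for tail, head in edges:
    print(tail, head)
```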
For a collection of strings and a positive integer $k$, the overlap graph for the strings is a directed graph $O_k$ in which each string is represented by a node, and string $s$ is connected to string $t$ with a directed edge when there is a length $k$ suffix of $s$ that matches a length $k$ prefix of $t$, as long as $s \neq t$; we demand $s \neq t$ to prevent directed loops in the overlap graph (although directed cycles may be present).
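The edge condition can be sketched as a single comparison. The helper name `has_overlap` is hypothetical, and the length guard is an assumption added here so that strings shorter than k never produce an edge. The strings are taken from the sample dataset.

```python
def has_overlap(s, t, k):
    """True when the last k characters of s equal the first k of t and s != t."""
    if s == t or len(s) < k or len(t) < k:
        return False
    return s[-k:] == t[:k]

forward = has_overlap("AAATAAA", "AAATTTT", 3)   # suffix AAA vs prefix AAA
backward = has_overlap("AAATTTT", "AAATAAA", 3)  # suffix TTT vs prefix AAA
loop = has_overlap("AAATAAA", "AAATAAA", 3)      # same string, so no edge
print(forward, backward, loop)
```

Note that AAATAAA ends and begins with AAA, so without the s ≠ t condition it would have an edge to itself.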
Given: A collection of DNA strings in FASTA format having total length at most 10 kbp.
Return: The adjacency list corresponding to $O_3$. You may return edges in any order.
Sample Dataset
>Rosalind_0498
AAATAAA
>Rosalind_2391
AAATTTT
>Rosalind_2323
TTTTCCC
>Rosalind_0442
AAATCCC
>Rosalind_5013
GGGTGGG
Sample Output
Rosalind_0498 Rosalind_2391
Rosalind_0498 Rosalind_0442
Rosalind_2391 Rosalind_2323
Method 1
# coding=utf-8
# Method 1
import itertools

data = {'Rosalind_0442': 'AAATCCC',
        'Rosalind_0498': 'AAATAAA',
        'Rosalind_2323': 'TTTTCCC',
        'Rosalind_2391': 'AAATTTT',
        'Rosalind_5013': 'GGGTGGG'}

def is_k_overlap(s1, s2, k):
    # True when the length-k suffix of s1 equals the length-k prefix of s2
    return s1[-k:] == s2[:k]

def k_edges(data, k):
    edges = []
    # Take every unordered pair of labels from data and test both directions
    for u, v in itertools.combinations(data, 2):
        u_dna, v_dna = data[u], data[v]
        if is_k_overlap(u_dna, v_dna, k):
            edges.append((u, v))
        if is_k_overlap(v_dna, u_dna, k):
            edges.append((v, u))
    return edges

print(k_edges(data, 3))
Method 2
# coding=utf-8
### 12. Overlap Graphs ###
from collections import OrderedDict
import re

def overlap_graph(dna, n):
    edges = []
    for ke1, val1 in dna:
        for ke2, val2 in dna:
            if ke1 != ke2 and val1[-n:] == val2[:n]:
                edges.append(ke1 + '\t' + ke2)
    return edges

dna = OrderedDict()
with open('12.txt') as f:
    for line in f:
        line = line.rstrip()
        if line.startswith('>'):
            seqName = re.sub('>', '', line)
            dna[seqName] = ''
            continue
        dna[seqName] += line.upper()

fh = open('rosalind_grph_output.txt', 'wt')
for x in overlap_graph(dna.items(), 3):
    fh.write(x + '\n')
fh.close()
Method 3
# coding=utf-8
seq_list = []
stseq = ''
for line in open('12.txt'):
    if line[0] == '>':
        # A new header: store the previous record before starting the next one
        if stseq != '':
            seq_list.append([stname, stseq])
            stseq = ''
        stname = line[1:].strip()
    else:
        stseq = stseq + line.strip('\n')
# Store the last record
seq_list.append([stname, stseq])
l = len(seq_list)

for i in range(0, l):
    for j in range(0, i):
        # Identical strings are skipped to avoid directed loops
        if seq_list[i][1] == seq_list[j][1]:
            continue
        if seq_list[i][1][0:3] == seq_list[j][1][-3:]:
            print(seq_list[j][0], seq_list[i][0])
        if seq_list[i][1][-3:] == seq_list[j][1][0:3]:
            print(seq_list[i][0], seq_list[j][0])
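All three methods above compare every pair of strings. As an alternative that is not part of the original solutions, the sketch below first groups labels by their length-k prefix and then looks up each suffix in that index. The function name `overlap_edges` is hypothetical. The sketch assumes that s ≠ t can be checked by comparing labels, as Method 2 does, and that strings shorter than k should be ignored.

```python
from collections import defaultdict

def overlap_edges(seqs, k):
    """seqs maps label -> DNA string; returns a list of (tail, head) label pairs."""
    # Index every label by the length-k prefix of its string.
    by_prefix = defaultdict(list)
    for label, dna in seqs.items():
        if len(dna) >= k:
            by_prefix[dna[:k]].append(label)

    edges = []
    for tail, dna in seqs.items():
        if len(dna) < k:
            continue
        # Every label whose prefix equals this suffix is a candidate head.
        for head in by_prefix.get(dna[-k:], []):
            if head != tail:  # assumption: distinct labels mean distinct strings
                edges.append((tail, head))
    return edges

sample = {
    'Rosalind_0498': 'AAATAAA',
    'Rosalind_2391': 'AAATTTT',
    'Rosalind_2323': 'TTTTCCC',
    'Rosalind_0442': 'AAATCCC',
    'Rosalind_5013': 'GGGTGGG',
}
result = overlap_edges(sample, 3)
for tail, head in result:
    print(tail, head)
```

With the index in place, each string costs one dictionary lookup plus one step per matching head, rather than a comparison against every other string.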