16 Finding a Protein Motif
Problem
To allow for the presence of its varying forms, a protein motif is represented by a shorthand as follows: [XY] means "either X or Y" and {X} means "any amino acid except X." For example, the N-glycosylation motif is written as N{P}[ST]{P}.
You can see the complete description and features of a particular protein by its access ID "uniprot_id" in the UniProt database, by inserting the ID number into
http://www.uniprot.org/uniprot/uniprot_id
Alternatively, you can obtain a protein sequence in FASTA format by following
http://www.uniprot.org/uniprot/uniprot_id.fasta
For example, the data for protein B5ZC00 can be found at http://www.uniprot.org/uniprot/B5ZC00.
Given: At most 15 UniProt Protein Database access IDs.
Return: For each protein possessing the N-glycosylation motif, output its given access ID followed by a list of locations in the protein string where the motif can be found.
Sample Dataset
A2Z669
B5ZC00
P07204_TRBM_HUMAN
P20840_SAG1_YEAST
Sample Output
B5ZC00
85 118 142 306 395
P07204_TRBM_HUMAN
47 115 116 382 409
P20840_SAG1_YEAST
79 109 135 248 306 348 364 402 485 501 614
#coding=utf-8
import urllib2
import re
list = ['A2Z669','B5ZC00','P07204_TRBM_HUMAN','P20840_SAG1_YEAST'] for one in list:
name = one.strip('\n')
url = 'http://www.uniprot.org/uniprot/'+name+'.fasta'
req = urllib2.Request(url)
response = urllib2.urlopen(req)
the_page = response.read()
start = the_page.find('\nM')
seq = the_page[start+1:].replace('\n','')
seq = ' '+seq
regex = re.compile(r'N(?=[^P][ST][^P])')
index = 0
out = []
'''
out = [m.start() for m in re.finditer(regex, seq)]
''' index = 0
while(index<len(seq)):
index += 1 if re.search(regex,seq[index:]) == None:
break #print S[index:]
if re.match(regex,seq[index:]) != None:
out.append(index) if out != []:
print name
print ' '.join([ str(i) for i in out])
16 Finding a Protein Motif的更多相关文章
- 14 Finding a Shared Motif
Problem A common substring of a collection of strings is a substring of every member of the collecti ...
- 09 Finding a Motif in DNA
Problem Given two strings ss and tt, tt is a substring of ss if tt is contained as a contiguous coll ...
- ERROR: openstack Error finding address for http://10.16.37.215:9292/v1/images: [Errno 32] Broken pipe
Try to set: no_proxy=10.16.37.215 this should help 转自: http://askubuntu.com/questions/575938/error-i ...
- DNA motif 搜索算法总结
DNA motif 搜索算法总结 2011-09-15 ~ ADMIN 翻译自:A survey of DNA motif finding algorithms, Modan K Das et. al ...
- P6 EPPM Installation and Configuration Guide 16 R1 April 2016
P6 EPPM Installation and Configuration Guide 16 R1 April 2016 Contents About Installing and ...
- P6 Professional Installation and Configuration Guide (Microsoft SQL Server Database) 16 R1
P6 Professional Installation and Configuration Guide (Microsoft SQL Server Database) 16 R1 May ...
- LOJ Finding LCM(math)
1215 - Finding LCM Time Limit: 2 second(s) Memory Limit: 32 MB LCM is an abbreviation used for Least ...
- 越狱Season 1- Episode 16
Season 1, Episode 16 -Burrows:Don't be. It's not your fault. 不要,不是你的错 -Fernando: Know what I like? 知 ...
- 查找EBS中各种文件版本(Finding File Versions in the Oracle Applications EBusiness Suite - Checking the $HEADER)
Finding File Versions in the Oracle Applications EBusiness Suite - Checking the $HEADER (文档 ID 85895 ...
随机推荐
- Android Studio中 ADB WIFI插件进行无线调试实践
首先要确保电脑和手机在同一个局域网中.具体步骤如下 1.Android Studio中安装ADB WIFI插件.安装成功后重启Android Studio.(没有安装过插件的同仁,请自己搜索) 2.手 ...
- const 补充
char const* ptr1const char * ptr2char * const ptr3 看到这三个const作何感想 其实const比较好理解的是const 后面整体是不能改变的(整体的 ...
- idea_pyspark 环境配置
本文转载自:https://www.cnblogs.com/LazyJoJo/p/6910504.html 1.配置好Hadoop和spark 2.配置好Pytho3.5 3.安装py4j pip3 ...
- Java 字符串 String
什么是Java中的字符串 在 Java 中,字符串被作为 String 类型的对象处理. String 类位于 java.lang 包中.默认情况下,该包被自动导入所有的程序. 创建 String 对 ...
- 深入理解 Express.js
本文针对那些对Node.js有一定了解的读者.假设你已经知道如何运行Node代码,使用npm安装依赖模块.但我保证,你并不需要是这方面的专家.本文针对的是Express 3.2.5版本,以介绍相关概念 ...
- Keras函数式 API
用Keras定义网络模型有两种方式, Sequential 顺序模型 Keras 函数式 API模型 之前我们介绍了Sequential顺序模型,今天我们来接触一下 Keras 的函数式API模型. ...
- Mongodb 集群加keyFile认证
介绍 自从远古计绳结开始,数据库的存储就注定了今天的地位和多样性,Nosql的出现更是解决了现有的关系型数据库无法解决的一些难题,对高性能,灵活度,扩展性,海量数据的问题.随之而出现的高速内存索引数据 ...
- C#中StreamWriter与BinaryWriter的区别兼谈编码。
原文:http://www.cnblogs.com/ybwang/archive/2010/06/12/1757409.html 参考: 1. <C#高级编程>第六版 2. 文件流和数据 ...
- MySQL GTID (一)
MySQL GTID 系列之一 一.GTID相关概念 GTID:全局事务标识符,MySQL5.6版本开始在主从复制中推出的重量级特性. 每提交一个事务,当前执行线程都会拿到一个给定复制环境中唯一的GT ...
- Apache Sqoop 结构化、非结构化数据转换工具
简介: Apache Sqoop 是一种用于 Apache Hadoop 与关系型数据库之间结构化.非结构化数据转换的工具. 一.安装 MySQL.导入测试数据 1.文档链接:http://www.c ...