16 Finding a Protein Motif
Problem
To allow for the presence of its varying forms, a protein motif is represented by a shorthand as follows: [XY] means "either X or Y" and {X} means "any amino acid except X." For example, the N-glycosylation motif is written as N{P}[ST]{P}.
You can see the complete description and features of a particular protein by its access ID "uniprot_id" in the UniProt database, by inserting the ID number into
http://www.uniprot.org/uniprot/uniprot_id
Alternatively, you can obtain a protein sequence in FASTA format by following
http://www.uniprot.org/uniprot/uniprot_id.fasta
For example, the data for protein B5ZC00 can be found at http://www.uniprot.org/uniprot/B5ZC00.
Given: At most 15 UniProt Protein Database access IDs.
Return: For each protein possessing the N-glycosylation motif, output its given access ID followed by a list of locations in the protein string where the motif can be found.
Sample Dataset
A2Z669
B5ZC00
P07204_TRBM_HUMAN
P20840_SAG1_YEAST
Sample Output
B5ZC00
85 118 142 306 395
P07204_TRBM_HUMAN
47 115 116 382 409
P20840_SAG1_YEAST
79 109 135 248 306 348 364 402 485 501 614
#coding=utf-8
import urllib2
import re
list = ['A2Z669','B5ZC00','P07204_TRBM_HUMAN','P20840_SAG1_YEAST'] for one in list:
name = one.strip('\n')
url = 'http://www.uniprot.org/uniprot/'+name+'.fasta'
req = urllib2.Request(url)
response = urllib2.urlopen(req)
the_page = response.read()
start = the_page.find('\nM')
seq = the_page[start+1:].replace('\n','')
seq = ' '+seq
regex = re.compile(r'N(?=[^P][ST][^P])')
index = 0
out = []
'''
out = [m.start() for m in re.finditer(regex, seq)]
''' index = 0
while(index<len(seq)):
index += 1 if re.search(regex,seq[index:]) == None:
break #print S[index:]
if re.match(regex,seq[index:]) != None:
out.append(index) if out != []:
print name
print ' '.join([ str(i) for i in out])
16 Finding a Protein Motif的更多相关文章
- 14 Finding a Shared Motif
Problem A common substring of a collection of strings is a substring of every member of the collecti ...
- 09 Finding a Motif in DNA
Problem Given two strings ss and tt, tt is a substring of ss if tt is contained as a contiguous coll ...
- ERROR: openstack Error finding address for http://10.16.37.215:9292/v1/images: [Errno 32] Broken pipe
Try to set: no_proxy=10.16.37.215 this should help 转自: http://askubuntu.com/questions/575938/error-i ...
- DNA motif 搜索算法总结
DNA motif 搜索算法总结 2011-09-15 ~ ADMIN 翻译自:A survey of DNA motif finding algorithms, Modan K Das et. al ...
- P6 EPPM Installation and Configuration Guide 16 R1 April 2016
P6 EPPM Installation and Configuration Guide 16 R1 April 2016 Contents About Installing and ...
- P6 Professional Installation and Configuration Guide (Microsoft SQL Server Database) 16 R1
P6 Professional Installation and Configuration Guide (Microsoft SQL Server Database) 16 R1 May ...
- LOJ Finding LCM(math)
1215 - Finding LCM Time Limit: 2 second(s) Memory Limit: 32 MB LCM is an abbreviation used for Least ...
- 越狱Season 1- Episode 16
Season 1, Episode 16 -Burrows:Don't be. It's not your fault. 不要,不是你的错 -Fernando: Know what I like? 知 ...
- 查找EBS中各种文件版本(Finding File Versions in the Oracle Applications EBusiness Suite - Checking the $HEADER)
Finding File Versions in the Oracle Applications EBusiness Suite - Checking the $HEADER (文档 ID 85895 ...
随机推荐
- [Qt] QLineEdit 仿QQ签名框
今天鼓捣了半天,终于实现了自定义Qt中的QlineEdit控件的大致效果. 这个问题对于新手而言,主要有以下几个难点: 1.继承QLineEdit控件 2.QSS设置QLineEdit的相关样式,可以 ...
- centos下memcached安装
memcached是一款高速.分布式的内存缓存系统.其官方主页在http://www.danga.com/memcached/ 1.安装前的准备 要安装memcached,需要有libevent的支持 ...
- debian下为apache启用rewrite模块
如果我们是自己编译的apache,那么启用或禁用某个模块应该说是比较容易的事,只要修改apache的配置文件就可以了.但是我们没有理由不用已经做好的二进制文件进行安装,使用apt-get要方便多了. ...
- Shell实现Unix进程间信息交换的几种方法
本文将介绍在SCO OpenServer5.0.5系统中使用shell语言来实现进程间信息交换的几种方法: 使用命名管道实现进程间信息交换 使用kill命令和trap语句实现进程间信息交换 使用点命令 ...
- cuDNN下载地址和指南
我出现了报错 Could not find 'cudnn64_7.dll'. TensorFlow requires that this DLL be installed in a directory ...
- 20181123_控制反转(IOC)和依赖注入(DI)
一. 控制反转和依赖注入: 控制反转的前提, 是依赖倒置原则, 系统架构时,高层模块不应该依赖于低层模块,二者通过抽象来依赖 (依赖抽象,而不是细节) 如果要想做到控制反转(IOC), 就必须要使 ...
- storm架构及原理
storm 架构与原理 1 storm简介 1.1 storm是什么 如果只用一句话来描述 storm 是什么的话:分布式 && 实时 计算系统.按照作者 Nathan Marz 的说 ...
- Hadoop 2.7.3 安装配置及测试
1.概述 Hadoop是一个由Apache基金会所开发的分布式系统基础架构.用户可以在不了解分布式底层细节的情况下,开发分布式程序.hadoop三种安装模式:单机模式,伪分布式,真正分布式.因在实际生 ...
- python twilio 短信群发 知识留存
1. win7 32位系统,傻瓜安装Anaconda2(python 2.7) 2. 打开cmd, 输入命令pip install twilio,在线安装twilio 3. 打开Anaconda2的S ...
- Cmder的安装
Cmder把conemu,git-for-windows和clink打包在一起,让你无需配置就能使用一个真正干净的Linux终端!性感的外观,强大的功能!代替了Windows原生的Cmd 1. 安裝 ...