16 Finding a Protein Motif
Problem
To allow for the presence of its varying forms, a protein motif is represented by a shorthand as follows: [XY] means "either X or Y" and {X} means "any amino acid except X." For example, the N-glycosylation motif is written as N{P}[ST]{P}.
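The shorthand maps almost directly onto regular-expression character classes, which is worth seeing before reading any solution. The sketch below is only illustrative: `motif_to_regex` is a hypothetical helper name, and it assumes the motif uses nothing beyond plain letters, `[...]` and `{...}`.

```python
import re

def motif_to_regex(motif):
    """Translate the motif shorthand into a regular-expression string.

    [XY] is already valid regex syntax; {X} becomes the negated class [^X].
    """
    return motif.replace('{', '[^').replace('}', ']')

pattern = motif_to_regex('N{P}[ST]{P}')
print(pattern)                              # N[^P][ST][^P]
print(bool(re.fullmatch(pattern, 'NASA')))  # True
print(bool(re.fullmatch(pattern, 'NPSA')))  # False: P is excluded in position 2
```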
You can see the complete description and features of a particular protein by its access ID "uniprot_id" in the UniProt database, by inserting the ID number into
http://www.uniprot.org/uniprot/uniprot_id
Alternatively, you can obtain a protein sequence in FASTA format by following
http://www.uniprot.org/uniprot/uniprot_id.fasta
For example, the data for protein B5ZC00 can be found at http://www.uniprot.org/uniprot/B5ZC00.
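A FASTA response consists of one header line starting with `>` followed by the sequence wrapped over several lines. As a rough sketch of how such text can be turned into a single protein string, the snippet below parses a made-up record; `parse_fasta` is a hypothetical helper, and the record is not real UniProt data.

```python
def parse_fasta(text):
    """Return (header, sequence) for a single-record FASTA string."""
    lines = text.strip().splitlines()
    header = lines[0].lstrip('>')
    # Sequence lines are wrapped, so join them back into one string.
    sequence = ''.join(line.strip() for line in lines[1:])
    return header, sequence

# Made-up record for illustration only; it is not real UniProt data.
example = ">demo|X00000|DEMO_PROTEIN hypothetical entry\nMKNAS\nAQNGT\n"
header, sequence = parse_fasta(example)
print(header)
print(sequence)  # MKNASAQNGT
```

Dropping the first line, rather than searching for a particular starting residue, also works for sequences that do not begin with M.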
Given: At most 15 UniProt Protein Database access IDs.
Return: For each protein possessing the N-glycosylation motif, output its given access ID followed by a list of locations in the protein string where the motif can be found.
Sample Dataset
A2Z669
B5ZC00
P07204_TRBM_HUMAN
P20840_SAG1_YEAST
Sample Output
B5ZC00
85 118 142 306 395
P07204_TRBM_HUMAN
47 115 116 382 409
P20840_SAG1_YEAST
79 109 135 248 306 348 364 402 485 501 614
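The adjacent positions 115 and 116 reported for P07204_TRBM_HUMAN show that occurrences of the motif may overlap, so a search that consumes all four residues of each match can miss some of them. The sketch below illustrates the difference on a toy string (not a real protein); `motif_locations` is a hypothetical helper, and positions are reported 1-based as in the sample output.

```python
import re

def motif_locations(seq):
    """Return 1-based start positions of N{P}[ST]{P}, including overlaps."""
    # The lookahead checks the three following residues without consuming
    # them, so the next search resumes right after the N.
    regex = re.compile(r'N(?=[^P][ST][^P])')
    return [m.start() + 1 for m in regex.finditer(seq)]

toy = 'ANNTSA'  # toy string: motifs start at positions 2 (NNTS) and 3 (NTSA)
print(motif_locations(toy))  # [2, 3]

# A pattern without lookahead consumes NNTS and skips the overlapping hit.
consuming = [m.start() + 1 for m in re.finditer(r'N[^P][ST][^P]', toy)]
print(consuming)  # [2]
```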
# coding=utf-8
# Python 3 version: urllib2 was replaced by urllib.request, and print is a function.
import re
import urllib.request

ids = ['A2Z669', 'B5ZC00', 'P07204_TRBM_HUMAN', 'P20840_SAG1_YEAST']

# The lookahead tests the three residues after N without consuming them,
# so overlapping occurrences of the motif are all reported.
regex = re.compile(r'N(?=[^P][ST][^P])')

for one in ids:
    name = one.strip('\n')
    url = 'http://www.uniprot.org/uniprot/' + name + '.fasta'
    response = urllib.request.urlopen(url)
    the_page = response.read().decode('utf-8')
    # Drop the FASTA header line and join the wrapped sequence lines.
    seq = ''.join(the_page.split('\n')[1:])
    # m.start() is 0-based; the expected output uses 1-based positions.
    out = [m.start() + 1 for m in regex.finditer(seq)]
    # Print only proteins that contain the motif.
    if out:
        print(name)
        print(' '.join(str(i) for i in out))