Problem

To allow for the presence of its varying forms, a protein motif is represented by a shorthand as follows: [XY] means "either X or Y" and {X} means "any amino acid except X." For example, the N-glycosylation motif is written as N{P}[ST]{P}.

You can see the complete description and features of a particular protein by its access ID "uniprot_id" in the UniProt database, by inserting the ID number into

http://www.uniprot.org/uniprot/uniprot_id

Alternatively, you can obtain a protein sequence in FASTA format by following

http://www.uniprot.org/uniprot/uniprot_id.fasta

For example, the data for protein B5ZC00 can be found at http://www.uniprot.org/uniprot/B5ZC00.

Given: At most 15 UniProt Protein Database access IDs.

Return: For each protein possessing the N-glycosylation motif, output its given access ID followed by a list of locations in the protein string where the motif can be found.

Sample Dataset

A2Z669
B5ZC00
P07204_TRBM_HUMAN
P20840_SAG1_YEAST

Sample Output

B5ZC00
85 118 142 306 395
P07204_TRBM_HUMAN
47 115 116 382 409
P20840_SAG1_YEAST
79 109 135 248 306 348 364 402 485 501 614
#coding=utf-8
import urllib2
import re
list = ['A2Z669','B5ZC00','P07204_TRBM_HUMAN','P20840_SAG1_YEAST'] for one in list:
name = one.strip('\n')
url = 'http://www.uniprot.org/uniprot/'+name+'.fasta'
req = urllib2.Request(url)
response = urllib2.urlopen(req)
the_page = response.read()
start = the_page.find('\nM')
seq = the_page[start+1:].replace('\n','')
seq = ' '+seq
regex = re.compile(r'N(?=[^P][ST][^P])')
index = 0
out = []
'''
out = [m.start() for m in re.finditer(regex, seq)]
''' index = 0
while(index<len(seq)):
index += 1 if re.search(regex,seq[index:]) == None:
break #print S[index:]
if re.match(regex,seq[index:]) != None:
out.append(index) if out != []:
print name
print ' '.join([ str(i) for i in out])

  

16 Finding a Protein Motif的更多相关文章

  1. 14 Finding a Shared Motif

    Problem A common substring of a collection of strings is a substring of every member of the collecti ...

  2. 09 Finding a Motif in DNA

    Problem Given two strings ss and tt, tt is a substring of ss if tt is contained as a contiguous coll ...

  3. ERROR: openstack Error finding address for http://10.16.37.215:9292/v1/images: [Errno 32] Broken pipe

    Try to set: no_proxy=10.16.37.215 this should help 转自: http://askubuntu.com/questions/575938/error-i ...

  4. DNA motif 搜索算法总结

    DNA motif 搜索算法总结 2011-09-15 ~ ADMIN 翻译自:A survey of DNA motif finding algorithms, Modan K Das et. al ...

  5. P6 EPPM Installation and Configuration Guide 16 R1 April 2016

    P6 EPPM Installation and Configuration Guide 16 R1         April 2016 Contents About Installing and ...

  6. P6 Professional Installation and Configuration Guide (Microsoft SQL Server Database) 16 R1

    P6 Professional Installation and Configuration Guide (Microsoft SQL Server Database) 16 R1       May ...

  7. LOJ Finding LCM(math)

    1215 - Finding LCM Time Limit: 2 second(s) Memory Limit: 32 MB LCM is an abbreviation used for Least ...

  8. 越狱Season 1- Episode 16

    Season 1, Episode 16 -Burrows:Don't be. It's not your fault. 不要,不是你的错 -Fernando: Know what I like? 知 ...

  9. 查找EBS中各种文件版本(Finding File Versions in the Oracle Applications EBusiness Suite - Checking the $HEADER)

    Finding File Versions in the Oracle Applications EBusiness Suite - Checking the $HEADER (文档 ID 85895 ...

随机推荐

  1. linux下不同tomcat使用不同的jdk版本

    环境:Linux tomcat7 使用jdk7 tomcat 8使用jdk8 最近写好了一个程序,需要部署在测试机器上,这个测试机器上已经有了在运行的tomcat7, 我这个程序要求使用tomcat8 ...

  2. 使用IAR编译STM8S 怎样生产烧录文件

      IAR编译后能够生成的烧录文件格式有4中,例如以下 第一种是Motorola,其生成文件和STVD生成烧录文件.s19格式一样的,即能够通用 另外一种是16进制,keil等等常都用到的. 第三种是 ...

  3. cocos2dx 安卓真机调试问题汇总

    cocos compile编译apk问题汇总: 1,dx编译报错,没有足够的空间 ANTBUILD : [dx] error : Could not create the Java Virtual M ...

  4. TASKER 定制你的手机让它在办公室时屏幕 30 分钟才灭

    TASKER 定制你的手机让它在办公室时屏幕 30 分钟才灭 因为到的办公室,手机一直是充电的,不想屏幕太快关关掉,所以使用 TASKER 做了一个条件. 当 WIFI 连接到公司 WIFI 且充电中 ...

  5. 推荐一个React 入门的教程

    推荐一个React 入门的教程 react 入门实例教程 Github地址:https://github.com/ruanyf/react-demos

  6. ETL流程概述及常用实现方法

    ETL流程概述及常用实现方法 http://blog.csdn.net/btkuangxp/article/details/48224187 目录(?)[-] 1抽取作业 1手工开发抽取作业时候的常用 ...

  7. (转)C#与Outlook交互收发邮件

    本文转载自:http://www.cnblogs.com/Moosdau/archive/2012/03/11/2390729.html .Net对POP3邮件系统已经集成了相应的功能,但是如果是基于 ...

  8. Java 学习思路

    内容中包含 base64string 图片造成字符过多,拒绝显示

  9. 10-28质量监控ELK

    监控业务范围 app崩溃监控(Bugly) 应用性能监控(APM) 业务监控(TalkingData.友盟) 质量监控(缺位) 质量监控平台ELK elk官网 数据构造 线上错误状态分布 故障影响范围 ...

  10. C++中如何强制inline函数(MSVC, GCC)

    #ifdef _MSC_VER_ // for MSVC #define forceinline __forceinline #elif defined __GNUC__ // for gcc on ...