Problem

To allow for the presence of its varying forms, a protein motif is represented by a shorthand as follows: [XY] means "either X or Y" and {X} means "any amino acid except X." For example, the N-glycosylation motif is written as N{P}[ST]{P}.
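The shorthand maps almost directly onto regular-expression character classes. As a minimal sketch (the helper name `motif_to_regex` is made up for illustration and is not part of the problem statement), `{X}` becomes the negated class `[^X]`, while `[XY]` and plain letters are kept as they are:

```python
import re

def motif_to_regex(motif):
    """Translate the motif shorthand into a regular expression string.

    Hypothetical helper: {X} ("anything except X") becomes [^X];
    [XY] is already a valid character class and single letters match themselves.
    """
    return re.sub(r'\{([A-Z]+)\}', r'[^\1]', motif)

pattern = motif_to_regex('N{P}[ST]{P}')
print(pattern)  # N[^P][ST][^P]
```

The resulting string can be passed straight to `re.compile`.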

You can see the complete description and features of a particular protein by its access ID "uniprot_id" in the UniProt database, by inserting the ID number into

http://www.uniprot.org/uniprot/uniprot_id

Alternatively, you can obtain a protein sequence in FASTA format by following

http://www.uniprot.org/uniprot/uniprot_id.fasta

For example, the data for protein B5ZC00 can be found at http://www.uniprot.org/uniprot/B5ZC00.
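A rough sketch of turning an access ID into the FASTA URL, and of reducing a FASTA record to a bare protein string, might look like this. The function names `fasta_url` and `parse_fasta` and the sample record are invented for illustration; the record is not real UniProt data, and the URL pattern is simply the one quoted above.

```python
def fasta_url(uniprot_id):
    """Build the FASTA URL for an access ID, using the pattern quoted above."""
    return 'http://www.uniprot.org/uniprot/' + uniprot_id + '.fasta'

def parse_fasta(text):
    """Return the sequence of a single-record FASTA string.

    The first line (starting with '>') is the header; the sequence
    is wrapped over the remaining lines, so they are joined together.
    """
    lines = text.strip().splitlines()
    return ''.join(line.strip() for line in lines if not line.startswith('>'))

# Made-up record for illustration only (not real UniProt data).
sample = ">sp|XXXXXX|EXAMPLE made-up header\nMKNNTS\nAPNGSA\n"
print(fasta_url('B5ZC00'))
print(parse_fasta(sample))
```

Keeping the parsing separate from the download makes it possible to check the sequence handling without network access.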

Given: At most 15 UniProt Protein Database access IDs.

Return: For each protein possessing the N-glycosylation motif, output its given access ID followed by a list of locations in the protein string where the motif can be found.
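Occurrences of the motif may overlap, so a plain left-to-right regex scan can miss some of them. The sketch below is one possible way to collect every location, assuming positions are counted from 1; the name `motif_locations` and the test string are hypothetical.

```python
import re

# Wrapping the whole motif in a lookahead makes every match zero-width,
# so the scan advances one character at a time and overlaps are kept.
MOTIF = re.compile(r'(?=N[^P][ST][^P])')

def motif_locations(seq):
    """Return the 1-based start positions of every N{P}[ST]{P} occurrence."""
    return [m.start() + 1 for m in MOTIF.finditer(seq)]

# Made-up test string: the motifs starting at positions 1 and 2 overlap.
print(motif_locations('NNTSY'))  # [1, 2]
```

Without the lookahead, the match at position 1 would consume four characters and the overlapping match at position 2 would be skipped.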

Sample Dataset

A2Z669
B5ZC00
P07204_TRBM_HUMAN
P20840_SAG1_YEAST

Sample Output

B5ZC00
85 118 142 306 395
P07204_TRBM_HUMAN
47 115 116 382 409
P20840_SAG1_YEAST
79 109 135 248 306 348 364 402 485 501 614

Solution

# coding=utf-8
import re
import urllib.request

# Sample dataset; replace with the access IDs from the actual input.
ids = ['A2Z669', 'B5ZC00', 'P07204_TRBM_HUMAN', 'P20840_SAG1_YEAST']

# N{P}[ST]{P}: the lookahead consumes only the N, so overlapping motifs are all found.
regex = re.compile(r'N(?=[^P][ST][^P])')

for one in ids:
    name = one.strip('\n')
    url = 'http://www.uniprot.org/uniprot/' + name + '.fasta'
    response = urllib.request.urlopen(url)
    the_page = response.read().decode('utf-8')
    # Drop the FASTA header line and join the remaining lines into one sequence.
    seq = ''.join(the_page.splitlines()[1:])
    # m.start() is 0-based; add 1 to report 1-based positions.
    out = [m.start() + 1 for m in regex.finditer(seq)]
    if out:
        print(name)
        print(' '.join(str(i) for i in out))

  
