Problem

To allow for the presence of its varying forms, a protein motif is represented by a shorthand as follows: [XY] means "either X or Y" and {X} means "any amino acid except X." For example, the N-glycosylation motif is written as N{P}[ST]{P}.
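The shorthand maps almost directly onto regular-expression syntax. As a minimal sketch (the helper name `motif_to_regex` is hypothetical, and it assumes the motif contains only single letters, `[..]` groups, and `{..}` groups), the translation could look like this:

```python
import re

def motif_to_regex(motif):
    """Translate the motif shorthand into a regex pattern string.

    [XY] is already a valid regex character class;
    {X} becomes the negated class [^X].
    """
    return re.sub(r'\{([A-Z]+)\}', r'[^\1]', motif)

pattern = motif_to_regex('N{P}[ST]{P}')
print(pattern)  # N[^P][ST][^P]
```

Each element of the motif matches exactly one residue, so the N-glycosylation motif always spans four amino acids.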

You can see the complete description and features of a particular protein by its access ID "uniprot_id" in the UniProt database, by inserting the ID number into

http://www.uniprot.org/uniprot/uniprot_id

Alternatively, you can obtain a protein sequence in FASTA format by following

http://www.uniprot.org/uniprot/uniprot_id.fasta

For example, the data for protein B5ZC00 can be found at http://www.uniprot.org/uniprot/B5ZC00.
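A FASTA response consists of a header line beginning with `>` followed by the sequence split over several lines. The sketch below shows one way to turn such text into a single protein string; the helper name `fasta_to_sequence` is hypothetical, it assumes a single-record response, and the record used here is made up for illustration rather than taken from UniProt.

```python
def fasta_to_sequence(fasta_text):
    """Return the sequence of a single-record FASTA string, without the header."""
    lines = fasta_text.strip().splitlines()
    # The first line starts with '>' and holds the description, not residues.
    return ''.join(line.strip() for line in lines[1:])

# Made-up record for illustration only; not real UniProt data.
example = ">sp|X00000|EXAMPLE_PROTEIN made-up header\nMKNNTSAL\nPQNGSW\n"
print(fasta_to_sequence(example))  # MKNNTSALPQNGSW
```

Skipping the header by line position avoids relying on the sequence starting with a particular residue.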

Given: At most 15 UniProt Protein Database access IDs.

Return: For each protein possessing the N-glycosylation motif, output its given access ID followed by a list of locations in the protein string where the motif can be found.
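Locations are 1-based, and occurrences of the motif may overlap (the sample output lists both 115 and 116 for one protein). A plain regex scan skips overlapping hits because each match consumes four residues, so a lookahead is one way around that. The following is a sketch under those assumptions; `motif_locations` is a hypothetical helper and the test sequence is made up.

```python
import re

# Lookahead: only the N is consumed, so motifs that overlap are all reported.
MOTIF = re.compile(r'N(?=[^P][ST][^P])')

def motif_locations(seq):
    """Return 1-based start positions of the N-glycosylation motif in seq."""
    return [m.start() + 1 for m in MOTIF.finditer(seq)]

# Made-up sequence: the motifs starting at positions 3 and 4 overlap.
print(motif_locations('MKNNTSALPQNGSW'))  # [3, 4, 11]
```

With the non-lookahead pattern `N[^P][ST][^P]`, the same scan would report only positions 3 and 11.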

Sample Dataset

A2Z669
B5ZC00
P07204_TRBM_HUMAN
P20840_SAG1_YEAST

Sample Output

B5ZC00
85 118 142 306 395
P07204_TRBM_HUMAN
47 115 116 382 409
P20840_SAG1_YEAST
79 109 135 248 306 348 364 402 485 501 614

# coding=utf-8
# Python 3 port of the original Python 2 (urllib2) script.
import re
import urllib.request

ids = ['A2Z669', 'B5ZC00', 'P07204_TRBM_HUMAN', 'P20840_SAG1_YEAST']

# N{P}[ST]{P}: the lookahead consumes only the N,
# so overlapping occurrences of the motif are all found.
regex = re.compile(r'N(?=[^P][ST][^P])')

for one in ids:
    name = one.strip()
    url = 'http://www.uniprot.org/uniprot/' + name + '.fasta'
    response = urllib.request.urlopen(url)
    the_page = response.read().decode('utf-8')
    # Drop the FASTA header line and join the remaining lines into one string.
    seq = ''.join(the_page.splitlines()[1:])
    # finditer gives 0-based offsets; the problem asks for 1-based locations.
    out = [m.start() + 1 for m in regex.finditer(seq)]
    # Proteins without the motif are not printed.
    if out:
        print(name)
        print(' '.join(str(i) for i in out))

  
