Problem

To allow for the presence of its varying forms, a protein motif is represented by a shorthand as follows: [XY] means "either X or Y" and {X} means "any amino acid except X." For example, the N-glycosylation motif is written as N{P}[ST]{P}.
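The shorthand maps almost directly onto regular-expression character classes: [XY] is already a character class, and {X} becomes the negated class [^X]. A minimal Python sketch, assuming the motif uses only these two constructs plus single residue letters (the helper name motif_to_regex is hypothetical):

```python
import re

def motif_to_regex(shorthand: str) -> str:
    """Translate protein-motif shorthand into a regex pattern.

    [XY] is kept as a character class ("either X or Y");
    {X} becomes the negated class [^X] ("any amino acid except X");
    single residue letters are copied unchanged.
    """
    return shorthand.replace('{', '[^').replace('}', ']')

pattern = motif_to_regex('N{P}[ST]{P}')
print(pattern)                              # N[^P][ST][^P]
print(bool(re.fullmatch(pattern, 'NGSA')))  # True
print(bool(re.fullmatch(pattern, 'NPSA')))  # False: second residue is P
```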

You can view the complete description and features of a particular protein in the UniProt database by inserting its access ID "uniprot_id" into

http://www.uniprot.org/uniprot/uniprot_id

Alternatively, you can obtain a protein sequence in FASTA format by following

http://www.uniprot.org/uniprot/uniprot_id.fasta

For example, the data for protein B5ZC00 can be found at http://www.uniprot.org/uniprot/B5ZC00.
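Building the FASTA URL and extracting the sequence can be sketched as follows. This assumes the URL template given above and a single-record FASTA response (one '>' header line followed by sequence lines). The helper names are hypothetical, and fetch_sequence needs network access and assumes the URL still serves FASTA text.

```python
from urllib.request import urlopen

FASTA_URL = 'http://www.uniprot.org/uniprot/{}.fasta'  # template from the text above

def fasta_url(uniprot_id: str) -> str:
    """Fill the UniProt ID into the FASTA URL template."""
    return FASTA_URL.format(uniprot_id)

def parse_fasta(text: str) -> tuple[str, str]:
    """Split a single-record FASTA text into (header, sequence).

    The header is the first line without its leading '>';
    the sequence is all following lines joined without separators.
    """
    lines = text.strip().splitlines()
    header = lines[0].lstrip('>')
    sequence = ''.join(line.strip() for line in lines[1:])
    return header, sequence

def fetch_sequence(uniprot_id: str) -> str:
    """Download the FASTA record for uniprot_id and return its sequence."""
    with urlopen(fasta_url(uniprot_id)) as response:
        return parse_fasta(response.read().decode('utf-8'))[1]

print(fasta_url('B5ZC00'))  # http://www.uniprot.org/uniprot/B5ZC00.fasta
```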

Given: At most 15 UniProt Protein Database access IDs.

Return: For each protein possessing the N-glycosylation motif, output its given access ID followed by a list of locations (1-based) in the protein string where the motif can be found.
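The output rule (skip proteins with no match; otherwise print the ID on one line and the positions on the next) can be sketched as a small formatter. The helper name is hypothetical, and positions are assumed to be 1-based, as in the sample output.

```python
def format_result(uniprot_id: str, positions: list[int]) -> str:
    """Return the two output lines for one protein, or '' if it has no motif.

    positions are 1-based start locations of the motif in the sequence.
    """
    if not positions:
        return ''  # proteins without the motif are omitted from the output
    return uniprot_id + '\n' + ' '.join(str(p) for p in positions)

print(format_result('B5ZC00', [85, 118, 142, 306, 395]))
print(repr(format_result('A2Z669', [])))  # ''
```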

Sample Dataset

A2Z669
B5ZC00
P07204_TRBM_HUMAN
P20840_SAG1_YEAST

Sample Output

B5ZC00
85 118 142 306 395
P07204_TRBM_HUMAN
47 115 116 382 409
P20840_SAG1_YEAST
79 109 135 248 306 348 364 402 485 501 614
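The output for P07204_TRBM_HUMAN contains both 115 and 116, so motif occurrences can overlap. A plain re.finditer call resumes scanning after the end of each match and would miss the second of two overlapping sites; putting everything after the N into a zero-width lookahead makes each match consume only the N. A short comparison on a made-up toy sequence (not real UniProt data):

```python
import re

protein = 'NNSTS'  # toy sequence: motifs start at positions 1 and 2 (1-based)

plain = re.compile(r'N[^P][ST][^P]')          # consumes all four residues
lookahead = re.compile(r'N(?=[^P][ST][^P])')  # consumes only the N

print([m.start() + 1 for m in plain.finditer(protein)])      # [1]
print([m.start() + 1 for m in lookahead.finditer(protein)])  # [1, 2]
```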

# Rosalind MPRT: report every N-glycosylation motif N{P}[ST]{P}
# in a list of UniProt proteins (Python 3).
import re
from urllib.request import urlopen

ids = ['A2Z669', 'B5ZC00', 'P07204_TRBM_HUMAN', 'P20840_SAG1_YEAST']

# The lookahead makes each match consume only the N, so overlapping
# occurrences (such as 115 and 116 in P07204_TRBM_HUMAN) are all found.
regex = re.compile(r'N(?=[^P][ST][^P])')

for name in ids:
    name = name.strip()
    # Query UniProt with the accession only (the part before the first '_').
    accession = name.split('_')[0]
    url = 'http://www.uniprot.org/uniprot/' + accession + '.fasta'
    with urlopen(url) as response:
        the_page = response.read().decode('utf-8')
    # Skip the '>' header line and join the sequence lines.
    seq = ''.join(the_page.strip().splitlines()[1:])
    # Report 1-based positions of every motif occurrence.
    out = [m.start() + 1 for m in regex.finditer(seq)]
    if out:
        print(name)
        print(' '.join(str(i) for i in out))
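
For comparison, the same positions can be found without regular expressions by testing every 4-residue window against N{P}[ST]{P} directly. A minimal sketch (the function name glycosylation_sites is hypothetical); it reports overlapping sites naturally because every start position is checked:

```python
def glycosylation_sites(seq: str) -> list[int]:
    """Return 1-based positions where N{P}[ST]{P} starts in seq."""
    sites = []
    for i in range(len(seq) - 3):  # every window of four residues
        a, b, c, d = seq[i:i + 4]
        if a == 'N' and b != 'P' and c in 'ST' and d != 'P':
            sites.append(i + 1)  # convert 0-based index to 1-based position
    return sites

print(glycosylation_sites('NNSTS'))  # [1, 2]
print(glycosylation_sites('NPSTN'))  # []
```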

  
