Problem

To allow for the presence of its varying forms, a protein motif is represented by a shorthand as follows: [XY] means "either X or Y" and {X} means "any amino acid except X." For example, the N-glycosylation motif is written as N{P}[ST]{P}.

You can see the complete description and features of a particular protein by its access ID "uniprot_id" in the UniProt database, by inserting the ID number into

http://www.uniprot.org/uniprot/uniprot_id

Alternatively, you can obtain a protein sequence in FASTA format by following

http://www.uniprot.org/uniprot/uniprot_id.fasta

For example, the data for protein B5ZC00 can be found at http://www.uniprot.org/uniprot/B5ZC00.

Given: At most 15 UniProt Protein Database access IDs.

Return: For each protein possessing the N-glycosylation motif, output its given access ID followed by a list of locations in the protein string where the motif can be found.

Sample Dataset

A2Z669
B5ZC00
P07204_TRBM_HUMAN
P20840_SAG1_YEAST

Sample Output

B5ZC00
85 118 142 306 395
P07204_TRBM_HUMAN
47 115 116 382 409
P20840_SAG1_YEAST
79 109 135 248 306 348 364 402 485 501 614
#coding=utf-8
import urllib2
import re
list = ['A2Z669','B5ZC00','P07204_TRBM_HUMAN','P20840_SAG1_YEAST'] for one in list:
name = one.strip('\n')
url = 'http://www.uniprot.org/uniprot/'+name+'.fasta'
req = urllib2.Request(url)
response = urllib2.urlopen(req)
the_page = response.read()
start = the_page.find('\nM')
seq = the_page[start+1:].replace('\n','')
seq = ' '+seq
regex = re.compile(r'N(?=[^P][ST][^P])')
index = 0
out = []
'''
out = [m.start() for m in re.finditer(regex, seq)]
''' index = 0
while(index<len(seq)):
index += 1 if re.search(regex,seq[index:]) == None:
break #print S[index:]
if re.match(regex,seq[index:]) != None:
out.append(index) if out != []:
print name
print ' '.join([ str(i) for i in out])

  

16 Finding a Protein Motif的更多相关文章

  1. 14 Finding a Shared Motif

    Problem A common substring of a collection of strings is a substring of every member of the collecti ...

  2. 09 Finding a Motif in DNA

    Problem Given two strings ss and tt, tt is a substring of ss if tt is contained as a contiguous coll ...

  3. ERROR: openstack Error finding address for http://10.16.37.215:9292/v1/images: [Errno 32] Broken pipe

    Try to set: no_proxy=10.16.37.215 this should help 转自: http://askubuntu.com/questions/575938/error-i ...

  4. DNA motif 搜索算法总结

    DNA motif 搜索算法总结 2011-09-15 ~ ADMIN 翻译自:A survey of DNA motif finding algorithms, Modan K Das et. al ...

  5. P6 EPPM Installation and Configuration Guide 16 R1 April 2016

    P6 EPPM Installation and Configuration Guide 16 R1         April 2016 Contents About Installing and ...

  6. P6 Professional Installation and Configuration Guide (Microsoft SQL Server Database) 16 R1

    P6 Professional Installation and Configuration Guide (Microsoft SQL Server Database) 16 R1       May ...

  7. LOJ Finding LCM(math)

    1215 - Finding LCM Time Limit: 2 second(s) Memory Limit: 32 MB LCM is an abbreviation used for Least ...

  8. 越狱Season 1- Episode 16

    Season 1, Episode 16 -Burrows:Don't be. It's not your fault. 不要,不是你的错 -Fernando: Know what I like? 知 ...

  9. 查找EBS中各种文件版本(Finding File Versions in the Oracle Applications EBusiness Suite - Checking the $HEADER)

    Finding File Versions in the Oracle Applications EBusiness Suite - Checking the $HEADER (文档 ID 85895 ...

随机推荐

  1. Android Studio中 ADB WIFI插件进行无线调试实践

    首先要确保电脑和手机在同一个局域网中.具体步骤如下 1.Android Studio中安装ADB WIFI插件.安装成功后重启Android Studio.(没有安装过插件的同仁,请自己搜索) 2.手 ...

  2. const 补充

    char const* ptr1const char * ptr2char * const ptr3 看到这三个const作何感想 其实const比较好理解的是const 后面整体是不能改变的(整体的 ...

  3. idea_pyspark 环境配置

    本文转载自:https://www.cnblogs.com/LazyJoJo/p/6910504.html 1.配置好Hadoop和spark 2.配置好Pytho3.5 3.安装py4j pip3 ...

  4. Java 字符串 String

    什么是Java中的字符串 在 Java 中,字符串被作为 String 类型的对象处理. String 类位于 java.lang 包中.默认情况下,该包被自动导入所有的程序. 创建 String 对 ...

  5. 深入理解 Express.js

    本文针对那些对Node.js有一定了解的读者.假设你已经知道如何运行Node代码,使用npm安装依赖模块.但我保证,你并不需要是这方面的专家.本文针对的是Express 3.2.5版本,以介绍相关概念 ...

  6. Keras函数式 API

    用Keras定义网络模型有两种方式, Sequential 顺序模型 Keras 函数式 API模型 之前我们介绍了Sequential顺序模型,今天我们来接触一下 Keras 的函数式API模型. ...

  7. Mongodb 集群加keyFile认证

    介绍 自从远古计绳结开始,数据库的存储就注定了今天的地位和多样性,Nosql的出现更是解决了现有的关系型数据库无法解决的一些难题,对高性能,灵活度,扩展性,海量数据的问题.随之而出现的高速内存索引数据 ...

  8. C#中StreamWriter与BinaryWriter的区别兼谈编码。

    原文:http://www.cnblogs.com/ybwang/archive/2010/06/12/1757409.html 参考: 1. <C#高级编程>第六版 2.  文件流和数据 ...

  9. MySQL GTID (一)

    MySQL GTID 系列之一 一.GTID相关概念 GTID:全局事务标识符,MySQL5.6版本开始在主从复制中推出的重量级特性. 每提交一个事务,当前执行线程都会拿到一个给定复制环境中唯一的GT ...

  10. Apache Sqoop 结构化、非结构化数据转换工具

    简介: Apache Sqoop 是一种用于 Apache Hadoop 与关系型数据库之间结构化.非结构化数据转换的工具. 一.安装 MySQL.导入测试数据 1.文档链接:http://www.c ...