How To Use Coordinates To Extract Sequences In Fasta File
[1] bedtools (https://github.com/arq5x/bedtools2)
here is also bedtools (https://github.com/arq5x/bedtools2) getfasta. It uses Erik's code under the hood.
$ cat test.fa
>chr1
AAAAAAAACCCCCCCCCCCCCGCTACTGGGGGGGGGGGGGGGGGG
$ cat test.bed
chr1 5 10
$ bedtools getfasta -fi test.fa -bed test.bed -fo test.fa.out
$ cat test.fa.out
>chr1:5-10
AAACC
Docs: http://bedtools.readthedocs.org/en/latest/content/tools/getfasta.html
And it is wrapped in pybedtools as well: http://pythonhosted.org/pybedtools/autodocs/pybedtools.BedTool.sequence.html?highlight=fasta
https://code.google.com/p/bedtools/
[2] Samtools faidx feature
faidx samtools faidx <ref.fasta> [region1 [...]] Index reference sequence in the FASTA format or extract subsequence from indexed reference sequence. If no region is specified, faidx will index the file and create <ref.fasta>.fai on the disk. If regions are speficified, the subsequences will be retrieved and printed to stdout in the FASTA format.
You will have to first create the fasta indexes of the reference genome fasta file and then use this command.
[3] python implementation of faidx to GitHub.
https://github.com/mdshw5/pyfaidx
[4] UCSC twoBitToFa
use ucsc twoBitToFa in http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/
[5] UCSC DAS
python script to fetch sequences from UCSC DAS server:
http://genome.ucsc.edu/cgi-bin/das/h...r4:35654,35695
[6] ensembl biomart
Ref:
https://www.biostars.org/p/81087/
http://stackoverflow.com/questions/23089388/a-fast-way-to-get-human-genome-sequence-by-coordinate
http://seqanswers.com/forums/showthread.php?t=42463
How To Use Coordinates To Extract Sequences In Fasta File的更多相关文章
- PHP7函数大全(4553个函数)
转载来自: http://www.infocool.net/kb/PHP/201607/168683.html a 函数 说明 abs 绝对值 acos 反余弦 acosh 反双曲余弦 addcsla ...
- Linux command line exercises for NGS data processing
by Umer Zeeshan Ijaz The purpose of this tutorial is to introduce students to the frequently used to ...
- maker 2008年发表在genome Res
http://gmod.org/wiki/MAKER_Tutorial 简单好用 identify repeats, to align ESTs and proteins to the genome, ...
- bedtools 每天都会用到的工具
详细的使用说明:http://bedtools.readthedocs.org/en/latest/ Collectively, the bedtools utilities are a swiss- ...
- cetos6配置用msmtp和mutt发邮件(阿里云)
Linux下可以直接用mail命令发送邮件,但是发件人是user@servername,如果机器没有外网的dns,其他人就无法回复.此时,有一个可以使用网络免费邮箱服务的邮件发送程序就比较重要了.ms ...
- mysql之mysqldump、mysqlimport
一.引言 前一段在做一个csv的导入工具,最麻烦的部分就是对csv文件的解析,最后,老大提醒说是不是考虑的过于麻烦了,由于当时考虑到mysql是允许指定导出的csv文件的格式的,所以考虑到想要兼容这种 ...
- STAR manual
来源:STARmanual.pdf 来源:Calling variants in RNAseq PART0 准备工作 #STAR 安装前的依赖的工具 #Red Hat, CentOS, Fedora. ...
- 32、Differential Gene Expression using RNA-Seq (Workflow)
转载: https://github.com/twbattaglia/RNAseq-workflow Introduction RNAseq is becoming the one of the mo ...
- deb包的安装及dpkg命令小结
DPKG commands There are two actions, they are dpkg-query and dpkg-deb. Install a package # sudo dpkg ...
随机推荐
- CLR via 笔记4.2 类型转换 is 与 as 区别
is 和 as 操作符是用来进行强制类型转换的 is : 检查一个对象是否兼容于其他指定的类型,并返回一个Bool值,永远不会抛出异常 object o = new object(); if (o i ...
- 通过Nginx反向代理实现IP分流
通过Nginx做反向代理来实现分流,以减轻服务器的负载和压力是比较常见的一种服务器部署架构.本文将分享一个如何根据来路IP来进行分流的方法. 根据特定IP来实现分流 将IP地址的最后一段最后一位为0或 ...
- 监控之snmpd 服务
监控离不开数据采集,经常使用的Mrtg ,Cacti,Zabbix,等等监控软件都是通过snmp 协议进行数据采集的! 1 什么是snmp 协议? 简单网络管理协议(SNMP,Simple Netwo ...
- 前端开发 - HTML
1.index2.head标签相关内容3.常用标签一4.常用标签二 table5.常用标签二 form6.标签分类 1.index <!--声明文档的类型 标记该文档为HTML5的文件--> ...
- 【react redux && flux】
redux: http://www.ruanyifeng.com/blog/2016/09/redux_tutorial_part_three_react-redux.html https://bai ...
- Scala函数特性
通常情况下,函数的參数是传值參数:即參数的值在它被传递给函数之前被确定.可是,假设我们须要编写一个接收參数不希望立即计算.直到调用函数内的表达式才进行真正的计算的函数. 对于这样的情况.Scala提供 ...
- python的scikit-learn的主要模块和基本使用
在从事数据科学的人中,最常用的工具就是R和Python了,每个工具都有其利弊,但是Python在各方面都相对胜出一些,这是因为scikit-learn库实现了很多机器学习算法. 加载数据(Data L ...
- spring MVC学习(一)---前端控制器
1.spring MVC中的前段控制器就是DsipatcherServlet,它在spring MVC框架中的结构图如下: 2.DispatcherServlet其实就是一个Servlet,它继承了H ...
- java 多线程 day08 java5多线程新特性
/** * Created by chengtao on 17/12/3. */public class Thread0801_java5_Atomaic { /* 三个包: http://tool. ...
- Spark2.0机器学习系列之11: 聚类(幂迭代聚类, power iteration clustering, PIC)
在Spark2.0版本中(不是基于RDD API的MLlib),共有四种聚类方法: (1)K-means (2)Latent Dirichlet all ...