Downloading and installing the SRA Toolkit

Step 1: Download and install the SRA Toolkit (download the Toolkit from the SRA website)

  1. If you are using a web browser, the following page contains download links to the most current version of the toolkit for each of the supported platforms: SRA Toolkit download page: https://www.ncbi.nlm.nih.gov/Traces/sra/?view=software
  2. If you are instead working from a command line interface, you may use FTP or wget to obtain the software from the following directory: "ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/current". Example:
    wget "ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/current/sratoolkit.current-centos_linux64.tar.gz"

Step 2: Unpack the Toolkit

  1. For Linux, use tar:

    tar -xzf sratoolkit.current-centos_linux64.tar.gz
  2. For Mac OS X, double-click on the .tar.gz file and the Archive Utility will unpack it. Alternatively, command-line tar will also work (see Linux example, above).
  3. For Windows, either use an archiving and compression utility (e.g., Winzip, 7-Zip, etc.), or simply double-click on the .zip file and drag the 'sratoolkit...' folder to the preferred install location.

Note: After unpacking, change into the Toolkit's "bin" directory.
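The unpack-then-enter-bin step can be sketched as a small shell function for Linux and Mac OS X. This is only an illustration: the function name `unpack_toolkit` and the install location in the example are hypothetical, and it assumes the archive holds a single top-level `sratoolkit...` folder that contains `bin`.

```shell
# Unpack the archive into a chosen directory and print the path of its bin directory.
# Assumption: the tarball contains one top-level sratoolkit* folder with a bin subfolder.
unpack_toolkit() {
    archive=$1
    dest=$2
    mkdir -p "$dest" || return 1
    tar -xzf "$archive" -C "$dest" || return 1
    # The folder name includes the version, so match it with a glob.
    for dir in "$dest"/sratoolkit*/bin; do
        if [ -d "$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    echo "no sratoolkit*/bin directory found under $dest" >&2
    return 1
}

# Example (hypothetical install location):
# bin_dir=$(unpack_toolkit sratoolkit.current-centos_linux64.tar.gz "$HOME/tools") && cd "$bin_dir"
```

Printing the path instead of changing directory inside the function lets the caller decide whether to `cd` there or just record the location.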

Note: For most users, the Toolkit programs (fastq-dump, sam-dump, etc.) will not be on their PATH environment variable, so the shell must be told where the Toolkit is located. The examples below show how 'fastq-dump' would be called in different circumstances:

  • ~/[user_name]/sra-toolkit/fastq-dump

    YES: The Toolkit "bin" directory has been placed in the user-specified directory "sra-toolkit"

  • ./fastq-dump

    YES: The Toolkit components are in the current working directory

  • fastq-dump

    NO: If the toolkit location is not specified in your $PATH variable, then the OS cannot locate the fastq-dump program, even if it is in the current directory. NOTE: Windows users who have navigated to the Toolkit "bin" directory should be able to enter just "fastq-dump.exe".
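To make the bare `fastq-dump` form work on Linux or Mac OS X, the Toolkit "bin" directory can be added to PATH. The following is a minimal sketch for the current shell session; the helper name `add_to_path` and the directory in the example are hypothetical, so substitute the folder you actually unpacked.

```shell
# Prepend a directory to PATH for the current shell session, unless it is already listed.
add_to_path() {
    dir=$1
    case ":$PATH:" in
        *":$dir:"*) ;;                          # already present, nothing to do
        *) PATH="$dir:$PATH"; export PATH ;;
    esac
}

# Example (hypothetical location):
# add_to_path "$HOME/sratoolkit/bin"
# command -v fastq-dump    # prints the resolved path if the lookup now succeeds
```

The change lasts only for the current session. To keep it, the same call (or an equivalent `export PATH=...` line) would typically go in a shell startup file such as `~/.bashrc`.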

Testing the Toolkit configuration

The Toolkit comes with a default configuration that will work for most users. You may elect to perform the following tests to confirm that your configuration is working correctly. The default location for the "download repository" is:

  • Linux: /home/[user_name]/ncbi/public
  • Mac OS X: /Users/[user_name]/ncbi/public
  • Windows: C:\Users\[user_name]\ncbi\public

Note that if the tests fail, or if you wish to specify the download location for files sourced from NCBI, you should configure your Toolkit installation. During normal operation, the Toolkit may be required to download the following types of data to the default location:

  • Reference sequences: Small (most less than 70 MB) sequences used to decompress aligned SRA data.
  • SRA data files: If data are downloaded "on-demand" using the toolkit, then partial and whole SRA datasets (most are several GB in size) may be stored here. Note: Manually downloaded SRA data obtained using a web browser, wget, ascp, or FTP may be stored anywhere in the local file system.
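Because on-demand downloads can be large, it may help to check how much space the default location is using. The sketch below assumes the Linux/Mac OS X default listed above, with `$HOME` standing in for the user's home directory; `repo_usage` is a hypothetical helper name.

```shell
# Report how much disk space the default download repository uses.
# Assumption: the repository is at the default location, $HOME/ncbi/public.
repo_usage() {
    repo="$HOME/ncbi/public"
    if [ -d "$repo" ]; then
        du -sh "$repo"
    else
        echo "no repository at $repo (nothing downloaded yet?)"
    fi
}

# Example:
# repo_usage
```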

For the test, we are using an arbitrary dataset, SRR390728 (RNA-Seq (polyA+) analysis of DLBCL cell line HS0798), from the National Cancer Institute’s Cancer Genome Characterization Initiative (CGCI) Project. It is a reasonably small SRA dataset that contains aligned (reference-compressed) data, allowing us to test multiple aspects of the toolkit simultaneously.

  1. Open a terminal or command prompt and "cd" into the directory containing the toolkit executables (e.g., [download_location]/sratoolkit[version]/bin/).

    • Linux and OS X users should execute the following command:

      ./fastq-dump -X 5 -Z SRR390728
    • Windows users should execute the following command:
      fastq-dump.exe -X 5 -Z SRR390728
  2. If successful, the test should connect to NCBI, download a small amount of data from SRR390728 and the reference sequence needed to extract the data, and stream the first 5 spots of the file ("-X 5" option) to the screen ("-Z" option).
  3. If the configuration is not valid, an error like the following will likely be displayed:
    fastq-dump.2.x err: item not found while constructing within virtual database module - the path 'SRR390728' cannot be opened as database or table
  4. If you receive an error like the one above, please configure the toolkit (described in the next section). If you have already configured the toolkit but are still unable to complete the test successfully, please email sra-tools@ncbi.nlm.nih.gov with a full description of steps taken and error messages received.
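The test steps above can be wrapped in a small function that runs the command and reports the outcome. This is a sketch for Linux and Mac OS X, not part of the Toolkit: `check_toolkit` is a hypothetical name, and it assumes fastq-dump returns a non-zero exit status when it fails.

```shell
# Run the configuration test and report the outcome.
# $1 is the command to run (defaults to ./fastq-dump in the current directory).
# Assumption: fastq-dump exits with a non-zero status on failure.
check_toolkit() {
    cmd=${1:-./fastq-dump}
    if output=$("$cmd" -X 5 -Z SRR390728 2>&1); then
        printf '%s\n' "$output"
        echo "OK: fastq-dump test succeeded"
    else
        printf '%s\n' "$output" >&2
        echo "FAILED: see the message above; the Toolkit may need to be configured" >&2
        return 1
    fi
}

# Example, run from the Toolkit bin directory:
# check_toolkit
```

Capturing the output together with the error stream keeps the full message available for inclusion in a support email if the test keeps failing.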
