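Most of the data I have worked with recently comes from targeted sequencing or whole-exome sequencing. Evaluating coverage depth and target capture efficiency has therefore become an essential part of data quality control.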

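  Until now I computed per-base depth with samtools depth and then calculated depth and capture efficiency with Perl. Today I came across bamdst (https://github.com/shiquan/bamdst), which is convenient to use. Following its GitHub page, I am recording how to use it here.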
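The per-base calculation mentioned above can be sketched in Python instead of Perl. This is a minimal sketch, assuming three-column `samtools depth` output (chromosome, position, depth) already restricted to the target regions, and assuming the total target length is supplied separately, because positions with zero depth may be missing from that output. The function name `summarize_depth`, the thresholds, and the example lines are illustrative, not part of any tool.

```python
from typing import Iterable


def summarize_depth(lines: Iterable[str], target_len: int,
                    thresholds=(1, 4, 10, 30, 100)) -> dict:
    """Summarize per-base depth lines of the form 'chrom<TAB>pos<TAB>depth'."""
    if target_len <= 0:
        raise ValueError("target_len must be positive")
    total_bases = 0
    covered = {t: 0 for t in thresholds}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        depth = int(line.split("\t")[2])
        total_bases += depth
        # Count this position once for every threshold it reaches.
        for t in thresholds:
            if depth >= t:
                covered[t] += 1
    return {
        # Mean depth = bases on target / target length.
        "mean_depth": total_bases / target_len,
        # Fraction of target positions with depth >= each threshold.
        "coverage": {t: covered[t] / target_len for t in thresholds},
    }


# Made-up input: 4 target bases, one of which has no coverage and is absent.
example = ["chrM\t650\t12", "chrM\t651\t30", "chrM\t652\t3"]
summary = summarize_depth(example, target_len=4)
print(summary["mean_depth"])
print(summary["coverage"])
```

Dividing by the target length rather than by the number of reported lines keeps uncovered positions in the denominator, which is what a capture-efficiency check needs.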

  Download and install: after downloading and unpacking the package,

cd ./bamdst-master
make

  Once it is built, prepare a .bed file and a .bam file. Using the MT-RNR1.bed and test.bam files shipped with the software as an example:

./bamdst-master/bamdst -p ./bamdst-master/example/MT-RNR1.bed -o ./test ./bamdst-master/example/test.bam
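When the same call has to be repeated for many samples, it can be wrapped in a small script. The following is a sketch only: it uses the `-p` and `-o` options exactly as in the command above, and the helper names `build_bamdst_cmd` and `run_bamdst` are hypothetical. Creating the output directory beforehand is a precaution based on the assumption that bamdst may not create it by itself.

```python
import subprocess
from pathlib import Path


def build_bamdst_cmd(bamdst: str, bed: str, outdir: str, bam: str) -> list:
    """Assemble the command line shown above as an argument list."""
    return [bamdst, "-p", bed, "-o", outdir, bam]


def run_bamdst(bamdst: str, bed: str, outdir: str, bam: str) -> None:
    """Run bamdst for one BAM file (hypothetical helper)."""
    # Assumption: the output directory should exist before bamdst starts.
    Path(outdir).mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if bamdst exits with an error.
    subprocess.run(build_bamdst_cmd(bamdst, bed, outdir, bam), check=True)


cmd = build_bamdst_cmd(
    "./bamdst-master/bamdst",
    "./bamdst-master/example/MT-RNR1.bed",
    "./test",
    "./bamdst-master/example/test.bam",
)
print(" ".join(cmd))
```

Calling `run_bamdst(...)` with the same four arguments would execute the tool; the snippet itself only prints the command.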

  The output consists of 7 files:

-coverage.report
-cumu.plot
-insert.plot
-chromosome.report
-region.tsv.gz
-depth.tsv.gz
-uncover.bed
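As a quick check on one of these files, the total length of the intervals in uncover.bed can be summed. This sketch assumes the file follows the usual BED convention (tab-separated chromosome, 0-based start, exclusive end) and, judging only by its name, that it lists target intervals with little or no coverage. The function name `bed_total_length` and the example intervals are made up for illustration.

```python
def bed_total_length(lines) -> int:
    """Sum interval lengths from BED lines (0-based start, exclusive end)."""
    total = 0
    for line in lines:
        line = line.strip()
        # Skip blank lines and the optional BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        start, end = int(fields[1]), int(fields[2])
        total += end - start
    return total


# Made-up intervals, not real bamdst output.
example_bed = ["chrM\t648\t700", "chrM\t1500\t1601"]
print(bed_total_length(example_bed))
```

For a real run, the lines would come from something like `open("./test/uncover.bed")`, with the path adjusted to the output directory that was used.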

  Of these, coverage.report carries the most information. Its fields are described below:

[Total] Raw Reads (All reads) // All reads in the bam file(s).
[Total] QC Fail reads // Number of reads that failed QC. This flag is set by other software, such as bwa. See the flag field in the BAM format.
[Total] Raw Data(Mb) // Total reads data in the bam file(s).
[Total] Paired Reads // Paired reads numbers.
[Total] Mapped Reads // Mapped reads numbers.
[Total] Fraction of Mapped Reads // Ratio of mapped reads against raw reads.
[Total] Mapped Data(Mb) // Mapped data in the bam file(s).
[Total] Fraction of Mapped Data(Mb) // Ratio of mapped data against raw data.
[Total] Properly paired // Paired reads with a proper insert size. See the BAM format specification for details.
[Total] Fraction of Properly paired // Ratio of properly paired reads against mapped reads
[Total] Read and mate paired // Read (read1) and mate read (read2) paired.
[Total] Fraction of Read and mate paired // Ratio of read and mate paired against mapped reads
[Total] Singletons // Read mapped but mate read unmapped, and vice versa.
[Total] Read and mate map to diff chr // Read and mate read mapped to different chromosomes, usually because of mapping errors or structural variants.
[Total] Read1 // First reads in mate paired sequencing
[Total] Read2 // Mate reads
[Total] Read1(rmdup) // First reads after removing duplicates.
[Total] Read2(rmdup) // Mate reads after removing duplicates.
[Total] forward strand reads // Number of forward strand reads.
[Total] backward strand reads // Number of backward strand reads.
[Total] PCR duplicate reads // Number of PCR duplicate reads.
[Total] Fraction of PCR duplicate reads // Ratio of PCR duplicate reads.
[Total] Map quality cutoff value // Cutoff for the mapping quality score, set with -q. The default is 20, because some variant callers such as GATK only consider high-quality reads.
[Total] MapQuality above cutoff reads // Number of reads with higher or equal quality score than cutoff value.
[Total] Fraction of MapQ reads in all reads // Ratio of reads with higher or equal Q score against raw reads.
[Total] Fraction of MapQ reads in mapped reads // Ratio of reads with higher or equal Q score against mapped reads.
[Target] Target Reads // Number of reads covering the target region (specified by the bed file).
[Target] Fraction of Target Reads in all reads // Ratio of target reads against raw reads.
[Target] Fraction of Target Reads in mapped reads // Ratio of target reads against mapped reads.
[Target] Target Data(Mb) // Total bases covering the target region. If a read only partly covers the target region, only the covering bases are counted.
[Target] Target Data Rmdup(Mb) // Total bases covering the target region after removing PCR duplicates.
[Target] Fraction of Target Data in all data // Ratio of target bases against raw bases.
[Target] Fraction of Target Data in mapped data // Ratio of target bases against mapped bases.
[Target] Len of region // The length of target regions.
[Target] Average depth // Average depth of target regions. Calculated by "target bases / length of regions".
[Target] Average depth(rmdup) // Average depth of target regions after removing PCR duplicates.
[Target] Coverage (>0x) // Ratio of bases with depth greater than 0x in target regions, which also means the ratio of covered regions in target regions.
[Target] Coverage (>=4x) // Ratio of bases with depth greater than or equal to 4x in target regions.
[Target] Coverage (>=10x) // Ratio of bases with depth greater than or equal to 10x in target regions.
[Target] Coverage (>=30x) // Ratio of bases with depth greater than or equal to 30x in target regions.
[Target] Coverage (>=100x) // Ratio of bases with depth greater than or equal to 100x in target regions.
[Target] Coverage (>=Nx) // An additional line for a user-defined cutoff value, see --cutoffdepth.
[Target] Target Region Count // Number of target regions. In normal practice, it is the total number of exons.
[Target] Region covered > 0x // The number of these regions with average depth greater than 0x.
[Target] Fraction Region covered > 0x // Ratio of these regions with average depth greater than 0x.
[Target] Fraction Region covered >= 4x // Ratio of these regions with average depth greater than or equal to 4x.
[Target] Fraction Region covered >= 10x // Ratio of these regions with average depth greater than or equal to 10x.
[Target] Fraction Region covered >= 30x // Ratio of these regions with average depth greater than or equal to 30x.
[Target] Fraction Region covered >= 100x // Ratio of these regions with average depth greater than or equal to 100x.
[flank] flank size // The size of the flank region to count, 200 bp by default. Oligos can also capture regions adjacent to the target regions.
[flank] Len of region (not include target region) // The length of flank regions (target regions are not counted).
[flank] Average depth // Average depth of flank regions.
[flank] flank Reads // The total number of reads covering the flank regions. Note: reads covering the edge of a target region are also counted in the flank regions.
[flank] Fraction of flank Reads in all reads // Ratio of reads covered in flank regions against raw reads.
[flank] Fraction of flank Reads in mapped reads // Ratio of reads covering the flank regions against mapped reads.
[flank] flank Data(Mb) // Total bases in the flank regions.
[flank] Fraction of flank Data in all data // Ratio of total bases in the flank regions against raw data.
[flank] Fraction of flank Data in mapped data // Ratio of total bases in the flank regions against mapped data.
[flank] Coverage (>0x) // Ratio of flank bases with depth greater than 0x.
[flank] Coverage (>=4x) // Ratio of flank bases with depth greater than or equal to 4x.
[flank] Coverage (>=10x) // Ratio of flank bases with depth greater than or equal to 10x.
[flank] Coverage (>=30x) // Ratio of flank bases with depth greater than or equal to 30x.
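To pull a few of these fields into a QC table, the report can be read into a dictionary. The following is a minimal sketch under stated assumptions: each data line is taken to be a field name and a value separated by a tab, header lines are taken to start with `#`, and fraction values may carry a `%` sign. The function names and the sample text are made up, so check them against a real coverage.report before relying on this.

```python
def parse_coverage_report(text: str) -> dict:
    """Parse 'name<TAB>value' lines into a dict of strings."""
    stats = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the last tab so the field name may contain spaces.
        key, sep, value = line.rpartition("\t")
        if not sep:
            continue  # no tab: not a name/value line under this assumption
        stats[key.strip()] = value.strip()
    return stats


def as_fraction(value: str) -> float:
    """Convert '80.00%' or '0.8' to a fraction between 0 and 1."""
    value = value.strip()
    if value.endswith("%"):
        return float(value[:-1]) / 100.0
    return float(value)


# Made-up sample, not real bamdst output.
sample = (
    "## made-up example\n"
    "[Total] Mapped Reads\t1000\n"
    "[Target] Target Reads\t800\n"
    "[Target] Fraction of Target Reads in mapped reads\t80.00%\n"
    "[Target] Average depth\t152.5\n"
)
stats = parse_coverage_report(sample)
# Capture efficiency as target reads over mapped reads.
capture = int(stats["[Target] Target Reads"]) / int(stats["[Total] Mapped Reads"])
print(capture)
print(as_fraction(stats["[Target] Fraction of Target Reads in mapped reads"]))
```

Values are kept as strings in the dictionary so that counts, depths, and percentages can each be converted explicitly where they are used.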

  

  
