Sequencing depth statistics for BAM files with bamdst

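The data I have been working with recently all come from targeted sequencing or whole-exome sequencing. Evaluating coverage depth and target capture efficiency has therefore become an indispensable part of data quality control.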

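Previously I ran samtools depth to get the per-base depth and then used a Perl script to compute depth and capture efficiency. Today I happened to come across bamdst (https://github.com/shiquan/bamdst), which is very convenient to use. Following the GitHub README, I am recording how to use it here.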
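For comparison, the older approach can be sketched in a few lines of Python instead of Perl. This is only a minimal sketch, not the script I actually used: the function name `summarize_depth` and its parameters are made up for illustration, and it assumes the standard three-column, tab-separated `samtools depth` output (chromosome, position, depth). Because `samtools depth` skips zero-depth positions unless `-a` is given, the total target length is passed in separately.

```python
def summarize_depth(lines, target_length, thresholds=(1, 4, 10, 30, 100)):
    """Summarise `samtools depth` output lines (chrom<TAB>pos<TAB>depth).

    target_length is the total number of bases in the target regions;
    it is supplied by the caller because positions with zero depth may
    be absent from the input.
    """
    total_bases = 0
    covered = {t: 0 for t in thresholds}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and any header line
        depth = int(line.split("\t")[2])
        total_bases += depth
        for t in thresholds:
            if depth >= t:
                covered[t] += 1
    return {
        # average depth = bases on target / length of the target regions
        "average_depth": total_bases / target_length,
        # fraction of target positions with depth >= each threshold
        "coverage": {t: covered[t] / target_length for t in thresholds},
    }
```

In practice the `lines` argument could be an open file handle on the output of `samtools depth`, and `target_length` the summed length of the intervals in the bed file.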

Download and install: after downloading the package and unpacking it,

cd ./bamdst-master

make

Once it is built, you need to prepare a .bed file and a .bam file. Using the MT-RNR1.bed and test.bam shipped with the software as an example:

./bamdst-master/bamdst -p ./bamdst-master/example/MT-RNR1.bed -o ./test ./bamdst-master/example/test.bam
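When many samples need to be processed, the same call can be wrapped in a small script. The following is a sketch under assumptions: the helper names `build_bamdst_command` and `run_bamdst` are hypothetical, only the options that appear in this post (`-p`, `-o`, `-q`, `--cutoffdepth`) are used, and creating the output directory beforehand is a precaution of mine rather than something the README is quoted as requiring.

```python
import subprocess
from pathlib import Path


def build_bamdst_command(bamdst, bed, bam, outdir,
                         mapq_cutoff=None, cutoff_depth=None):
    """Build the bamdst argument list; optional flags are added only if set."""
    cmd = [str(bamdst), "-p", str(bed), "-o", str(outdir)]
    if mapq_cutoff is not None:
        cmd += ["-q", str(mapq_cutoff)]  # mapping quality cutoff
    if cutoff_depth is not None:
        cmd += ["--cutoffdepth", str(cutoff_depth)]  # extra ">=Nx" coverage line
    cmd.append(str(bam))
    return cmd


def run_bamdst(bamdst, bed, bam, outdir, **options):
    """Create the output directory if needed, then run bamdst."""
    Path(outdir).mkdir(parents=True, exist_ok=True)
    cmd = build_bamdst_command(bamdst, bed, bam, outdir, **options)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

For the example above, `run_bamdst("./bamdst-master/bamdst", "./bamdst-master/example/MT-RNR1.bed", "./bamdst-master/example/test.bam", "./test")` would issue the same command line.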

The output consists of 7 files:

- coverage.report
- cumu.plot
- insert.plot
- chromosome.report
- region.tsv.gz
- depth.tsv.gz
- uncover.bed

Of these, coverage.report carries the most information. Its fields are described below:

[Total] Raw Reads (All reads) // All reads in the bam file(s).
[Total] QC Fail reads // Number of reads that failed QC. This flag is set by other software, such as bwa. See the flag field in the BAM format.
[Total] Raw Data(Mb) // Total read data in the bam file(s).
[Total] Paired Reads // Number of paired reads.
[Total] Mapped Reads // Number of mapped reads.
[Total] Fraction of Mapped Reads // Ratio of mapped reads to raw reads.
[Total] Mapped Data(Mb) // Mapped data in the bam file(s).
[Total] Fraction of Mapped Data(Mb) // Ratio of mapped data to raw data.
[Total] Properly paired // Paired reads with a proper insert size. See the BAM format specification for details.
[Total] Fraction of Properly paired // Ratio of properly paired reads to mapped reads.
[Total] Read and mate paired // Read (read1) and mate read (read2) are paired.
[Total] Fraction of Read and mate paired // Ratio of read-and-mate-paired reads to mapped reads.
[Total] Singletons // Read mapped but mate read unmapped, or vice versa.
[Total] Read and mate map to diff chr // Read and mate read mapped to different chromosomes, usually because of mapping errors or structural variants.
[Total] Read1 // First reads in paired-end sequencing.
[Total] Read2 // Mate reads.
[Total] Read1(rmdup) // First reads after removing duplicates.
[Total] Read2(rmdup) // Mate reads after removing duplicates.
[Total] forward strand reads // Number of forward-strand reads.
[Total] backward strand reads // Number of reverse-strand reads.
[Total] PCR duplicate reads // Number of PCR duplicate reads.
[Total] Fraction of PCR duplicate reads // Ratio of PCR duplicate reads.
[Total] Map quality cutoff value // Mapping quality cutoff, which can be set with -q. The default is 20, because some variant callers such as GATK only consider high-quality reads.
[Total] MapQuality above cutoff reads // Number of reads with a mapping quality greater than or equal to the cutoff.
[Total] Fraction of MapQ reads in all reads // Ratio of reads with a mapping quality greater than or equal to the cutoff to raw reads.
[Total] Fraction of MapQ reads in mapped reads // Ratio of reads with a mapping quality greater than or equal to the cutoff to mapped reads.

[Target] Target Reads // Number of reads covering the target regions (specified by the bed file).
[Target] Fraction of Target Reads in all reads // Ratio of target reads to raw reads.
[Target] Fraction of Target Reads in mapped reads // Ratio of target reads to mapped reads.
[Target] Target Data(Mb) // Total bases covering the target regions. If a read only partly covers a target region, only the covering bases are counted.
[Target] Target Data Rmdup(Mb) // Total bases covering the target regions after removing PCR duplicates.
[Target] Fraction of Target Data in all data // Ratio of target bases to raw bases.
[Target] Fraction of Target Data in mapped data // Ratio of target bases to mapped bases.
[Target] Len of region // Total length of the target regions.
[Target] Average depth // Average depth of the target regions, calculated as "target bases / length of regions".
[Target] Average depth(rmdup) // Average depth of the target regions after removing PCR duplicates.
[Target] Coverage (>0x) // Ratio of bases in the target regions with depth greater than 0x, that is, the fraction of the target regions that is covered at all.
[Target] Coverage (>=4x) // Ratio of bases in the target regions with depth greater than or equal to 4x.
[Target] Coverage (>=10x) // Ratio of bases in the target regions with depth greater than or equal to 10x.
[Target] Coverage (>=30x) // Ratio of bases in the target regions with depth greater than or equal to 30x.
[Target] Coverage (>=100x) // Ratio of bases in the target regions with depth greater than or equal to 100x.
[Target] Coverage (>=Nx) // An additional line for a user-defined cutoff value, see --cutoffdepth.
[Target] Target Region Count // Number of target regions. In normal practice, this is the total number of exons.
[Target] Region covered > 0x // Number of target regions with average depth greater than 0x.
[Target] Fraction Region covered > 0x // Ratio of target regions with average depth greater than 0x.
[Target] Fraction Region covered >= 4x // Ratio of target regions with average depth greater than or equal to 4x.
[Target] Fraction Region covered >= 10x // Ratio of target regions with average depth greater than or equal to 10x.
[Target] Fraction Region covered >= 30x // Ratio of target regions with average depth greater than or equal to 30x.
[Target] Fraction Region covered >= 100x // Ratio of target regions with average depth greater than or equal to 100x.

[flank] flank size // Size of the flanking region to be counted, 200 bp by default. Oligos can also capture regions adjacent to the target regions.
[flank] Len of region (not include target region) // Total length of the flank regions (target regions are not counted).
[flank] Average depth // Average depth of the flank regions.
[flank] flank Reads // Total number of reads covering the flank regions. Note: reads covering the edge of a target region are also counted in the flank regions.
[flank] Fraction of flank Reads in all reads // Ratio of reads covering the flank regions to raw reads.
[flank] Fraction of flank Reads in mapped reads // Ratio of reads covering the flank regions to mapped reads.
[flank] flank Data(Mb) // Total bases in the flank regions.
[flank] Fraction of flank Data in all data // Ratio of total bases in the flank regions to raw data.
[flank] Fraction of flank Data in mapped data // Ratio of total bases in the flank regions to mapped data.
[flank] Coverage (>0x) // Ratio of flank bases with depth greater than 0x.
[flank] Coverage (>=4x) // Ratio of flank bases with depth greater than or equal to 4x.
[flank] Coverage (>=10x) // Ratio of flank bases with depth greater than or equal to 10x.
[flank] Coverage (>=30x) // Ratio of flank bases with depth greater than or equal to 30x.
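
To pull a few of these fields into a QC table, the report can be read into a dictionary. The sketch below rests on assumptions that should be checked against a real coverage.report: that each data line is a label followed by a tab and a value, that comment lines start with `#`, and that ratios may carry a trailing `%`. The names `parse_coverage_report` and `as_number` are hypothetical.

```python
def parse_coverage_report(lines):
    """Map each report label (e.g. "[Target] Average depth") to its raw value.

    Assumes "label<TAB>value" lines; lines starting with '#' and lines
    without a tab are skipped.
    """
    stats = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.rpartition("\t")  # split on the last tab
        if not sep:
            continue
        stats[key.strip()] = value.strip()
    return stats


def as_number(value):
    """Convert a report value to float; "99.50%" becomes 0.995."""
    text = value.strip()
    if text.endswith("%"):
        return float(text[:-1]) / 100.0
    return float(text)
```

With a report opened as `fh`, `as_number(parse_coverage_report(fh)["[Target] Average depth"])` would give the average target depth as a float, provided the label matches the one in the file exactly.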
