The data I have been working with recently are all from targeted sequencing or whole-exome sequencing, so evaluating coverage depth and target capture efficiency has become an essential part of data quality monitoring.

  Previously I would compute per-base depth with samtools depth and then use Perl to calculate depth and capture efficiency. Today I came across bamdst (https://github.com/shiquan/bamdst), which is also very convenient to use; following the GitHub documentation, I record its usage here.
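
  For reference, that older workflow looks roughly like the sketch below (sample.bam and target.bed are placeholder names, and awk stands in for the Perl step purely for illustration; -a -b makes samtools depth report every position inside the BED regions, including zero-depth ones):

# per-base depth over the target, then mean depth and the fraction of bases >=30x
samtools depth -a -b target.bed sample.bam \
  | awk '{sum += $3; n++; if ($3 >= 30) hit++}
         END {if (n) printf "mean depth %.2f, >=30x %.2f%%\n", sum/n, 100*hit/n}'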

  Download and install: after downloading the package and extracting it, build with make:

cd ./bamdst-master
make
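
  make builds the bamdst executable inside the source directory, which is why the example below calls it as ./bamdst-master/bamdst; if preferred, it can also be copied somewhere on PATH (the destination here is only an example):

cp ./bamdst-master/bamdst ~/bin/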

  Once installed, prepare a .bed file and a .bam file; taking the MT-RNR1.bed and test.bam bundled with the software as an example:

./bamdst-master/bamdst -p ./bamdst-master/example/MT-RNR1.bed -o ./test ./bamdst-master/example/test.bam
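
  Here -p takes the target/probe BED file, -o the output directory, and the BAM file comes last. The BED only needs the standard first three tab-separated columns (chromosome, 0-based start, end); a made-up line might look like the following (the coordinates are illustrative, not the actual content of MT-RNR1.bed):

chrM	648	1601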

  The output consists of seven files (a quick way to peek at them follows the list):

-coverage.report
-cumu.plot
-insert.plot
-chromosome.report
-region.tsv.gz
-depth.tsv.gz
-uncover.bed
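
  Before going through coverage.report, the other outputs can be checked quickly from the command line (a small sketch, assuming -o ./test above so the files sit under ./test/; the two .tsv.gz files are gzip-compressed text, and uncover.bed is, as the name suggests, a plain BED of poorly covered or uncovered target regions):

zcat ./test/region.tsv.gz | head
zcat ./test/depth.tsv.gz | head
less ./test/uncover.bed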

  Among these, coverage.report carries the most information; its fields are described below, and a sketch for pulling the key numbers into a per-sample QC summary follows the list:

[Total] Raw Reads (All reads) // All reads in the bam file(s).
[Total] QC Fail reads // Number of reads that failed QC; this flag is set by other software, such as bwa. See the flag field in the BAM format.
[Total] Raw Data(Mb) // Total reads data in the bam file(s).
[Total] Paired Reads // Paired reads numbers.
[Total] Mapped Reads // Mapped reads numbers.
[Total] Fraction of Mapped Reads // Ratio of mapped reads against raw reads.
[Total] Mapped Data(Mb) // Mapped data in the bam file(s).
[Total] Fraction of Mapped Data(Mb) // Ratio of mapped data against raw data.
[Total] Properly paired // Paired reads with a proper insert size. See the BAM format specification for details.
[Total] Fraction of Properly paired // Ratio of properly paired reads against mapped reads.
[Total] Read and mate paired // Read (read1) and mate read (read2) paired.
[Total] Fraction of Read and mate paired // Ratio of read-and-mate-paired reads against mapped reads.
[Total] Singletons // Read mapped but mate read unmapped, and vice versa.
[Total] Read and mate map to diff chr // Read and mate read mapped to different chromosomes, usually due to mapping errors or structural variants.
[Total] Read1 // First reads in mate paired sequencing
[Total] Read2 // Mate reads
[Total] Read1(rmdup) // First reads after removing duplicates.
[Total] Read2(rmdup) // Mate reads after removing duplicates.
[Total] forward strand reads // Number of forward strand reads.
[Total] backward strand reads // Number of backward strand reads.
[Total] PCR duplicate reads // PCR duplications.
[Total] Fraction of PCR duplicate reads // Ratio of PCR duplications.
[Total] Map quality cutoff value // Mapping quality cutoff; can be set with -q (default 20), because some variant callers such as GATK only consider high-quality reads.
[Total] MapQuality above cutoff reads // Number of reads with mapping quality greater than or equal to the cutoff value.
[Total] Fraction of MapQ reads in all reads // Ratio of reads at or above the quality cutoff against raw reads.
[Total] Fraction of MapQ reads in mapped reads // Ratio of reads at or above the quality cutoff against mapped reads.
[Target] Target Reads // Number of reads covering the target regions (specified by the bed file).
[Target] Fraction of Target Reads in all reads // Ratio of target reads against raw reads.
[Target] Fraction of Target Reads in mapped reads // Ratio of target reads against mapped reads.
[Target] Target Data(Mb) // Total bases covering the target regions. If a read only partly overlaps a target region, only the overlapping bases are counted.
[Target] Target Data Rmdup(Mb) // Total bases covering the target regions after removing PCR duplicates.
[Target] Fraction of Target Data in all data // Ratio of target bases against raw bases.
[Target] Fraction of Target Data in mapped data // Ratio of target bases against mapped bases.
[Target] Len of region // The length of target regions.
[Target] Average depth // Average depth of the target regions, calculated as "target bases / length of target regions".
[Target] Average depth(rmdup) // Average depth of the target regions after removing PCR duplicates.
[Target] Coverage (>0x) // Ratio of bases with depth greater than 0x in target regions, which also means the ratio of covered regions in target regions.
[Target] Coverage (>=4x) // Ratio of bases with depth greater than or equal to 4x in target regions.
[Target] Coverage (>=10x) // Ratio of bases with depth greater than or equal to 10x in target regions.
[Target] Coverage (>=30x) // Ratio of bases with depth greater than or equal to 30x in target regions.
[Target] Coverage (>=100x) // Ratio of bases with depth greater than or equal to 100x in target regions.
[Target] Coverage (>=Nx) // An additional line for a user-defined cutoff value; see --cutoffdepth.
[Target] Target Region Count // Number of target regions. In normal practice, this is the total number of exons.
[Target] Region covered > 0x // The number of these regions with average depth greater than 0x.
[Target] Fraction Region covered > 0x // Ratio of these regions with average depth greater than 0x.
[Target] Fraction Region covered >= 4x // Ratio of these regions with average depth greater than or equal to 4x.
[Target] Fraction Region covered >= 10x // Ratio of these regions with average depth greater than or equal to 10x.
[Target] Fraction Region covered >= 30x // Ratio of these regions with average depth greater than or equal to 30x.
[Target] Fraction Region covered >= 100x // Ratio of these regions with average depth greater than or equal to 100x.
[flank] flank size // Size of the flank regions to be counted, 200 bp by default. Capture oligos can also pull down regions adjacent to the targets.
[flank] Len of region (not include target region) // The length of the flank regions (target regions are not counted).
[flank] Average depth // Average depth of flank regions.
[flank] flank Reads // Total number of reads covering the flank regions. Note: reads that span the edge of a target region are also counted in the flank regions.
[flank] Fraction of flank Reads in all reads // Ratio of flank reads against raw reads.
[flank] Fraction of flank Reads in mapped reads // Ratio of flank reads against mapped reads.
[flank] flank Data(Mb) // Total bases in the flank regions.
[flank] Fraction of flank Data in all data // Ratio of total bases in the flank regions against raw data.
[flank] Fraction of flank Data in mapped data // Ratio of total bases in the flank regions against mapped data.
[flank] Coverage (>0x) // Ratio of flank bases with depth greater than 0x.
[flank] Coverage (>=4x) // Ratio of flank bases with depth greater than or equal to 4x.
[flank] Coverage (>=10x) // Ratio of flank bases with depth greater than or equal to 10x.
[flank] Coverage (>=30x) // Ratio of flank bases with depth greater than or equal to 30x.
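
  For routine QC monitoring, the handful of headline numbers can be pulled out of each sample's report in one pass. A minimal sketch, assuming one bamdst output directory per sample under ./qc/<sample>/ (that layout is an assumption, not something bamdst imposes) and that coverage.report stores the labels above as tab-separated name/value lines:

# collect per-sample average depth, on-target read fraction and >=30x coverage
# ./qc/<sample>/coverage.report is an assumed layout; adjust the glob as needed
for report in ./qc/*/coverage.report; do
    sample=$(basename "$(dirname "$report")")
    grep -E '\[Target\] (Average depth|Fraction of Target Reads in mapped reads|Coverage \(>=30x\))' "$report" \
        | awk -F'\t' -v s="$sample" '{print s "\t" $1 "\t" $2}'
done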
