samtools 工具
软件地址:
http://www.htslib.org/
功能三大版块 :
- Samtools
- Reading/writing/editing/indexing/viewing SAM/BAM/CRAM format
- BCFtools
- Reading/writing BCF2/VCF/gVCF files and calling/filtering/summarising SNP and short indel sequence variants
- HTSlib
- A C library for reading/writing high-throughput sequencing data
cd samtools-1.x # and similarly for bcftools and htslib
./configure --prefix=/where/to/install
make
make install
export PATH=/where/to/install/bin:$PATH # for sh or bash users
Mapping
To prepare the reference for mapping you must first index ( BWT:Burrows Wheeler Transform 索引算法,耗时较长,安排好时间)
bwa index <ref.fa>
bwa mem -R '@RG\tID:foo\tSM:bar\tLB:library1' <ref.fa> <read1.fa> <read1.fa> > lane.sam
Ensure that the @RG information here is correct as this information is used by later tools. The SM field must be set to the name of the sample being processed, and LB field to the library.
samtools fixmate -O bam <lane.sam> <lane_fixmate.bam>
samtools sort -O bam -o <lane_sorted.bam> -T </tmp/lane_temp> <lane_fixmate.sam>
In order to reduce the number of miscalls of INDELs in your data it is helpful to realign your raw gapped alignment with the Broad’s GATK Realigner.
java -Xmx2g -jar GenomeAnalysisTK.jar -T RealignerTargetCreator -R <ref.fa> -I <lane.bam> -o <lane.intervals> --known <bundle/b38/Mills1000G.b38.vcf>
java -Xmx4g -jar GenomeAnalysisTK.jar -T IndelRealigner -R <ref.fa> -I <lane.bam> -targetIntervals <lane.intervals> --known <bundle/b38/Mills1000G.b38.vcf> -o <lane_realigned.bam>
BQSR from the Broad’s GATK allows you to reduce the effects of analysis artefacts produced by your sequencing machines. It does this in two steps, the first analyses your data to detect covariates and the second compensates for those covariates by adjusting quality scores.
java -Xmx4g -jar GenomeAnalysisTK.jar -T BaseRecalibrator -R <ref.fa> -knownSites >bundle/b38/dbsnp_142.b38.vcf> -I <lane.bam> -o <lane_recal.table>
java -Xmx2g -jar GenomeAnalysisTK.jar -T PrintReads -R <ref.fa> -I <lane.bam> --BSQR <lane_recal.table> -o <lane_recal.bam>
It is helpful at this point to compile all of the reads from each library together into one BAM, which can be done at the same time as marking PCR and optical duplicates. To identify duplicates we currently recommend the use of either the Picard or biobambam’s mark duplicates tool.
java -Xmx2g -jar MarkDuplicates.jar VALIDATION_STRINGENCY=LENIENT INPUT=<lane_1.bam> INPUT=<lane_2.bam> INPUT=<lane_3.bam> OUTPUT=<library.bam>
Once this is done you can perform another merge step to produce your sample BAM files.
samtools merge <sample.bam> <library1.bam> <library2.bam> <library3.bam>
samtools index <sample.bam>
Variant Calling
To convert your BAM file into genomic positions we first use mpileup to produce a BCF file that contains all of the locations in the genome.
bcftools mpileup -Ou -f <ref.fa> <sample1.bam> <sample2.bam> <sample3.bam> | bcftools call -vmO z -o <study.vcf.gz>
To prepare our VCF for querying we next index it using tabix:
tabix -p vcf <study.vcf.gz>
prepare graphs and statistics to assist you in filtering your variants:
bcftools stats -F <ref.fa> -s - <study.vcf.gz> > <study.vcf.gz.stats>
mkdir plots
plot-vcfstats -p plots/ <study.vcf.gz.stats>
filter data
bcftools filter -O z -o <study_filtered..vcf.gz> -s LOWQUAL -i'%QUAL>10' <study.vcf.gz>
Variant filtration is a subject worthy of an article in itself and the exact filters you will need to use will depend on the purpose of your study and quality and depth of the data used to call the variants.
注: 变异过滤要服从具体研究目的以及数据的质量和深度。
samtools 工具的更多相关文章
- the grave of my scripts
不定期更新.......... 1,fetch_seq.py https://github.com/freemao/AHRD/blob/master/fetch_seq.py 提取出你想要得染色体的某 ...
- snakemake使用小结
首先在linux 里配置conda 下载 wget https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/Anaconda3-5.3.1-Linu ...
- SAM格式 及 比对工具之 samtools 使用方法
参考资料: SAMtools(官网) SAM Spec v1.4 (SAM格式 说明书) (重要) samtools-1.3.1 使用手册 (SAMtools软件说明书) samtools常用命令详解 ...
- samtools常用命令详解
samtools的说明文档:http://samtools.sourceforge.net/samtools.shtmlsamtools是一个用于操作sam和bam文件的工具合集.包含有许多命令.以下 ...
- 安装生物信息学软件-Samtools
装完Bowtie2,官方文档给出的栗子说可以玩一玩samtools,所以我入个坑 参考这篇http://m.010lm.com/roll/2016/0620/2343389.html Step 1: ...
- samtools常用命令详解(转)
转自:samtools常用命令详解 samtools的说明文档:http://samtools.sourceforge.net/samtools.shtml samtools是一个用于操作sam和ba ...
- 可视化工具之 IGV 使用方法
整合基因组浏览器(IGV)是一种高性能的可视化工具,用来交互式地探索大型综合基因组数据.它支持各种数据类型,包括array-based的和下一代测序的数据和基因注释. IGV这个工具很牛,发了NB: ...
- 推荐一个SAM文件或者bam文件中flag含义解释工具
SAM是Sequence Alignment/Map 的缩写.像bwa等软件序列比对结果都会输出这样的文件.samtools网站上有专门的文档介绍SAM文件.具体地址:http://samtools. ...
- SAMTOOLS使用 SAM BAM文件处理
[怪毛匠子 整理] samtools学习及使用范例,以及官方文档详解 #第一步:把sam文件转换成bam文件,我们得到map.bam文件 system"samtools view -bS m ...
随机推荐
- OpenACC Hello World
▶ 在 windows 10 上搭建 OpenACC 环境,挺麻烦 ● 安装顺序:Visual Studio 2015(PGI 编译器不支持 Visual Studio 2017):CUDA Tool ...
- HTML5 通过文件输入框读取文件为base64文件, 并借助canvas压缩 FileReader, files, drawImage
<!DOCTYPE html> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <m ...
- leetcode492
public class Solution { public int[] ConstructRectangle(int area) { Dictionary<int, int> dic = ...
- spring-boot + mybatis +pagehelper 使用分页
转自:https://segmentfault.com/a/1190000015668715?utm_medium=referral&utm_source=tuicool 最近自己搭建一个sp ...
- ABAP-ALV报表导出格式恢复初始画面
进入一个ALV表格,想下载数据,一般点清单-->输出-->电子数据表. 会出来一个对话框,可选择导出成各类格式. 在下端有一个“始终使用选定的格式”,一旦勾上,就再也不会弹出选择框了. 以 ...
- tair介绍以及配置
简介 tair 是淘宝自己开发的一个分布式 key/value 存储引擎. tair 分为持久化和非持久化两种使用方式. 非持久化的 tair 可以看成是一个分布式缓存. 持久化的 tair 将数据存 ...
- python判断unicode是否是汉字,数字,英文,或者其他字符
下面这个小工具包含了 判断unicode是否是汉字,数字,英文,或者其他字符. 全角符号转半角符号. unicode字符串归一化等工作. 还有一个能处理多音字的汉字转拼音的程序,还在整理中. #!/u ...
- MongoDB 集合命令
集合命令 创建语法如下 name是要创建的集合的名称 options是一个文档,用于指定集合的配置,选项参数是可选的,所以只需要到指定的集合名称 可以不手动创建集合,向不存在的集合中第一次加入数据 ...
- date.getTime()
Date date = new Date(); System.out.println(date.getTime()); 输出结果是1210745780625 编译时间当时时间大概是2008年5.14好 ...
- 【英宝通Unity4.0公开课学习 】(四)GUI到物理引擎
今天老妈打电话来说和老爸吵架了... 真的是家家都有本难念的经啊.前后帮她分析了个半小时才帮她解开心结...现在想想老爸还是蛮可怜的,连分享的人都木有 讲的GUI都看睡着了...因为想着可以用NGUI ...