Variation calling and annotation
本文摘自《Resequencing 302 wild and cultivated accessions identifies genes related to domestication and improvement in soybean》
Variation calling and annotation.
Mapping.
SAMtools (Version: 0.1.18) software was used to convert mapping results into the BAM format and to filter the unmapped and non-unique reads.
Duplicated reads were filtered with the Picard package (picard.sourceforge.net, Version:1.87).
The BEDtools (Version: 2.17.0) coverageBed program was used to compute the coverage of sequence alignments. (A sequence was defined as absent if coverage was lower than 90% and present if coverage was greater than 90%.)
SNP calling.
SNP detection was performed using the Genome Analysis Toolkit (GATK, version 2.4-7-g5e89f01) and SAMtools. Only the SNPs detected by both methods were analyzed further.
The detailed processes were as follows:
(1) After BWA alignment, the reads around indels were realigned.
Realignment was performed with GATK in two steps.
The first step used the RealignerTargetCreator package to identify regions where realignment was needed;
The second step used IndelRealigner to realign the regions found in the first step, which produced a realigned BAM file for each accession.
(2) SNPs were called at a population level with GATK and SAMtools. For GATK, the SNP confidence score was set as greater than 30, and the parameter -stand_call_conf was set as 30. The same realigned BAM files were used in SNP calling through the SAMtools mpileup package.
(3) In the filter step, we chose the common sites identified by GATK and SAMtools with the SelectVariants package; SNPs with allele frequencies lower than 1% in the population were discarded.
Indel calling.
Indel calling was similar to SNP calling but with the UnifiedGenotyper parameter -glm INDEL for the indel report only. Only insertions and deletions shorter than or equal to 6 bp were taken into account.
Annotation.
SNP annotation was performed according to the genome using the package ANNOVAR (Version: 2013-08-23).
Based on the genome annotation, SNPs were categorized in exonic regions (overlapping with a coding exon), splicing sites (within 2 bp of a splicing junction), 5′UTRs and 3′UTRs, intronic regions (overlapping with an intron), upstream and downstream regions (within a 1 kb region upstream or downstream from the transcription start site), and intergenic regions.SNPs in coding exons were further grouped into synonymous SNPs (did not cause amino acid changes) or nonsynonymous SNPs (caused amino acid changes; mutations causing stop gain and stop loss were also classified into this group).
Indels in the exonic regions were classified by whether they had frame-shift (3 bp insertion or deletion) mutations.
Variation calling and annotation的更多相关文章
- 敏感性、特异性、假阳性、假阴性(sensitivity and specificity)
医学.机器学习等等,在统计结果时时长会用到这两个指标来说明数据的特性. 定义 敏感性:在金标准判断有病(阳性)人群中,检测出阳性的几率.真阳性.(检测出确实有病的能力) 特异性:在金标准判断无病(阴性 ...
- 30、 bowtie和bowtie2使用条件区别及用法
转载:http://blog.csdn.net/soyabean555999/article/details/62235577 一.转录组还是基因组? map常用的工具有bowtie/bowtie2, ...
- 表观 | Enhancer | ChIP-seq | 转录因子 | 数据库专题
需要长期更新! 参考:生信修炼手册 enhancer的基本概念: 长度几十到几千bp,作用是提高靶基因活性,属于顺式作用原件,DNA作用到DNA,转录因子就是反式,是结合到DNA的蛋白. 1981年, ...
- ANNOTATION PROCESSING 101 by Hannes Dorfmann — 10 Jan 2015
原文地址:http://hannesdorfmann.com/annotation-processing/annotationprocessing101 In this blog entry I wo ...
- Spring Annotation Processing: How It Works--转
找的好辛苦呀 原文地址:https://dzone.com/articles/spring-annotation-processing-how-it-works If you see an annot ...
- Microsoft source-code annotation language (SAL) 相关
More info see: https://msdn.microsoft.com/en-us/library/hh916383.aspx Simply stated, SAL is an inexp ...
- Spring 4 Ehcache Configuration Example with @Cacheable Annotation
http://www.concretepage.com/spring-4/spring-4-ehcache-configuration-example-with-cacheable-annotatio ...
- Annotation Type @bean,@Import,@configuration使用--官方文档
@Target(value={METHOD,ANNOTATION_TYPE}) @Retention(value=RUNTIME) @Documented public @interface Bean ...
- Calling convention-调用约定
In computer science, a calling convention is an implementation-level (low-level) scheme for how subr ...
随机推荐
- C++ 基础知识回顾(I/O)
[1] I/O基础 大多数计算机语言的输入输出的实现都是以语言本身为基础的,但是C/C++没有这样做.C语言最初把I/O留给了编译器实现人员.这样做的一个原因是可以提供足够的自由度,使之最适合目标机器 ...
- 160817、Java数据类型以及变量的定义
Java 是一种强类型的语言,声明变量时必须指明数据类型.变量(variable)的值占据一定的内存空间.不同类型的变量占据不同的大小. Java中共有8种基本数据类型,包括4 种整型.2 种浮点型. ...
- 安装mysql报错—解决方法:error while loading shared libraries: libssl.so.6
for 32bit ln -sf /usr/lib/libssl.so.10 /usr/lib/libssl.so.6ln -sf /usr/lib/libcrypto.so.10 /usr/lib/ ...
- Kubectl工具常用命令
创建namesapce kubectl create namespace {name} 注意:name只能为小写字母.数字和-的组合,且开头结尾为字母,一般格式为my-name 123-abc等. 创 ...
- 2018.10.26-day5 python整理总结
今日内容: 1.字典 2.id is == 3.小数据池 4.集合昨日回顾:1.列表:可变的 增:append//insert//extend//+//* 删:remove//pop//clear// ...
- Django 之 权限系统(组件)
参考: http://www.cnblogs.com/yuanchenqi/articles/7609586.html
- 笔记:zookeeper Hello World
下载zookeeper-3.4.6 , 试用了一下 standlone 启动 ./bin/zkServer.sh start 注: Usage: ./bin/zkServer.sh {start|st ...
- argparse 模块 在终端执行脚本文件
1.案例 #1.案例: import argparse #首先导入模块 parser = argparse.ArgumentParser() #创建一个解析对象 parser.add_argument ...
- docker daemon.json 配置
下面是自己设置的 /etc/docker/daemon.json 文件中的配置案例 [root@master docker]# cat daemon.json { "registry-mir ...
- 002-IP地址及分类以及子网掩码
一.概述 IP地址是一个4段2进制码组成的,每一段二进制码有8位,共32位二进制数.占用4个字节. IP地址是指互联网协议地址(Internet Protocol Address,又译为网际协议地址) ...