7. purge_haplotigs: removing redundancy from a genome assembly
1. Download and install: https://bitbucket.org/mroachawri/purge_haplotigs/wiki/Install
1. Dependencies (in no particular order)
bedtools
$ sudo apt install bedtools
$ bedtools --version
bedtools v2.26.0
samtools
$ sudo apt install samtools
$ samtools --version
samtools 1.7
Using htslib 1.7-2
Copyright (C) 2018 Genome Research Ltd.
Rscript
$ sudo apt install r-base r-base-dev
# on a new install we won't have the required R library 'ggplot2' installed
$ sudo su - -c "R -e \"install.packages('ggplot2', repos='http://cran.rstudio.com/')\""
Minimap2
# download the latest release from https://github.com/lh3/minimap2/releases (currently v2.13)
$ wget https://github.com/lh3/minimap2/releases/download/v2.13/minimap2-2.13_x64-linux.tar.bz2
$ tar xf minimap2-2.13_x64-linux.tar.bz2
# we'll add a bin directory to the home folder, add it to the PATH, then install there
$ mkdir -p ~/bin
# append (>>) rather than overwrite (>), so the existing ~/.bashrc is preserved
$ printf "export PATH=\$PATH:~/bin\n" >> ~/.bashrc
$ source ~/.bashrc
$ cp minimap2-2.13_x64-linux/minimap2 ~/bin/
$ minimap2 -V
2.13-r850
MUMmer
# download the latest release from https://github.com/mummer4/mummer/releases (currently 4.0.0.beta2)
$ wget https://github.com/mummer4/mummer/releases/download/v4.0.0beta2/mummer-4.0.0beta2.tar.gz
$ tar xf mummer-4.0.0beta2.tar.gz
# compile
$ cd mummer-4.0.0beta2
$ ./configure
$ make
$ cd ../
# install (just softlink to the home bin directory ~/bin)
$ ln -s ~/mummer-4.0.0beta2/delta-filter ~/bin/delta-filter
$ ln -s ~/mummer-4.0.0beta2/nucmer ~/bin/nucmer
$ ln -s ~/mummer-4.0.0beta2/show-coords ~/bin/show-coords
$ nucmer -V
4.0.0beta2
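Before moving on, it can help to confirm that every dependency is actually reachable on the PATH. The following is a minimal sketch, not part of Purge Haplotigs itself; the function name `check_deps` is hypothetical, and it only checks that each command can be found, not which version is installed.

```shell
# check_deps (hypothetical helper): report which of the given commands
# cannot be found on the PATH.
# Prints one "missing: <name>" line per absent tool and returns non-zero
# if at least one tool is missing.
check_deps() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            status=1
        fi
    done
    return "$status"
}
```

For the tools installed above, the call would be `check_deps bedtools samtools Rscript minimap2 nucmer delta-filter show-coords`; no output means everything was found.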
2. Install Purge Haplotigs
Purge Haplotigs is installed into the user's home directory. Nothing needs compiling: just make the purge_haplotigs/bin/purge_haplotigs script reachable from the system PATH.
# clone the git
$ git clone https://bitbucket.org/mroachawri/purge_haplotigs.git
# create a softlink to ~/bin
$ ln -s ~/purge_haplotigs/bin/purge_haplotigs ~/bin/purge_haplotigs
# test Purge Haplotigs
$ purge_haplotigs
USAGE:
purge_haplotigs <command> [options]
COMMANDS:
-- Purge Haplotigs pipeline:
readhist First step, generate a read-depth histogram for the genome
contigcov Second step, get contig coverage stats and flag 'suspect' contigs
purge Third step, identify and reassign haplotigs
-- Other scripts
ncbiplace Generate a placement file for submission to NCBI
test Test everything!
# test the pipeline
$ purge_haplotigs test
# <lots of jargon>
ALL TESTS PASSED
3. Running Purge Haplotigs (https://www.jianshu.com/p/8ed5b494b131)
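PREPARATION
Map the PacBio subreads to the draft assembly with minimap2 (preset map-pb, secondary alignments disabled) and sort the alignments into a BAM file.
minimap2 -t 4 -ax map-pb genome.fasta subreads.fasta.gz --secondary=no \
| samtools sort -@ 8 -m 1G -o aligned.bam -T tmp.ali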
STEP 1
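Generate a coverage histogram by running the first script. It produces a histogram PNG image for you to inspect and a BEDTools 'genomecov' output file that you'll need for STEP 2. Note that the usage text printed above lists this subcommand as readhist (and the next one as contigcov), while the commands below use hist and cov; run purge_haplotigs with no arguments to see which names your installed version expects.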
purge_haplotigs hist -b aligned.bam -g genome.fasta [ -t threads ]
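The cutoffs for STEP 2 are normally chosen by eye from the PNG, but the same numbers can be read from the genomecov file. The sketch below assumes that file is in the default BEDTools genomecov histogram layout (tab-separated: name, depth, bases at that depth, size, fraction) with whole-assembly rows labelled `genome`; if your version writes a different layout, the parsing needs adjusting. The function names `genome_hist` and `valley_between` are hypothetical helpers, not Purge Haplotigs commands.

```shell
# genome_hist FILE: print "depth<TAB>bases" for the genome-wide rows only.
# Assumes the default BEDTools genomecov histogram columns.
genome_hist() {
    awk -F '\t' '$1 == "genome" { print $2 "\t" $3 }' "$1"
}

# valley_between FILE LEFT RIGHT: print the depth with the fewest bases
# among depths LEFT..RIGHT (inclusive). With LEFT and RIGHT set to the
# two peak positions read off the plot, this is the low point between them.
# On ties the lowest depth is reported.
valley_between() {
    genome_hist "$1" | awk -F '\t' -v left="$2" -v right="$3" '
        ($1 + 0) >= (left + 0) && ($1 + 0) <= (right + 0) {
            if (!found || ($2 + 0) < min) { min = $2 + 0; best = $1; found = 1 }
        }
        END { if (found) print best }'
}
```

For example, if the two peaks in the plot sit near depths 30 and 60 (illustrative numbers only), `valley_between "$GENCOV" 30 60` prints the depth of the trough between them, where `$GENCOV` is the genomecov file written by this step. It is still worth checking the result against the PNG.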
STEP 2
Run the second script with the cutoffs read off the histogram from STEP 1 to analyse coverage contig by contig: -l is the low read-depth cutoff, -m is the low point between the haploid and diploid peaks, and -h is the high read-depth cutoff. The script produces a CSV file of contig coverage stats, with suspect contigs flagged for further analysis or removal.
purge_haplotigs cov -i aligned.bam.gencov -l <integer> -m <integer> -h <integer> \
[-o coverage_stats.csv -j 80 -s 80 ]
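The optional -j and -s percentages decide how a contig is flagged. As documented by the tool, a contig is marked junk when at least -j percent of it has low or high coverage, and suspect when at most -s percent of it has diploid-level coverage. The sketch below illustrates that rule for a single contig. It is not the tool's own code: the function name `flag_contig` is hypothetical, and treating depths from the mid cutoff up to the high cutoff (inclusive) as "diploid" is an assumption about the boundary handling.

```shell
# flag_contig LOW MID HIGH JUNK_PCT SUSPECT_PCT
# Reads "depth<TAB>bases" lines for ONE contig on stdin and prints:
#   j     if >= JUNK_PCT percent of bases lie below LOW or above HIGH
#   s     if <= SUSPECT_PCT percent of bases lie in MID..HIGH
#   keep  otherwise
# Boundary handling (inclusive MID and HIGH) is an assumption.
flag_contig() {
    awk -F '\t' -v low="$1" -v mid="$2" -v high="$3" \
        -v junk="$4" -v suspect="$5" '
        {
            d = $1 + 0; n = $2 + 0
            total += n
            if (d < low + 0 || d > high + 0) outside += n
            else if (d >= mid + 0) diploid += n
        }
        END {
            if (total == 0) exit 1
            if (100 * outside / total >= junk + 0) print "j"
            else if (100 * diploid / total <= suspect + 0) print "s"
            else print "keep"
        }'
}
```

With cutoffs 10/40/100 and the default 80/80 thresholds, a contig covered entirely at depth 25 has no diploid-level bases and is flagged `s`, which is the expected signature of a haplotig assembled separately from its primary contig.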
STEP 3
Run the purging pipeline. This script automatically runs a BEDTools windowed coverage analysis (only if dotplots are being generated) and minimap2 alignments to assess which contigs to reassign and which to keep. The pipeline makes several iterations of purging. Optionally, pass a BED file of repeats with -r for improved handling of repetitive regions.
purge_haplotigs purge -g genome.fasta -c coverage_stats.csv
The run produces five output files:
- <prefix>.fasta: These are the curated primary contigs
- <prefix>.haplotigs.fasta: These are all the haplotigs identified in the initial input assembly.
- <prefix>.artefacts.fasta: These are the very low/high coverage contigs (identified in STEP 2). NOTE: you'll probably have mitochondrial/chloroplast/etc. contigs in here with the assembly junk.
- <prefix>.reassignments.tsv: These are all the reassignments that were made, as well as the suspect contigs that weren't reassigned.
- <prefix>.contig_associations.log: This log shows the contig "associations", i.e. which contigs were matched with which.
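A quick sanity check after purging is to compare the number of sequences and total bases in the input assembly with those in the curated, haplotig and artefact FASTA files. The helper below is a minimal sketch for that purpose; `fasta_stats` is a hypothetical name, and it assumes plain (uncompressed) FASTA input.

```shell
# fasta_stats FILE: print "<number of sequences> <total bases>" for a
# plain-text FASTA file. Header lines (starting with ">") are counted as
# sequences; whitespace inside sequence lines is ignored.
fasta_stats() {
    awk '
        /^>/ { n++; next }
        { gsub(/[ \t\r]/, ""); bases += length($0) }
        END { print n + 0, bases + 0 }' "$1"
}
```

Running `fasta_stats genome.fasta` and then `fasta_stats` on each of the three output FASTA files shows how the input was split; the sequence counts of the three outputs should add up to the input count if every contig was assigned to exactly one of them.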