Download and install: https://bitbucket.org/mroachawri/purge_haplotigs/wiki/Install

1、Dependencies (in no particular order)

bedtools

$ sudo apt install bedtools
$ bedtools --version
bedtools v2.26.0

samtools

$ sudo apt install samtools
$ samtools --version
samtools 1.7
Using htslib 1.7-2
Copyright (C) 2018 Genome Research Ltd.

Rscript

$ sudo apt install r-base r-base-dev

# on a new install we won't have the required R library 'ggplot2' installed
$ sudo su - -c "R -e \"install.packages('ggplot2', repos='http://cran.rstudio.com/')\""

Minimap2

# download the latest release from https://github.com/lh3/minimap2/releases (currently v2.13)
$ wget https://github.com/lh3/minimap2/releases/download/v2.13/minimap2-2.13_x64-linux.tar.bz2
$ tar xf minimap2-2.13_x64-linux.tar.bz2
# we'll add a bin directory to the home folder and add it to the PATH, then install there
$ mkdir -p ~/bin
# append (>>) so that the existing contents of ~/.bashrc are kept
$ printf "export PATH=\$PATH:~/bin\n" >> ~/.bashrc
$ source ~/.bashrc
$ cp minimap2-2.13_x64-linux/minimap2 ~/bin/
$ minimap2 -V
2.13-r850
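
If these steps are repeated, the same export line can end up in ~/.bashrc several times. A minimal sketch of a guarded append follows; `add_line_once` is a hypothetical helper name, not part of any of the tools installed here.

```shell
# add_line_once FILE LINE: append LINE to FILE unless FILE already contains it
add_line_once() {
    file=$1
    line=$2
    # make sure the file exists so grep does not fail on a fresh account
    touch "$file"
    # -q quiet, -x match the whole line, -F treat LINE as a fixed string
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage (left commented out so that sourcing this snippet changes nothing):
# add_line_once ~/.bashrc 'export PATH=$PATH:~/bin'
```

The single quotes in the usage line keep $PATH unexpanded, so the literal text is written to the file and expanded each time the shell starts.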

MUMmer

# download the latest release from https://github.com/mummer4/mummer/releases (currently 4.0.0.beta2)
$ wget https://github.com/mummer4/mummer/releases/download/v4.0.0beta2/mummer-4.0.0beta2.tar.gz
$ tar xf mummer-4.0.0beta2.tar.gz
# compile
$ cd mummer-4.0.0beta2
$ ./configure
$ make
$ cd ../
# install (just softlink to the home bin directory ~/bin)
$ ln -s ~/mummer-4.0.0beta2/delta-filter ~/bin/delta-filter
$ ln -s ~/mummer-4.0.0beta2/nucmer ~/bin/nucmer
$ ln -s ~/mummer-4.0.0beta2/show-coords ~/bin/show-coords
$ nucmer -V
4.0.0beta2
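
Before moving on, it can help to confirm that every dependency is actually reachable from the PATH. The following is a small sketch rather than anything shipped with Purge Haplotigs; `check_tools` is a hypothetical helper name.

```shell
# check_tools NAME...: print one status line per tool;
# the return status is non-zero if at least one tool is not on the PATH
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf 'ok       %s\n' "$tool"
        else
            printf 'MISSING  %s\n' "$tool"
            missing=1
        fi
    done
    return "$missing"
}

# Usage (the names are the executables installed in the steps above):
# check_tools bedtools samtools Rscript minimap2 nucmer delta-filter show-coords
```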

2、Install Purge Haplotigs

Install to the user's home directory. No compiling is needed: just make purge_haplotigs/bin/purge_haplotigs reachable from the system PATH (here via a softlink in ~/bin).

# clone the git
$ git clone https://bitbucket.org/mroachawri/purge_haplotigs.git
# create a softlink to ~/bin
$ ln -s ~/purge_haplotigs/bin/purge_haplotigs ~/bin/purge_haplotigs
# test Purge Haplotigs
$ purge_haplotigs
USAGE:
purge_haplotigs <command> [options]
COMMANDS:
-- Purge Haplotigs pipeline:
readhist First step, generate a read-depth histogram for the genome
contigcov Second step, get contig coverage stats and flag 'suspect' contigs
purge Third step, identify and reassign haplotigs
-- Other scripts
ncbiplace Generate a placement file for submission to NCBI
test Test everything!
# test the pipeline
$ purge_haplotigs test
# <lots of jargon>
ALL TESTS PASSED

3、Running Purge Haplotigs (https://www.jianshu.com/p/8ed5b494b131)

PREPARATION

minimap2 -t 4 -ax map-pb genome.fasta subreads.fasta.gz --secondary=no \
| samtools sort -@ 8 -m 1G -o aligned.bam -T tmp.ali

STEP 1

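Generate a coverage histogram by running the first script. This script produces a histogram PNG image for you to inspect and a BEDTools 'genomecov' output file that you'll need for STEP 2. Note that the usage text in the install section lists the first two commands as readhist and contigcov, while the commands here use hist and cov; run purge_haplotigs without arguments to see which names your installed version accepts.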

purge_haplotigs  hist  -b aligned.bam  -g genome.fasta  [ -t threads ]

STEP 2

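Run the second script with the cutoffs you chose from the histogram in the previous step: -l is the low read-depth cutoff, -m is the low point between the haploid and diploid peaks, and -h is the high read-depth cutoff. The script analyses the coverage contig by contig and produces a contig coverage stats csv file with suspect contigs flagged for further analysis or removal.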

purge_haplotigs  cov  -i aligned.bam.gencov  -l <integer>  -m <integer>  -h <integer>  \
[-o coverage_stats.csv -j 80 -s 80 ]
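
The cutoffs are normally read off the histogram image, but the numbers behind it can also be printed from the coverage file as a cross-check. This sketch assumes the file follows the default BEDTools genomecov histogram layout (tab-separated, column 1 is the sequence name or `genome` for the whole-assembly rows, column 2 the depth, column 3 the number of bases at that depth). `depth_hist` and `modal_depth` are hypothetical helper names.

```shell
# depth_hist FILE: print "depth<TAB>bases" for the genome-wide rows only
depth_hist() {
    awk -F '\t' '$1 == "genome" { print $2 "\t" $3 }' "$1"
}

# modal_depth FILE: the non-zero depth that covers the most bases,
# a rough locator for the tallest peak of the histogram
modal_depth() {
    depth_hist "$1" |
        awk -F '\t' '$1 > 0 && $2 > max { max = $2; d = $1 } END { print d }'
}

# Usage:
# depth_hist aligned.bam.gencov | head
# modal_depth aligned.bam.gencov
```

The modal depth only points at the tallest peak; the low, mid and high cutoffs still need to be chosen by looking at the shape of the whole distribution.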

STEP 3

Run the purging pipeline. This script automatically runs a BEDTools windowed coverage analysis (if generating dotplots) and minimap2 alignments to assess which contigs to reassign and which to keep. The pipeline makes several iterations of purging. Optionally, pass a repeat annotation in BED format with -r for improved handling of repetitive regions.

purge_haplotigs  purge  -g genome.fasta  -c coverage_stats.csv

The run produces five output files:

  • <prefix>.fasta: These are the curated primary contigs
  • <prefix>.haplotigs.fasta: These are all the haplotigs identified in the initial input assembly.
  • <prefix>.artefacts.fasta: These are the very low/high coverage contigs (identified in STEP 2). NOTE: you'll probably have mitochondrial/chloroplast/etc. contigs in here with the assembly junk.
  • <prefix>.reassignments.tsv: These are all the reassignments that were made, as well as the suspect contigs that weren't reassigned.
  • <prefix>.contig_associations.log: This shows the contig "associations".
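
A quick way to confirm that a run finished is to check that all five files listed above are present. This is a minimal sketch; `check_outputs` is a hypothetical helper, and the prefix `curated` in the usage line is an assumption about the default output prefix, so substitute whatever prefix your run used.

```shell
# check_outputs PREFIX: report which of the five expected output files exist;
# the return status is non-zero if any of them is absent
check_outputs() {
    prefix=$1
    missing=0
    for suffix in fasta haplotigs.fasta artefacts.fasta \
                  reassignments.tsv contig_associations.log; do
        if [ -e "$prefix.$suffix" ]; then
            printf 'ok       %s\n' "$prefix.$suffix"
        else
            printf 'MISSING  %s\n' "$prefix.$suffix"
            missing=1
        fi
    done
    return "$missing"
}

# Usage ("curated" is an assumed prefix):
# check_outputs curated
```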
 
