Link: Canu FAQ



Q: What resources does Canu require for a bacterial genome assembly? A mammalian assembly?
A:

Canu is designed to scale its resource usage to the system it runs on, automatically detecting the available hardware. It will report if a system does not meet the minimum requirements for a given genome size.

Typically, a bacterial genome can be assembled in 1-10 CPU hours, depending on coverage (~20 min on 16 cores), with 4GB of RAM (8GB is recommended). A mammalian genome (such as human) can be assembled in 10-25K CPU hours, depending on coverage (a grid environment is recommended), with at least one machine with 64GB of RAM (128GB is recommended).
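
As a rough illustration of how the CPU-hour figures relate to wall-clock time, the sketch below divides a CPU-hour budget by the number of cores. It assumes near-perfect parallel scaling, which real runs will not achieve; the function name and the efficiency parameter are hypothetical and not part of Canu.

```python
def wall_clock_hours(cpu_hours: float, cores: int, efficiency: float = 1.0) -> float:
    """Estimate wall-clock hours from a CPU-hour budget.

    `efficiency` is a hypothetical parallel-efficiency factor in (0, 1];
    1.0 means perfect scaling, which is an idealisation.
    """
    if cores <= 0:
        raise ValueError("cores must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return cpu_hours / (cores * efficiency)


# Bacterial genome: 1-10 CPU hours on a 16-core machine, in minutes.
fastest_minutes = wall_clock_hours(1, 16) * 60
slowest_minutes = wall_clock_hours(10, 16) * 60
print(f"{fastest_minutes:.1f} to {slowest_minutes:.1f} minutes")
```

Under these assumptions the quoted figure of about 20 minutes on 16 cores sits inside the estimated range.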



Q: What parameters should I use for my genome? Sequencing type?
A:

By default, Canu is designed to be universal across a wide range of PacBio (C2-P6-C4) and Oxford Nanopore (R6-R9) data. You can adjust parameters to increase efficiency for your data type. For example, for higher-coverage PacBio datasets, especially from inbred samples, you can decrease the error rate (errorRate=0.013, i.e. 1.3%, appropriate when coverage is sufficient). For recent Nanopore R9 2D data, you can also decrease the default error rate (errorRate=0.013).

With R7 1D sequencing data, multiple rounds of error correction are helpful. This should not be necessary for sequences over 85% identity. You can run just the correction step of Canu with the options

-correct corOutCoverage=500 corMinCoverage=0 corMhapSensitivity=high

for 5-10 rounds, supplying the asm.correctedReads.fasta.gz output from round i-1 to round i. Assemble with

-nanopore-corrected <your data> errorRate=0.1 utgGraphDeviation=50
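
The chaining of correction rounds described above can be sketched as a small command builder. This only constructs the command lines and does not run Canu. The options `-p`, `-d`, `genomeSize=` and `-nanopore-raw`, the directory names `round1`, `round2`, ... and `final`, and the input file and genome size are assumptions made for illustration; only the correction and assembly options themselves come from the text above.

```python
from typing import List

# Options quoted in the text for correction-only rounds.
CORRECTION_OPTS = ["corOutCoverage=500", "corMinCoverage=0", "corMhapSensitivity=high"]


def correction_commands(reads: str, genome_size: str, rounds: int = 5) -> List[List[str]]:
    """Build one `canu -correct` command per round, feeding round i-1 into round i."""
    commands = []
    current_input = reads
    for i in range(1, rounds + 1):
        out_dir = f"round{i}"  # hypothetical directory layout
        commands.append(
            ["canu", "-correct", "-p", "asm", "-d", out_dir,
             f"genomeSize={genome_size}", *CORRECTION_OPTS,
             "-nanopore-raw", current_input]  # read-type flag is an assumption
        )
        # With prefix "asm", the corrected reads are written to this file.
        current_input = f"{out_dir}/asm.correctedReads.fasta.gz"
    return commands


def assembly_command(corrected_reads: str, genome_size: str) -> List[str]:
    """Build the final assembly command from the last round's corrected reads."""
    return ["canu", "-p", "asm", "-d", "final", f"genomeSize={genome_size}",
            "errorRate=0.1", "utgGraphDeviation=50",
            "-nanopore-corrected", corrected_reads]


rounds = correction_commands("reads.fastq", "4.8m", rounds=5)
for cmd in rounds:
    print(" ".join(cmd))
print(" ".join(assembly_command("round5/asm.correctedReads.fasta.gz", "4.8m")))
```

Each printed line could be inspected before being executed by hand or passed to a job runner.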


Q: How do I run Canu on my SLURM/SGE/PBS/LSF/Torque system?
A:
Canu will auto-detect and configure itself to submit on most grids. If your grid requires special options (such as a partition on SLURM or an account code on SGE), specify them with gridOptions="<your options list>", which will be passed to the scheduler by Canu. If you have a grid system but prefer to run locally, specify useGrid=false (in everyday use, this is usually set to false).
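
A minimal sketch of how these two settings might be turned into command-line arguments is shown below. The helper name is hypothetical, and the SLURM partition name `long` is a placeholder for whatever your site uses.

```python
import shlex
from typing import List


def canu_grid_args(use_grid: bool = True, grid_options: str = "") -> List[str]:
    """Return the extra Canu arguments that control grid submission."""
    if not use_grid:
        return ["useGrid=false"]
    if grid_options:
        return [f"gridOptions={grid_options}"]
    return []  # rely on Canu's auto-detection


# Hypothetical SLURM partition passed through to the scheduler.
slurm_args = canu_grid_args(grid_options="--partition=long")
# Run locally even though a grid is available.
local_args = canu_grid_args(use_grid=False)

# shlex.quote keeps multi-word option lists intact when pasted into a shell.
print(" ".join(shlex.quote(arg) for arg in slurm_args))
print(" ".join(shlex.quote(arg) for arg in local_args))
```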


Q: My asm.contigs.fasta is empty, why?
A:

By default, Canu splits the final output into three files:

asm.contigs.fasta
  Everything that could be assembled and is part of the primary assembly, including both unique and repetitive elements. Each contig has several flags included on the FASTA definition line.
asm.bubbles.fasta
  Alternate paths in the graph that could not be merged into the primary assembly.
asm.unassembled.fasta
  Reads/tigs that could not be incorporated into the primary or bubble assemblies.

It is possible for tigs composed of multiple reads to end up in asm.unassembled.fasta. The default filtering removes from the contigs anything with fewer than 2 reads, shorter than 1000 bp, or composed mostly of a single sequence (>75%). The filtering is controlled by the contigFilter parameter, which takes 5 values:

contigFilter
  minReads
  minLength
  singleReadSpan
  lowCovSpan
  lowCovDepth

The default filtering is 2 1000 0.75 0.75 2. If you are assembling amplified data or viral data, it is possible your assembly will be flagged as unassembled. In those cases, you can turn off the filtering with the parameters

contigFilter="2 1000 1.0 1.0 2"
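
To make the five values concrete, here is a sketch of how such a filter could be evaluated for one tig. It is an illustration, not Canu's implementation. The reading of lowCovSpan and lowCovDepth (the fraction of the tig whose read depth is below lowCovDepth) is an assumption, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ContigFilter:
    min_reads: int = 2
    min_length: int = 1000
    single_read_span: float = 0.75
    low_cov_span: float = 0.75
    low_cov_depth: int = 2

    @classmethod
    def parse(cls, value: str) -> "ContigFilter":
        """Parse a contigFilter string such as "2 1000 0.75 0.75 2"."""
        reads, length, single, span, depth = value.split()
        return cls(int(reads), int(length), float(single), float(span), int(depth))


def is_unassembled(f: ContigFilter, n_reads: int, length: int,
                   longest_read_fraction: float, low_cov_fraction: float) -> bool:
    """Return True if the tig would be flagged under this reading of the filter.

    `low_cov_fraction` is assumed to be the fraction of the tig covered by
    fewer than `f.low_cov_depth` reads, computed by the caller.
    """
    return (
        n_reads < f.min_reads
        or length < f.min_length
        or longest_read_fraction > f.single_read_span
        or low_cov_fraction > f.low_cov_span
    )


default = ContigFilter.parse("2 1000 0.75 0.75 2")
relaxed = ContigFilter.parse("2 1000 1.0 1.0 2")

# A tig of 5 reads where one read spans 90% of the length.
print(is_unassembled(default, 5, 8000, 0.9, 0.1))
print(is_unassembled(relaxed, 5, 8000, 0.9, 0.1))
```

Because a fraction cannot exceed 1.0, setting the two span values to 1.0 disables those two tests in this sketch, while the read-count and length tests remain in force.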


Q: Why is my assembly missing my favorite short plasmid X?
A:

The first step in Canu is to find high-error overlaps and generate corrected sequences for subsequent assembly. This is currently the fastest step in Canu. By default, only the longest 40X of data (based on the specified genome size) is used for correction. If you have a dataset with uneven coverage or small plasmids, correcting the longest 40X may not give you sufficient coverage of your genome/plasmid. In these cases, you can set

corOutCoverage=1000

or any value larger than your total input coverage. This corrects and assembles all input data, at the expense of runtime. This option is also recommended for metagenomic datasets, where all data is useful for assembly.
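
To check whether a chosen corOutCoverage exceeds the total input coverage, the coverage can be estimated as total read bases divided by genome size. The sketch below shows that arithmetic; the function names and the example numbers are hypothetical.

```python
def input_coverage(total_bases: int, genome_size: int) -> float:
    """Total input coverage: sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


def corrects_all_input(cor_out_coverage: float, total_bases: int, genome_size: int) -> bool:
    """True if corOutCoverage is larger than the total input coverage."""
    return cor_out_coverage > input_coverage(total_bases, genome_size)


# Hypothetical dataset: 1.2 Gbp of reads for a 5 Mbp genome.
total_bases = 1_200_000_000
genome_size = 5_000_000
print(input_coverage(total_bases, genome_size))
print(corrects_all_input(1000, total_bases, genome_size))
print(corrects_all_input(40, total_bases, genome_size))
```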


Q: Why do I get only 30X of corrected data?
A:

By default, only the longest 40X of data (based on the specified genome size) is used for correction. Typically, some reads are trimmed during correction because they are chimeric or contain erroneous sequence, resulting in a loss of 20-25% (about 30X output). You can force correction to be non-lossy by setting

corMinCoverage=0

In that case, the corrected read output will be the same length as the input data, keeping any high-error unsupported bases. Canu will trim these in downstream steps before assembly.
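
The 30X figure follows from simple arithmetic on the numbers above, sketched here. The function name is hypothetical, and the loss fraction is treated as a fixed input, whereas in practice it depends on the data.

```python
def expected_corrected_coverage(input_coverage: float, loss_fraction: float,
                                cor_out_coverage: float = 40.0) -> float:
    """Coverage left after correction, given a trimming loss fraction.

    Only the longest `cor_out_coverage` of the input is corrected, and a
    fraction `loss_fraction` of that is assumed lost to trimming.
    """
    if not 0 <= loss_fraction <= 1:
        raise ValueError("loss_fraction must be in [0, 1]")
    corrected_input = min(input_coverage, cor_out_coverage)
    return corrected_input * (1 - loss_fraction)


# 100X of input, default 40X cutoff, 20-25% loss.
print(expected_corrected_coverage(100, 0.20))
print(expected_corrected_coverage(100, 0.25))
```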


Q: What is the minimum coverage required to run Canu?
A:

We have found that on eukaryotic genomes, >=20X coverage typically begins to outperform current hybrid methods. For low-coverage datasets (<=30X), we recommend the following parameters:

corMinCoverage=0 errorRate=0.035

For high-coverage datasets (typically >=60X), you can decrease the error rate, since the larger number of reads should allow sufficient assembly from only the best subset:

errorRate=0.013

However, the above is mainly an optimization for speed and will not affect your assembly continuity.
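
The coverage-based recommendations in this answer can be summarised as a small lookup. This is a sketch of the thresholds as stated here, with a hypothetical function name; between the two thresholds it returns no extra parameters and leaves Canu's defaults in place.

```python
from typing import List


def suggested_parameters(coverage: float) -> List[str]:
    """Map input coverage (in X) to the parameters recommended in this answer."""
    if coverage <= 30:
        # Low coverage: keep all corrected bases and allow a higher error rate.
        return ["corMinCoverage=0", "errorRate=0.035"]
    if coverage >= 60:
        # High coverage: a stricter error rate, mainly a speed optimization.
        return ["errorRate=0.013"]
    return []  # defaults


for cov in (20, 45, 80):
    print(cov, suggested_parameters(cov))
```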


Q: My genome is AT/GC rich, do I need to adjust parameters?
A:

On bacterial genomes, typically no. On repetitive genomes with AT <= 25% or AT >= 75% (or the equivalent GC content), the sequence composition biases the Jaccard estimate used by MHAP. In those cases, setting

corMaxEvidenceErate=0.15

has been sufficient to correct for the bias in our testing. In general, with high-coverage repetitive genomes (such as plants), it can be beneficial to set the above parameter, as it will eliminate repetitive matches, speed up the assembly, and sometimes improve unitigs.
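
One way to check whether a genome falls into the biased range is to measure the AT fraction of a sample of sequence, as sketched below. The function names are hypothetical, the 25% and 75% cutoffs are the ones quoted in this answer, and ambiguous bases such as N are ignored, which is a simplification.

```python
def at_fraction(sequence: str) -> float:
    """Fraction of A/T among the unambiguous bases (A, C, G, T) of a sequence."""
    seq = sequence.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return at / (at + gc)


def is_composition_biased(at: float, low: float = 0.25, high: float = 0.75) -> bool:
    """True if the AT fraction is at or beyond either cutoff."""
    return at <= low or at >= high


sample = "AATTAATTGC"  # toy sequence; use a real sample of reads in practice
fraction = at_fraction(sample)
if is_composition_biased(fraction):
    print("consider corMaxEvidenceErate=0.15")
```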
