Assembling the resurrection grass (Oropetium) genome from PacBio long reads alone

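For eukaryotic genomes such as those of plants, features like repetitive sequences, polyploidy, and high heterozygosity cause serious problems when the assembly is built from second-generation (short-read) sequencing data.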

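Genomes assembled from second-generation data mostly fall short of finished quality. They usually cover the protein-coding gene regions but leave many other regions uncovered, and those are precisely the non-coding regions that carry regulatory functions. Research on non-coding function has grown steadily in recent years, and if the assembled genome lacks these sequences, the follow-up work cannot be done.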

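In addition, because of the limited read length and the assembly algorithms used, repetitive sequences and regions with abnormal GC content are often misassembled or not assembled at all.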

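Third-generation sequencing, with its long reads and lack of GC bias, lowers the difficulty of genome assembly. It can resolve repetitive and GC-abnormal sequences that are very hard to assemble from second-generation data, which makes it well suited to genome assembly.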

The researchers sequenced the resurrection grass on the PacBio RS II platform, using 32 SMRT cells for a sequencing depth of 72X.

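The final assembly contains 650 contigs and covers 99% of the genome (estimated genome size 245 Mb, total contig length 244 Mb), with a contig N50 of 2.4 Mb.

A complete chloroplast genome was assembled as well. It is 125,324 bp long, of which about 25 kb is repetitive sequence.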
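The summary figures above (coverage, N50, depth) are simple to recompute. The following is a minimal Python sketch, not code from the study: the `n50` helper and the toy contig lengths are hypothetical, and the yield estimate assumes that the 72X depth is measured against the 245 Mb genome size estimate.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths.

    N50 is the length L such that contigs of length >= L together
    make up at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy contig lengths (hypothetical, for illustration only).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_contigs)

# Figures quoted in the post.
GENOME_MB = 245        # estimated genome size
CONTIG_TOTAL_MB = 244  # total contig length
DEPTH = 72             # sequencing depth (X)
SMRT_CELLS = 32

# Fraction of the estimated genome represented in contigs.
assembled_fraction = CONTIG_TOTAL_MB / GENOME_MB

# Assumption: depth is relative to the 245 Mb estimate.
total_yield_gb = DEPTH * GENOME_MB / 1000
yield_per_cell_mb = DEPTH * GENOME_MB / SMRT_CELLS

print(f"toy N50: {toy_n50}")
print(f"assembled fraction: {assembled_fraction:.1%}")
print(f"approx. total yield: {total_yield_gb:.2f} Gb")
print(f"approx. yield per SMRT cell: {yield_per_cell_mb:.0f} Mb")
```

With real data, the contig lengths would be read from the assembly FASTA and passed to `n50` in place of the toy list.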

The analysis used the HGAP assembly pipeline with the following parameters:

The Oropetium genome was assembled using the
RS_HGAP_Assembly.3 protocol for assembly and Quiver for genome polishing in SMRT Analysis v2.3.0. This consisted of a three-step process involving
(1) generation of preassembled reads with improved consensus accuracy;
(2) assembly of the genome through overlap consensus accuracy using Celera; and
(3) one round of genome polishing with Quiver.

For HGAP, the following parameters were used:
PreAssembler Filter v1 (
minimum sub-read length = 3,000 bp,
minimum polymerase read quality = 0.80,
minimum polymerase read length = 3,000 bp
);

PreAssembler v2 (
minimum seed length = 16,000 bp,
number of seed read chunks = 6,
alignment candidates per chunk = 10,
total alignment candidates = 24,
min coverage for correction = 6
);

AssembleUnitig v1 (
target genome coverage = 30,
overlap error rate = 0.06,
minimum overlap = 40 bp,
overlap k-mer = 14
);

BLASR v1 mapping of reads for genome polishing with Quiver (
max divergence percentage = 30,
minimum anchor size = 12
).

A second round of genome polishing was performed using Quiver (SMRT Analysis v2.3.0) to
further improve the site-specific consensus accuracy of the assembly.
The following Quiver parameters were used for genome polishing:
filtering (
minimum sub-read length = 3,000 bp,
minimum polymerase read quality = 0.80,
minimum polymerase read length = 3,000 bp
);

mapping (
maximum divergence percentage = 30,
minimum anchor size = 12
).

Default parameters were otherwise employed for both HGAP assembly and Quiver protocols.
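To make the filtering and seed-length thresholds above concrete, here is a minimal Python sketch of how such cutoffs partition a read set. It is an illustration only and not SMRT Analysis code: the `Subread` class, the function names, and the toy reads are hypothetical, and treating each threshold as inclusive (>=) is an assumption.

```python
from dataclasses import dataclass

# Thresholds quoted in the methods text.
MIN_SUBREAD_LEN = 3000               # bp
MIN_POLYMERASE_READ_LEN = 3000       # bp
MIN_POLYMERASE_READ_QUALITY = 0.80
MIN_SEED_LEN = 16000                 # bp


@dataclass
class Subread:
    """Hypothetical record holding the fields the filter looks at."""
    name: str
    length: int
    polymerase_read_length: int
    polymerase_read_quality: float


def passes_filter(read: Subread) -> bool:
    """True if the subread meets all three filter thresholds.

    Assumption: each threshold is inclusive.
    """
    return (
        read.length >= MIN_SUBREAD_LEN
        and read.polymerase_read_length >= MIN_POLYMERASE_READ_LEN
        and read.polymerase_read_quality >= MIN_POLYMERASE_READ_QUALITY
    )


def is_seed(read: Subread) -> bool:
    """True if a filtered subread is long enough to act as a seed read."""
    return passes_filter(read) and read.length >= MIN_SEED_LEN


# Toy reads (hypothetical values).
reads = [
    Subread("r1", 18000, 25000, 0.86),  # passes filter, long enough to seed
    Subread("r2", 5000, 9000, 0.85),    # passes filter, too short to seed
    Subread("r3", 2500, 7000, 0.88),    # subread too short
    Subread("r4", 20000, 21000, 0.75),  # polymerase read quality too low
]

kept = [r.name for r in reads if passes_filter(r)]
seeds = [r.name for r in reads if is_seed(r)]
print("kept:", kept)
print("seeds:", seeds)
```

In the preassembly step, the long seed reads are the ones that get error-corrected using the shorter reads that pass the filter, which is why the seed cutoff (16,000 bp) is much higher than the general filter cutoff (3,000 bp).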
