Reposted from: http://pacbiofileformats.readthedocs.io/en/5.1/Primer.html

Reposted from: http://pacbiofileformats.readthedocs.io/en/5.1/#legacy-formats

PacBio SMRT sequencing operates within a silicon chip (a SMRTcell) fabricated to contain a large number of microscopic holes (ZMWs, or zero-mode waveguides), each assigned a hole number.

Within a ZMW, PacBio SMRT sequencing is performed on a circularized molecule called a SMRTbell. The SMRTbell, depicted below, consists of:

  • the customer’s double-stranded DNA insert (with sequence $I$, read following the arrow)
  • (optional) double-stranded DNA barcodes (sequences $B_L, B_R$) used for multiplexing DNA samples. While the barcodes are optional, they must be present at both ends if present at all. Barcodes may or may not be symmetric, where symmetric means $B_L = B_R^{RC}$.
  • SMRTbell adapters (sequences $A_L, A_R$), each consisting of a double-stranded stem and a single-stranded hairpin loop. Adapters may or may not be symmetric, where symmetric means $A_L = A_R$.

A schematic drawing of a SMRTbell
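To make the two symmetry conditions concrete, here is a minimal Python sketch. The helper names (`complement`, `reverse_complement`, `barcodes_symmetric`, `adapters_symmetric`) are hypothetical and not part of any PacBio tool, and it assumes sequences are plain uppercase A/C/G/T strings with no ambiguity codes.

```python
# Sketch of the symmetry conditions for barcodes and adapters.
# Helper names are hypothetical; sequences are uppercase A/C/G/T strings.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq: str) -> str:
    """x^C: complement every base, keeping the original order."""
    return seq.translate(COMPLEMENT)


def reverse(seq: str) -> str:
    """x^R: reverse the order, keeping the original bases."""
    return seq[::-1]


def reverse_complement(seq: str) -> str:
    """x^RC: reverse the order and complement every base."""
    return reverse(complement(seq))


def barcodes_symmetric(b_left: str, b_right: str) -> bool:
    """Symmetric barcodes satisfy B_L = B_R^RC."""
    return b_left == reverse_complement(b_right)


def adapters_symmetric(a_left: str, a_right: str) -> bool:
    """Symmetric adapters satisfy A_L = A_R."""
    return a_left == a_right


print(barcodes_symmetric("ACGTTG", "CAACGT"))  # True
print(adapters_symmetric("ATCT", "ATCC"))      # False
```

Note that the barcode condition uses the reverse complement while the adapter condition is plain equality, exactly as stated in the list above.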

SMRT sequencing interrogates the incorporated bases in the product strand of a replication reaction. Assuming the sequencing of the template above began at START, the following sequence of bases would be incorporated (where we are using the superscripts C, R, and RC to denote sequence complementation, reversal, and reverse-complementation):

$$A_L^C\, B_L^C\, I^C\, B_R^C\, A_R^C\, B_R^R\, I^R\, B_L^R\, A_L^C \ldots$$

(note the identity $(x^{RC})^C = x^R$).

The ZMW read is the full output of the instrument/basecaller upon observing this series of incorporations, subject to errors due to optical and other limitations. Adapter regions and barcode regions are the spans of the ZMW read corresponding to the adapter and barcode DNA. The subreads are the spans of the ZMW read corresponding to the DNA insert.
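One way to picture this is a ZMW read as a string plus a list of labelled, half-open intervals. The Python sketch below pulls out the subreads under that assumption. The `Region` type and its labels are hypothetical and are not the representation PacBio uses on disk.

```python
from typing import List, NamedTuple


class Region(NamedTuple):
    kind: str   # hypothetical labels: "adapter", "barcode" or "insert"
    start: int  # first base, counted from base 0 of the ZMW read
    end: int    # one past the last base (end-exclusive)


def subreads(zmw_read: str, regions: List[Region]) -> List[str]:
    """Return, in read order, the spans of the ZMW read that cover the DNA insert."""
    ordered = sorted(regions, key=lambda r: r.start)
    return [zmw_read[r.start:r.end] for r in ordered if r.kind == "insert"]


# Made-up read: insert, barcode, adapter, barcode, insert.
read = "AAAACTTTCGGGG"
regions = [
    Region("insert", 0, 4),
    Region("barcode", 4, 5),
    Region("adapter", 5, 8),
    Region("barcode", 8, 9),
    Region("insert", 9, 13),
]
print(subreads(read, regions))  # ['AAAA', 'GGGG']
```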

One complication arises when one considers the possibility that a ZMW might not contain exactly one sequencing reaction. Indeed, it could contain zero, in which case the ensuing basecalls are a product of background noise, or it could contain more than one, in which case the basecall sequence represents two intercalated reads, effectively appearing as noise. To remove such noisy sequence, the high quality (HQ) region finder in PostPrimary algorithmically detects a maximal interval of the ZMW read where it appears that a single sequencing reaction is taking place. This region is designated the HQ region, and in the standard mode of operation, PostPrimary will only output the subreads detected within the HQ region.

A schematic of the regions designated within a ZMW read
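The effect of HQ filtering on subread intervals can be sketched in Python as follows. This is not the PostPrimary HQ region finder itself. It only shows what happens to subreads once an HQ interval is known, and it assumes, for illustration, that a subread straddling an HQ boundary is clipped rather than dropped. An empty HQ interval such as `(0, 0)` models a ZMW with no usable reaction.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in ZMW-read coordinates


def subreads_in_hq(subreads: List[Interval], hq: Interval) -> List[Interval]:
    """Keep only the parts of subreads that fall inside the HQ region.

    Assumption (illustrative): a subread that straddles an HQ boundary is
    clipped to that boundary instead of being discarded.
    """
    hq_start, hq_end = hq
    kept = []
    for start, end in subreads:
        clipped = (max(start, hq_start), min(end, hq_end))
        if clipped[0] < clipped[1]:  # keep non-empty intersections only
            kept.append(clipped)
    return kept


print(subreads_in_hq([(0, 50), (60, 500), (520, 900), (950, 1200)], (100, 1000)))
# [(100, 500), (520, 900), (950, 1000)]
```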

Note

Our coordinate system begins at the first basecall in the ZMW read (deemed base 0), i.e., it is not relative to the HQ region. Intervals in PacBio reads are given in end-exclusive (“half-open”) coordinates. This style of coordinate system should be familiar to Python or C++ STL programmers.
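Because intervals are half-open and measured from base 0 of the ZMW read, they map directly onto Python slices. The small helpers below are hypothetical and only illustrate the arithmetic: length is `end - start` with no off-by-one correction, adjacent intervals share a boundary value, and a position measured from the HQ start must be shifted to get a ZMW-read coordinate.

```python
from typing import Tuple

Interval = Tuple[int, int]  # half-open [start, end), counted from base 0 of the ZMW read


def span_length(interval: Interval) -> int:
    start, end = interval
    return end - start  # no "+ 1" needed with end-exclusive intervals


def extract(zmw_read: str, interval: Interval) -> str:
    start, end = interval
    return zmw_read[start:end]  # Python slices use the same convention


def are_adjacent(left: Interval, right: Interval) -> bool:
    """Touching intervals share a boundary value without overlapping."""
    return left[1] == right[0]


def hq_offset_to_zmw(offset: int, hq_start: int) -> int:
    """Convert a position measured from the HQ start into ZMW-read coordinates."""
    return hq_start + offset


print(extract("ACGTACGT", (2, 5)), span_length((2, 5)))  # GTA 3
```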

BAM everywhere

Unaligned BAM files representing the subreads will be produced natively by the PacBio instrument. The subreads BAM will be the starting point for secondary analysis. In addition, the scraps arising from cutting out adapter and barcode sequences will be retained in a scraps.bam file, to enable reconstruction of HQ regions of the ZMW reads, in case the customer needs to rerun barcode finding with a different option.
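Rebuilding an HQ region from the subreads and scraps amounts to sorting one ZMW's pieces by position and checking that they tile without gaps. The Python sketch below assumes each record's ZMW-read start/end coordinates and sequence have already been read from the BAM files (how they are stored is outside the scope of this primer). `Piece` and `stitch_hq_region` are hypothetical names.

```python
from typing import Iterable, NamedTuple


class Piece(NamedTuple):
    start: int  # ZMW-read coordinate, inclusive
    end: int    # ZMW-read coordinate, exclusive
    seq: str


def stitch_hq_region(pieces: Iterable[Piece]) -> str:
    """Rebuild one ZMW's HQ region from its subread and scrap pieces."""
    ordered = sorted(pieces, key=lambda p: p.start)
    if not ordered:
        return ""
    parts = []
    expected_start = ordered[0].start
    for piece in ordered:
        # Half-open intervals tile exactly when each start equals the previous end.
        if piece.start != expected_start:
            raise ValueError(f"gap or overlap between {expected_start} and {piece.start}")
        if len(piece.seq) != piece.end - piece.start:
            raise ValueError("sequence length disagrees with its interval")
        parts.append(piece.seq)
        expected_start = piece.end
    return "".join(parts)


subread_pieces = [Piece(10, 14, "ACGT"), Piece(20, 23, "GGC")]
scrap_pieces = [Piece(14, 20, "TTTTTT")]
print(stitch_hq_region(subread_pieces + scrap_pieces))  # ACGTTTTTTTGGC
```

Raising on a gap, rather than silently concatenating, makes a missing scrap record visible instead of producing a shortened read.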

The circular consensus tool/workflow (CCS) will take as input an unaligned subreads BAM file and produce an output BAM file containing unaligned consensus reads.

Alignment (mapping) programs take these unaligned BAM files as input and will produce aligned BAM files, faithfully retaining all tags and headers.
