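Running data all the time while knowing nothing about the data itself is hard to justify. Some articles worth reading: Sequencing depth and coverage: key considerations in genomic analyses (covers second-generation sequencing only); Assembly of large genomes using second-generation sequencing (references); Identification of optimum sequencing depth especially for de novo genome asse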
1. For the Impatient
# Download bwakit (or from <http://sourceforge.net/projects/bio-bwa/files/bwakit/> manually)
wget -O- http://sourceforge.net/projects/bio-bwa/files/bwakit/bwakit-0.7.15_x64-linux.tar.bz2/download \
  | bzip2 -dc | tar xf -
# Genera
I recently downloaded a batch of BAM files and a reference genome file from a public database. When I reran the exome analysis pipeline, it failed with the error "Exception in thread "main" picard.PicardException: New reference sequence does not contain a matching contig for NC_007605". The command was:
java -jar picard.jar ReorderSam \
    I=original.bam \
    O=reor
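The error quoted above says that a contig named in the BAM header (here NC_007605) is absent from the new reference. A minimal sketch of how to list such contigs before running Picard is shown here. It assumes the BAM header and the reference FASTA are available as plain text; the function names `contigs_from_sam_header`, `contigs_from_fasta`, and `missing_contigs` are hypothetical helpers, not part of Picard or any other tool, and the sample contig names other than NC_007605 are made up for illustration.

```python
def contigs_from_sam_header(header_text):
    """Return contig names from the @SQ lines of a SAM/BAM header (SN: fields)."""
    names = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        for field in line.split("\t")[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
    return names


def contigs_from_fasta(fasta_text):
    """Return contig names from FASTA header lines (first word after '>')."""
    names = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            words = line[1:].split()
            if words:
                names.append(words[0])
    return names


def missing_contigs(bam_contigs, ref_contigs):
    """Return BAM contigs that have no same-named contig in the reference."""
    ref = set(ref_contigs)
    return [name for name in bam_contigs if name not in ref]


# Illustrative input only: a three-contig header and a two-contig reference.
example_header = (
    "@HD\tVN:1.5\tSO:coordinate\n"
    "@SQ\tSN:1\tLN:1000\n"
    "@SQ\tSN:2\tLN:2000\n"
    "@SQ\tSN:NC_007605\tLN:3000\n"
)
example_fasta = ">1 first contig\nACGT\n>2\nACGT\n"

example_missing = missing_contigs(
    contigs_from_sam_header(example_header),
    contigs_from_fasta(example_fasta),
)
print(example_missing)
```

On real data the header text can be taken from `samtools view -H original.bam`. Any name the check reports is a contig the new reference would need to contain, or whose reads would need to be dropped, before reordering.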
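Hybrid error correction and de novo assembly of single-molecule sequencing (PB) reads. This is the original paper behind PBcR, which we use widely. Original link: Hybrid error correction and de novo assembly of single-molecule sequencing reads. Overview: PBcR includes a self-correction algorithm (PacBioToCA). At its core, the correction is multiple sequence alignment, and the MHAP algorithm (MinHash) is used to speed up the alignment step. Errors in third-generation reads are not completely random, so do not assume they are uniformly distributed! Abstract: PB technology can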
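The MinHash idea mentioned above can be illustrated with a small sketch: instead of aligning two reads, compare short fixed-size sketches of their k-mer sets and estimate the Jaccard similarity from the fraction of matching minimum hash values. This is a generic MinHash illustration under assumed parameters (`k=8`, 128 hash functions, seeded BLAKE2b hashes), not MHAP's actual implementation; all function names are hypothetical.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _seeded_hash(kmer, seed):
    """Hash a k-mer to a 64-bit integer; the seed selects the hash function."""
    data = f"{seed}:{kmer}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_sketch(seq, k=8, num_hashes=128):
    """For each hash function, keep the minimum hash value over all k-mers."""
    kmer_set = kmers(seq, k)
    if not kmer_set:
        raise ValueError("sequence is shorter than k")
    return [
        min(_seeded_hash(kmer, seed) for kmer in kmer_set)
        for seed in range(num_hashes)
    ]


def estimate_jaccard(sketch_a, sketch_b):
    """Fraction of sketch positions where both reads share the same minimum."""
    if len(sketch_a) != len(sketch_b):
        raise ValueError("sketches must have the same length")
    matches = sum(1 for a, b in zip(sketch_a, sketch_b) if a == b)
    return matches / len(sketch_a)


def exact_jaccard(seq_a, seq_b, k=8):
    """Exact Jaccard similarity of the two k-mer sets, for comparison."""
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    return len(a & b) / len(a | b)


# Made-up reads: read_b shares its first 40 bases with the tail of read_a.
read_a = "ACGTTGCAAGGCTTAACCGGTTAGCATCGATCGGATCCAATTGGCCAAGT"
read_b = read_a[10:] + "TTTTGGGGCCCCAAAATTGG"

estimate = estimate_jaccard(minhash_sketch(read_a), minhash_sketch(read_b))
exact = exact_jaccard(read_a, read_b)
print(f"estimated Jaccard: {estimate:.2f}, exact Jaccard: {exact:.2f}")
```

The sketches have a fixed size regardless of read length, so candidate overlaps can be screened cheaply and only promising pairs passed on to a full alignment.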
Link: Canu FAQ Q: What resources does Canu require for a bacterial genome assembly? A mammalian assembly? A: Canu is designed to scale resources to the system it runs on (that is, it detects the system's hardware resources automatically). It will report if a system does not meet the minim
Canu Quick Start PBcR (the older version of Canu) CA Canu specializes in assembling PacBio or Oxford Nanopore sequences. Canu will correct the reads, then trim suspicious regions (such as remaining SMRTbell adapter), then assemble the cor
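SOAPdenovo is a novel method for assembling short reads that can build a de novo draft assembly for genomes of human size. The program is specifically designed to assemble Illumina GA short reads. The new version reduces memory consumption during graph construction, resolves more repeat regions in contig assembly, increases coverage and length in scaffold construction, improves gap closing, and is better suited to large genome assembly. (SOAPdenovo is designed for assembling large plant and animal genomes, and it also works for bacteria and fungi; assembling a large genome such as human may require 150G of memory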