tophat cufflinks cuffcompare cuffmerge 的使用
Cole Trapnell said:
there are three strategies:
1) merge bams and assemble in a single run of Cufflinks
2) assemble each bam and cuffcompare them to get a combined.gtf
3) assemble each bam and cuffmerge them to get a merged.gtf
All three options work a little differently depending on whether you're also trying to integrate reference transcripts from UCSC or another annotation source.
#1 is quite different from #2 and #3, so I'll discuss its pros and cons first. The advantage here is simplicity of workflow. It's one Cufflinks run, so no need to worry about the details of the other programs. As turnersd mentions, you might also think this maximizes the accuracy of the resulting assembly, and that might be the case, but it also might not (for technical reasons that I don't want to get into right now). The disadvantage of this approach is that your computer might not be powerful enough to run it. More data and more isoforms means substantially more memory and running time. I haven't actually tried this on something like the human body map, but I would be very impressed and surprised if Cufflinks can deal with all of that on a machine owned by mere mortals.
#2 and #3 are very similar - both are designed to gracefully merge full-length and partial transcript assemblies without ever merging transfrags that disagree on splicing structure. Consider two transfrags, A and B, each with a couple exons. If A and B overlap, and they don't disagree on splicing structure, we can (and according to Cufflinks' assembly philosophy, we should) merge them. The difference between Cuffcompare and Cuffmerge is that Cuffcompare will only merge them if A is "contained" in B, or vice versa. That is, only if one of the transfrags is essentially redundant. Otherwise, they both get included. Cuffmerge on the other hand, will merge them if they overlap, and agree on splicing, and are in the same orientiation. As turnersd noted, this is done by converting the transfrags into SAM alignments and running Cufflinks on them.
The other thing that distinguishes these two options is how they deal with a reference annotation. You can read on our website how the Cufflinks Reference Annotation Based Transcript assembler (RABT) works. Cuffcompare doesn't do any RABT assembly, it just includes the reference annotation in the combined.gtf and discards partial transfrags that are contained and compatible with the reference. Cuffmerge actually runs RABT when you provide a reference, and this happens during the step where transfrags are converted into SAM alignments and assembled. We do this to improve quantification accuracy and reduce errors downstream. I should also say that Cuffmerge runs cuffcompare in order annotate the merged assembly with certain helpful features for use later on.
So we recommend #3 for a number of reasons, because it is the closest in spirit to #1 while still being reasonably fast. For reasons that I don't want to get into here (pretty arcane details about the Cufflinks assembler) I also feel that option #3 is actually the most accurate in most experimental settings.
https://www.biostars.org/p/15693/
https://www.biostars.org/p/160808/
http://seqanswers.com/forums/showthread.php?t=16422
https://www.biostars.org/p/139186/
https://www.biostars.org/p/10219/
https://www.biostars.org/p/138521/
http://www.broadinstitute.org/cancer/software/genepattern/rna-seq-analysis
tophat cufflinks cuffcompare cuffmerge 的使用的更多相关文章
- 使用Tophat+cufflinks分析差异表达
使用Tophat+cufflinks分析差异表达 2017-06-15 19:09:43 522 0 0 使用TopHat+Cufflinks的流程图 序列的比对是RNA分析 ...
- SAGE|DNA微阵列|RNA-seq|lncRNA|scripture|tophat|cufflinks|NONCODE|MA|LOWESS|qualitile归一化|permutation test|SAM|FDR|The Bonferroni|Tukey's|BH|FWER|Holm's step-down|q-value|
生物信息学-基因表达分析 为了丰富中心法则,研究人员使用不断更新的技术研究lncRNA的方方面面,其中技术主要是生物学上的微阵列芯片技术和表达数据分析方法,方方面面是指lncRNA的位置特征. Bac ...
- 链终止法|边合成边测序|Bowtie|TopHat|Cufflinks|RPKM|FASTX-Toolkit|fastaQC|基因芯片|桥式扩增|
生物信息学 Sanger采用链终止法进行测序 带有荧光基团的ddXTP+其他四种普通的脱氧核苷酸放入同一个培养皿中,例如带有荧光基团的ddATP+普通的脱氧核苷酸A.T.C.G放入同一个培养皿,以此类 ...
- 转录组的组装Stingtie和Cufflinks
转录组的组装Stingtie和Cufflinks Posted: 十月 18, 2017 Under: Transcriptomics By Kai no Comments 首先这两款软件都是用 ...
- HISAT,sTRINGTIE,ballgown三款RNA-seq信息分析软件
HISAT,sTRINGTIE,ballgown三款RNA-seq信息分析软件 2015年04月02日 11:35:47 夜丘 阅读数:8940 标签: 生物 更多 个人分类: 论文笔记 Bowt ...
- GFF format
后记: ************************************************************************ 在使用cufflinks和cuffmerge中 ...
- RNA-seq数据综合分析教程 AKAP95
https://blog.csdn.net/l_yivs?t=1 RNA-seq数据综合分析教程 2 4,055 A+ 所属分类:Transcriptomics 收 藏 2 RNA-se ...
- Directional RNA-seq data -which parameters to choose?
Directional RNA-seq data -which parameters to choose? REF: https://chipster.csc.fi/manual/library-ty ...
- Next generation sequencing (NGS)二代测序数据预处理与分析
二代测序原理: 1.DNA待测文库构建. 超声波把DNA打断成小片段,一般200--500bp,两端加上不同的接头2.Flowcell.一个flowcell,8个channel,很多接头3.桥式PCR ...
随机推荐
- 【fedora】设置fedora系统
1.安装自动选择最快源的插件fastestmirror: #sudo yum -y install axel yum-plugin-fastestmirror && sudo yum ...
- CSS Reset / Normalize 如何进行样式重置
CSS Reset 过于激进,所有样式全部消除没有必要. 关键是保持各种浏览器的兼容,包括Bootstrap的CSS Reset也是走的这个路线. 线面这个就是后面一种思路的成果: http://ne ...
- android 项目学习随笔二十(屏幕适配)
1.图片适配 放入相同名称的资源文件,机器根据不同分辨率找相近的资源 240*320 ldpi 320*480 mdpi 480*800 hdpi 720*1280 xhdpi 2.布局适配 在不同的 ...
- Spring 复习笔记01
Spring 框架 1. core:整个Spring框架构建在Core核心模块上,它是整个框架的的基础. 2. AOP:AOP模块提供了一个轻便但功能强大强大的AOP框架,让我们可以以AOP的形式增强 ...
- Sed文本替换一例
使用 Sed 完成文本替换操作任务是非常合适的. 现在, 假设我要将一个原有 Java 项目中的一些包及下面的类移到另一个项目中复用. Project javastudy: Packages: alg ...
- JSP01
<%@page pageEncoding="UTF-8" //page:设置此文件的编码 contentType="text/html;charset=utf ...
- Mac OX 隐藏文件夹,文件,应用,磁盘的2种方法 hide finder folder, file, application, volume in 2 ways
经常需要主目录下隐藏一些文件夹之类的, 第一想到的当然就是:在要隐藏的文件夹前面加『.』(leading dot),这个用法当然可以的了 用习惯了Linux/GNU系统的,基本习惯使用这种办法 但是, ...
- linux 常见操作命令
1.网络查询和配置 查询网卡和配置信息:ifconfig 查询指定网卡信息:ifconfig eth1 配置网卡ip信息:vi /etc/sysconfig/network-scripts/ifcfg ...
- Server.MapPath()获取本机绝对路径
1. Server.MapPath("/") 应用程序根目录所在的位置 如 C:\Inetpub\wwwroot\ 2.Server.MapPath("./&qu ...
- MySQL数据很大的时候
众所周知,mysql在数据量很大的时候查询的效率是很低的,因为假如你需要 OFFSET 100000 LIMIT 5 这样的数据,数据库就需要跳过前100000条数据,才能返回给你你需要的5条数据.由 ...