TopHat
What is TopHat?
TopHat is a program that aligns RNA-Seq reads to a genome in order to identify exon-exon splice junctions. It is built on the ultrafast short read mapping program Bowtie. TopHat runs on Linux and OS X.
How does TopHat find junctions?
TopHat can find splice junctions without a reference annotation. By first mapping RNA-Seq reads to the genome, TopHat identifies potential exons, since many RNA-Seq reads will contiguously align to the genome. Using this initial mapping information, TopHat builds a database of possible splice junctions and then maps the reads against these junctions to confirm them.
Short read sequencing machines can currently produce reads 100bp or longer but many exons are shorter than this so they would be missed in the initial mapping. TopHat solves this problem mainly by splitting all input reads into smaller segments which are then mapped independently. The segment alignments are put back together in a final step of the program to produce the end-to-end read alignments.
TopHat generates its database of possible splice junctions from two sources of evidence. The first and strongest source of evidence for a splice junction is when two segments from the same read (for reads of at least 45bp) are mapped at a certain distance on the same genomic sequence or when an internal segment fails to map - again suggesting that such reads are spanning multiple exons. With this approach, "GT-AG", "GC-AG" and "AT-AC" introns will be found ab initio. The second source is pairings of "coverage islands", which are distinct regions of piled up reads in the initial mapping. Neighboring islands are often spliced together in the transcriptome, so TopHat looks for ways to join these with an intron. We only suggest users use this second option (--coverage-search) for short reads (< 45bp) and with a small number of reads (<= 10 million). This latter option will only report alignments across "GT-AG" introns
TopHat的更多相关文章
- tophat输出结果junction.bed
tophat输出结果junction.bed BED format BED format provides a flexible way to define the data lines ...
- tophat安装
1 依赖软件:bowtie,bowtie2,samtools,boost c++ library 2 建立索引文件: bowtie包括bowtie,bowtie-build, ...
- RNA-seq差异表达基因分析之TopHat篇
RNA-seq差异表达基因分析之TopHat篇 发表于2012 年 10 月 23 日 TopHat是基于Bowtie的将RNA-Seq数据mapping到参考基因组上,从而鉴定可变剪切(exon-e ...
- 机器学习进阶-图像形态学变化-礼帽与黑帽 1.cv2.TOPHAT(礼帽-原始图片-开运算后图片) 2.cv2.BLACKHAT(黑帽 闭运算-原始图片)
1.op = cv2.TOPHAT 礼帽:原始图片-开运算后的图片 2. op=cv2.BLACKHAT 黑帽: 闭运算后的图片-原始图片 礼帽:表示的是原始图像-开运算(先腐蚀再膨胀)以后的图像 ...
- 使用Tophat+cufflinks分析差异表达
使用Tophat+cufflinks分析差异表达 2017-06-15 19:09:43 522 0 0 使用TopHat+Cufflinks的流程图 序列的比对是RNA分析 ...
- RNA-seq 安装 fastaqc,tophat,cuffilnks,hisat2
------------------------------------------ 安装fastqc------------------------------------------------- ...
- tophat的用法
概述:tophat是以bowtie2为核心的一款比对软件. tophat工作分两步: 1.将reads用bowtie比对到参考基因组上. 2.将unmapped-reads打断成更小的fragment ...
- SAGE|DNA微阵列|RNA-seq|lncRNA|scripture|tophat|cufflinks|NONCODE|MA|LOWESS|qualitile归一化|permutation test|SAM|FDR|The Bonferroni|Tukey's|BH|FWER|Holm's step-down|q-value|
生物信息学-基因表达分析 为了丰富中心法则,研究人员使用不断更新的技术研究lncRNA的方方面面,其中技术主要是生物学上的微阵列芯片技术和表达数据分析方法,方方面面是指lncRNA的位置特征. Bac ...
- 链终止法|边合成边测序|Bowtie|TopHat|Cufflinks|RPKM|FASTX-Toolkit|fastaQC|基因芯片|桥式扩增|
生物信息学 Sanger采用链终止法进行测序 带有荧光基团的ddXTP+其他四种普通的脱氧核苷酸放入同一个培养皿中,例如带有荧光基团的ddATP+普通的脱氧核苷酸A.T.C.G放入同一个培养皿,以此类 ...
随机推荐
- Appium学习实践(五)遇到的坑(记录自己工作中遇到的坑以及解决方案,不定时更新)
1.错误截图,有时候测试用例执行错误的话,相对于复杂的log,一张错误截图也许能更明确哪里出的问题(当然有时,截图+log还是最好了) 坑:本来是想测试用例fail的时候捕获异常来执行截图操作,但是由 ...
- Python+selenium自动化脚本编辑过程中遇到的问题和小技巧
应该也不算是问题和技巧,算是实践中学习到的Python,记录下,也不定时更新 1.通过截取url判断 实例: self.assertEqual(self.broswer.current_url[sel ...
- NYOJ 762
容斥原理 http://blog.csdn.net/shiren_bod/article/details/5787722
- sql server 导出表结构到 word
------导出表结构语句1.执行以下查询 SELECT 表名 = case when a.colorder=1 then d.name else '' end, 表说明 ...
- 分享 Ionic 开发 Hybrid App 中遇到的问题以及后期发布 iOS/Android 的方方面面
此篇文章主要整理了最近在使用 Ionic 开发 Hybrid App 过程中遇到的一些疑难点以及后期发布生成 iOS 和 Android 版本过程中的种种问题. 文章目录 Ionic 简介和项目需求介 ...
- linux-ntpdate同步更新时间
Linux服务器运行久时,系统时间就会存在一定的误差,一般情况下可以使用date命令进行时间设置,但在做数据库集群分片等操作时对多台机器的时间差是有要求的,此时就需要使用ntpdate进行时间同步 安 ...
- 解决:编译CM14.1 提示Jack “Out of memory error”错误
Android 7.1编译到33%时出现JDK内存溢出的错误: Out of memory error (version f95d7bdecfceb327f9d201a1348397ed8a84384 ...
- extracting fasta records from a multi-fasta file based on a list using awk
for i in $(cat gene_list) do awk -v RS=">" '($1==a){print ">"$0}' a=$i inp ...
- sublime text 3 3114 注册码
-– BEGIN LICENSE -– Michael Barnes Single User License EA7E-821385 8A353C41 872A0D5C DF9B2950 AFF6F6 ...
- wpf 获取datagrid中模板中控件
//获取name为datagrid中第三列第一行模板的控件 FrameworkElement item = dataGrid.Columns[].GetCellContent(dataGrid.Ite ...