What is TopHat?

TopHat is a program that aligns RNA-Seq reads to a genome in order to identify exon-exon splice junctions. It is built on the ultrafast short read mapping program Bowtie. TopHat runs on Linux and OS X.

How does TopHat find junctions?

TopHat can find splice junctions without a reference annotation. By first mapping RNA-Seq reads to the genome, TopHat identifies potential exons, since many RNA-Seq reads will contiguously align to the genome. Using this initial mapping information, TopHat builds a database of possible splice junctions and then maps the reads against these junctions to confirm them.

Short read sequencing machines can currently produce reads 100bp or longer but many exons are shorter than this so they would be missed in the initial mapping. TopHat solves this problem mainly by splitting all input reads into smaller segments which are then mapped independently. The segment alignments are put back together in a final step of the program to produce the end-to-end read alignments.

TopHat generates its database of possible splice junctions from two sources of evidence. The first and strongest source of evidence for a splice junction is when two segments from the same read (for reads of at least 45bp) are mapped at a certain distance on the same genomic sequence or when an internal segment fails to map - again suggesting that such reads are spanning multiple exons. With this approach, "GT-AG", "GC-AG" and "AT-AC" introns will be found ab initio. The second source is pairings of "coverage islands", which are distinct regions of piled up reads in the initial mapping. Neighboring islands are often spliced together in the transcriptome, so TopHat looks for ways to join these with an intron. We only suggest users use this second option (--coverage-search)  for short reads (< 45bp) and with a small number of reads (<= 10 million).  This latter option will only report alignments across "GT-AG" introns

TopHat的更多相关文章

  1. tophat输出结果junction.bed

    tophat输出结果junction.bed BED format       BED format provides a flexible way to define the data lines ...

  2. tophat安装

    1     依赖软件:bowtie,bowtie2,samtools,boost c++ library 2     建立索引文件:      bowtie包括bowtie,bowtie-build, ...

  3. RNA-seq差异表达基因分析之TopHat篇

    RNA-seq差异表达基因分析之TopHat篇 发表于2012 年 10 月 23 日 TopHat是基于Bowtie的将RNA-Seq数据mapping到参考基因组上,从而鉴定可变剪切(exon-e ...

  4. 机器学习进阶-图像形态学变化-礼帽与黑帽 1.cv2.TOPHAT(礼帽-原始图片-开运算后图片) 2.cv2.BLACKHAT(黑帽 闭运算-原始图片)

    1.op = cv2.TOPHAT  礼帽:原始图片-开运算后的图片 2. op=cv2.BLACKHAT 黑帽: 闭运算后的图片-原始图片 礼帽:表示的是原始图像-开运算(先腐蚀再膨胀)以后的图像 ...

  5. 使用Tophat+cufflinks分析差异表达

    使用Tophat+cufflinks分析差异表达  2017-06-15 19:09:43     522     0     0 使用TopHat+Cufflinks的流程图 序列的比对是RNA分析 ...

  6. RNA-seq 安装 fastaqc,tophat,cuffilnks,hisat2

    ------------------------------------------ 安装fastqc------------------------------------------------- ...

  7. tophat的用法

    概述:tophat是以bowtie2为核心的一款比对软件. tophat工作分两步: 1.将reads用bowtie比对到参考基因组上. 2.将unmapped-reads打断成更小的fragment ...

  8. SAGE|DNA微阵列|RNA-seq|lncRNA|scripture|tophat|cufflinks|NONCODE|MA|LOWESS|qualitile归一化|permutation test|SAM|FDR|The Bonferroni|Tukey's|BH|FWER|Holm's step-down|q-value|

    生物信息学-基因表达分析 为了丰富中心法则,研究人员使用不断更新的技术研究lncRNA的方方面面,其中技术主要是生物学上的微阵列芯片技术和表达数据分析方法,方方面面是指lncRNA的位置特征. Bac ...

  9. 链终止法|边合成边测序|Bowtie|TopHat|Cufflinks|RPKM|FASTX-Toolkit|fastaQC|基因芯片|桥式扩增|

    生物信息学 Sanger采用链终止法进行测序 带有荧光基团的ddXTP+其他四种普通的脱氧核苷酸放入同一个培养皿中,例如带有荧光基团的ddATP+普通的脱氧核苷酸A.T.C.G放入同一个培养皿,以此类 ...

随机推荐

  1. 国内经典BI系统架构分析

    谈起商业智能BI,也许大家并不陌生,但你是否了解国内的各类BI系统架构? 自国内商业智能发展以来,就系统结构方面已经历了多次优化性的变革.目前国内商业智能BI系统的经典架构的模式包括数据层.业务层和应 ...

  2. Java读带有BOM的UTF-8文件乱码原因及解决方法

    原因: 关于utf-8编码的txt文件,windows以记事本方式保存时会在第一行最开始处自动加入bom格式的相关信息,大概三个字节! 所以java在读取此类文件时第一行时会多出三个不相关的字节,这样 ...

  3. Learn Python The Hard Way ex41中的程序

    import random from urllib import urlopen import sys WORD_URL = "http://learncodethehardway.org/ ...

  4. AJAX提交方法(GET)Demon

    AJAX作为一种异步的Javascript程序执行方法,极大的弥补了HTTP协议的不足(HTTP协议为无状态协议),可以无需加载整个页面,只需加载所需数据即可,浏览器内置的XMLHttp对象有open ...

  5. CentOS 7.1, 7.2 下安装dotnet core

    .NET CORE的官方(http://dotnet.github.io/getting-started/)只提供了Windows, Ubuntu14.04, 及Docker(也是基于Ubuntu14 ...

  6. EF6 在 SQLite中使用备忘

    == 菜鸟级选手试验在EF6中使用Sqlite,零EF基础,少量Sqlite基础.经过断断续续的很长时间 - _ -! >>连接 1. 安装 使用目前最新版本EF6.1,Sqlite1.0 ...

  7. SQL Server 常用关键字

    SQL 建库 建表 --1.创建一个数据库 create database School; --删除数据库 drop database School; --创建数据库的时候指定一些选项. create ...

  8. Bootstrap使用后笔记

      Bootstrap Modal 垂直居中 在 bootstrap.js中修改如下代码: Modal.prototype.adjustDialog = function () { var modal ...

  9. 一个key 在10w k/v 找到对应的. (B-tree), 这10w放进B-tree 会有多少层.

    B-tree是二叉平衡查找树,相邻两层节点层数不超过1 所有10w 即 2^16=65536 < 10w < 2^17=131072: 会有17层,最多查询17次.

  10. Synchronized

    1. 在编写一个类时,如果该类中的代码可能运行与多线程环境下,就要考虑同步问题了. 会同时被多个线程访问的资源,就是竞争资源,也称为竞争条件.对于多线程共享的资源我们必须进行同步,以避免一个线程的改动 ...