1)Introduction

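DEXSeq is a method for testing for differential exon usage in comparative RNA-seq experiments. By differential exon usage (DEU), we mean changes in the relative usage of exons caused by the experimental condition. The relative usage of an exon is defined as: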

number of transcripts from the gene that contain this exon / number of all transcripts from the gene

The basic idea: for each exon (or part of an exon) and each sample, we count how many reads map to this exon and how many reads map to any of the other exons of the same gene. We then consider the ratio of these two counts, and how it changes across conditions, to infer changes in relative exon usage.
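
The counting idea above can be sketched in base R on a toy matrix. The gene, exon names, sample names, and counts below are all made up for illustration, and this only shows the "this exon" versus "other exons" bookkeeping; DEXSeq itself fits a generalized linear model rather than comparing raw ratios.

```r
# Toy read counts for one hypothetical gene: rows are exons, columns are samples.
counts <- matrix(
  c(10, 12, 30, 28,
    40, 44, 35, 33,
    50, 44, 35, 39),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("E001", "E002", "E003"),
                  c("ctrl1", "ctrl2", "kd1", "kd2"))
)

# Reads mapping to each exon ("this") ...
this <- counts

# ... and reads mapping to any other exon of the same gene ("others"):
# per-sample gene total minus the exon's own count.
gene_totals <- colSums(counts)
others <- t(gene_totals - t(counts))

# Ratio of the two counts, per exon and per sample.
ratio <- this / others
print(round(ratio, 2))
```

In this toy data the ratio for E001 is clearly higher in the two `kd` samples than in the two `ctrl` samples, which is the kind of shift across conditions that the method looks for.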

2)Installation

if (!requireNamespace("DEXSeq", quietly=TRUE)) { if (!requireNamespace("BiocManager", quietly=TRUE)) install.packages("BiocManager"); BiocManager::install("DEXSeq") } # install from Bioconductor if missing
suppressMessages(library(DEXSeq))
ls('package:DEXSeq')
pythonScriptsDir = system.file( "python_scripts", package="DEXSeq" )
list.files(pythonScriptsDir)
## [1] "dexseq_count.py" "dexseq_prepare_annotation.py" # check that both scripts are present
# The next two commands are run in a shell, not in R:
python dexseq_prepare_annotation.py Drosophila_melanogaster.BDGP5.72.gtf Dmel.BDGP5.25.62.DEXSeq.chr.gff # convert the GTF to a GFF with collapsed exon counting bins
python dexseq_count.py Dmel.BDGP5.25.62.DEXSeq.chr.gff untreated1.sam untreated1fb.txt # count reads per exon counting bin

3) Using the bundled example dataset (data preprocessing)

suppressMessages(library(pasilla))
inDir = system.file("extdata", package="pasilla")
countFiles = list.files(inDir, pattern="fb.txt$", full.names=TRUE) # count files (for your own data, generate them with dexseq_count.py)
basename(countFiles)
flattenedFile = list.files(inDir, pattern="gff$", full.names=TRUE)
basename(flattenedFile) # flattened GFF file (for your own data, generate it with dexseq_prepare_annotation.py)
######## Build the data frame sampleTable with sample names, condition, library type, etc. ########
sampleTable = data.frame(
row.names = c( "treated1", "treated2", "treated3",
"untreated1", "untreated2", "untreated3", "untreated4" ),
condition = c("knockdown", "knockdown", "knockdown",
"control", "control", "control", "control" ),
libType = c( "single-end", "paired-end", "paired-end",
"single-end", "single-end", "paired-end", "paired-end" ) )
sampleTable
############## Build the DEXSeqDataSet object #############################
dxd = DEXSeqDataSetFromHTSeq(
countFiles,
sampleData=sampleTable,
design= ~ sample + exon + condition:exon,
flattenedfile=flattenedFile ) # four arguments: count files, sample table, design formula, flattened GFF

4)Standard analysis work-flow

######## Simple experimental design #####
genesForSubset = read.table(file.path(inDir, "geneIDsinsubset.txt"),stringsAsFactors=FALSE)[[1]] # IDs of the gene subset
dxd = dxd[geneIDs( dxd ) %in% genesForSubset,] # keep only the subset to reduce run time
head(colData(dxd))
head( counts(dxd), 5 )
split( seq_len(ncol(dxd)), colData(dxd)$exon )
sampleAnnotation( dxd )
############# dispersion estimates and the size factors#############
dxd = estimateSizeFactors( dxd ) ##Normalisation
dxd = estimateDispersions( dxd )
plotDispEsts( dxd ) # Figure 1
################# Testing for differential exon usage ############
dxd = testForDEU( dxd )
dxd = estimateExonFoldChanges( dxd, fitExpToVar="condition")
dxr1 = DEXSeqResults( dxd )
dxr1
mcols(dxr1)$description
table ( dxr1$padj < 0.1 )
table ( tapply( dxr1$padj < 0.1, dxr1$groupID, any ) )
plotMA( dxr1, cex=0.8 ) # Figure 2

To see how the power to detect differential exon usage depends on the number of reads that map to an exon, a so-called MA plot is useful. It plots the log fold change against the average normalized count per exon and marks in red the exons considered significant; here, those with an adjusted p-value below 0.1.
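
To make the axes of such a plot concrete, here is a minimal base-R sketch on a hypothetical results table. The column names (`meanBase`, `log2fold`, `padj`) and all values are invented for illustration and are not taken from the pasilla results; `plotMA` on a real results object does this for you.

```r
# Hypothetical per-exon results; names and numbers are made up for illustration.
res <- data.frame(
  exon     = c("E001", "E002", "E003", "E004"),
  meanBase = c(15, 220, 1800, 40),      # average normalized count per exon
  log2fold = c(1.9, 0.1, -0.8, -0.2),   # log2 fold change between conditions
  padj     = c(0.30, 0.85, 0.004, NA)   # adjusted p-value (NA = not tested)
)

# Flag exons below the 0.1 threshold; an NA adjusted p-value counts as not significant.
res$significant <- !is.na(res$padj) & res$padj < 0.1

# MA plot: x = average normalized count (log scale), y = log2 fold change,
# with significant exons drawn in red.
plot(res$meanBase, res$log2fold, log = "x", pch = 20,
     col = ifelse(res$significant, "red", "grey40"),
     xlab = "mean of normalized counts", ylab = "log2 fold change")
abline(h = 0, lty = 2)
```

In this invented table, the low-count exon E001 has the largest fold change but is not flagged, while the high-count exon E003 is flagged with a smaller change, which is the dependence on read depth that the plot is meant to reveal.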

############ More complex experimental design ##################
formulaFullModel = ~ sample + exon + libType:exon + condition:exon
formulaReducedModel = ~ sample + exon + libType:exon
dxd = estimateDispersions( dxd, formula = formulaFullModel )
dxd = testForDEU( dxd,
reducedModel = formulaReducedModel,
fullModel = formulaFullModel )
dxr2 = DEXSeqResults( dxd )
table( dxr2$padj < 0.1 )
table( before = dxr1$padj < 0.1, now = dxr2$padj < 0.1 ) ## compare with the simple design

5)Visualization

plotDEXSeq( dxr2, "FBgn0010909", legend=TRUE, cex.axis=1.2, cex=1.3,
lwd=2 )
plotDEXSeq( dxr2, "FBgn0010909", displayTranscripts=TRUE, legend=TRUE,
cex.axis=1.2, cex=1.3, lwd=2 )
plotDEXSeq( dxr2, "FBgn0010909", expression=FALSE, norCounts=TRUE,
legend=TRUE, cex.axis=1.2, cex=1.3, lwd=2 )
plotDEXSeq( dxr2, "FBgn0010909", expression=FALSE, splicing=TRUE,
legend=TRUE, cex.axis=1.2, cex=1.3, lwd=2 )
DEXSeqHTML( dxr2, FDR=0.1, color=c("#FF000080", "#0000FF80") )
