DEXSeq
1)Introduction
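DEXSeq is a method for testing for differential exon usage in comparative RNA-seq experiments. By differential exon usage (DEU), we mean changes in the relative usage of exons caused by the experimental condition. The relative usage of an exon is defined as: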
number of transcripts from the gene that contain this exon / number of all transcripts from the gene
The basic idea: for each exon (or part of an exon) and each sample, we count how many reads map to this exon and how many reads map to any of the other exons of the same gene. We then consider the ratio of these two counts, and how it changes across conditions, to infer changes in relative exon usage.
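The ratio described above can be sketched in base R. This is only a toy illustration: the counts, exon IDs and sample names are invented, and DEXSeq itself fits a generalized linear model rather than comparing raw ratios like this.

```r
# Toy read counts for one gene: three exons (rows) in four samples (columns).
# All numbers are invented for illustration.
counts <- matrix(
  c(100, 110,  95, 105,   # exon E001
     50,  55,  20,  22,   # exon E002
    200, 210, 190, 205),  # exon E003
  nrow = 3, byrow = TRUE,
  dimnames = list(c("E001", "E002", "E003"),
                  c("ctrl1", "ctrl2", "kd1", "kd2"))
)
condition <- c("control", "control", "knockdown", "knockdown")

# Reads mapping to all other exons of the same gene, per exon and sample.
total  <- colSums(counts)
others <- rep(total, each = nrow(counts)) - counts

# Ratio "this exon" / "other exons", then its mean within each condition.
ratio <- counts / others
mean_ratio <- sapply(split(seq_along(condition), condition),
                     function(idx) rowMeans(ratio[, idx, drop = FALSE]))

# log2 change of the ratio between conditions; a strongly negative value
# means the exon is used relatively less in the knockdown samples.
log2_change <- log2(mean_ratio[, "knockdown"] / mean_ratio[, "control"])
print(round(log2_change, 2))
```

In this toy gene only E002 loses reads in the knockdown, so it shows the most negative change. The other exons shift slightly upward because their "other exons" denominator shrinks.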
2)Installation
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") #biocLite() has been replaced by BiocManager
if (!requireNamespace("DEXSeq", quietly = TRUE)) BiocManager::install("DEXSeq")
suppressMessages(library(DEXSeq))
ls('package:DEXSeq')
pythonScriptsDir = system.file( "python_scripts", package="DEXSeq" )
list.files(pythonScriptsDir)
## [1] "dexseq_count.py" "dexseq_prepare_annotation.py" #check that both scripts are present
# The next two commands are run in a shell, not in R.
python dexseq_prepare_annotation.py Drosophila_melanogaster.BDGP5.72.gtf Dmel.BDGP5.25.62.DEXSeq.chr.gff #convert the GTF into a GFF with collapsed exon counting bins
python dexseq_count.py Dmel.BDGP5.25.62.DEXSeq.chr.gff untreated1.sam untreated1fb.txt #count reads per exon counting bin
3) Using the bundled example data set (data preparation)
suppressMessages(library(pasilla))
inDir = system.file("extdata", package="pasilla")
countFiles = list.files(inDir, pattern="fb.txt$", full.names=TRUE) #count files (for your own data, generate them with dexseq_count.py)
basename(countFiles)
flattenedFile = list.files(inDir, pattern="gff$", full.names=TRUE)
basename(flattenedFile) #GFF file (for your own data, generate it with dexseq_prepare_annotation.py)
######## Build the sampleTable data frame: sample names, condition, library type ########
sampleTable = data.frame(
row.names = c( "treated1", "treated2", "treated3",
"untreated1", "untreated2", "untreated3", "untreated4" ),
condition = c("knockdown", "knockdown", "knockdown",
"control", "control", "control", "control" ),
libType = c( "single-end", "paired-end", "paired-end",
"single-end", "single-end", "paired-end", "paired-end" ) )
sampleTable
############## Build the DEXSeqDataSet object #############################
dxd = DEXSeqDataSetFromHTSeq(
countFiles,
sampleData=sampleTable,
design= ~ sample + exon + condition:exon,
flattenedfile=flattenedFile ) #four arguments
4)Standard analysis work-flow
######## Simple experimental design ########
genesForSubset = read.table(file.path(inDir, "geneIDsinsubset.txt"),stringsAsFactors=FALSE)[[1]] #IDs of the gene subset
dxd = dxd[geneIDs( dxd ) %in% genesForSubset,] #keep only the subset to reduce run time
head(colData(dxd))
head( counts(dxd), 5 )
split( seq_len(ncol(dxd)), colData(dxd)$exon )
sampleAnnotation( dxd )
############# dispersion estimates and the size factors#############
dxd = estimateSizeFactors( dxd ) ##Normalisation
dxd = estimateDispersions( dxd )
plotDispEsts( dxd ) #Figure 1
################# Testing for differential exon usage ############
dxd = testForDEU( dxd )
dxd = estimateExonFoldChanges( dxd, fitExpToVar="condition")
dxr1 = DEXSeqResults( dxd )
dxr1
mcols(dxr1)$description
table ( dxr1$padj < 0.1 )
table ( tapply( dxr1$padj < 0.1, dxr1$groupID, any ) )
plotMA( dxr1, cex=0.8 ) #Figure 2

An MA plot shows how the power to detect differential exon usage depends on the number of reads that map to an exon. It plots the logarithm of the fold change against the average normalized count per exon and marks in red the exons considered significant; here, those with an adjusted p-value below 0.1.
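To show what goes into such a plot, here is a minimal base R sketch on an invented results table. The column names (`mean`, `log2fc`, `pvalue`) and all values are assumptions for illustration and not the columns of a real `DEXSeqResults` object. The Benjamini-Hochberg adjustment via `p.adjust` stands in for the `padj` column.

```r
# Invented per-exon results; none of these numbers come from the pasilla data.
res <- data.frame(
  exon   = c("E001", "E002", "E003", "E004", "E005"),
  mean   = c(12, 850, 40, 3000, 150),      # average normalized count
  log2fc = c(1.8, -0.9, 0.3, 0.05, -1.4),  # log2 fold change
  pvalue = c(0.20, 0.001, 0.60, 0.90, 0.004),
  stringsAsFactors = FALSE
)

# Adjust for multiple testing and flag exons at the 0.1 threshold used above.
res$padj        <- p.adjust(res$pvalue, method = "BH")
res$significant <- res$padj < 0.1

# MA-style plot: mean count on a log x axis, log2 fold change on the y axis,
# significant exons in red.
plot(res$mean, res$log2fc, log = "x", pch = 20,
     col = ifelse(res$significant, "red", "black"),
     xlab = "mean normalized count", ylab = "log2 fold change")
abline(h = 0, lty = 2)
```

In this toy table E001 has the largest fold change but a low count and a large p-value, so it stays black. That is the dependence of power on read count that the plot is meant to reveal.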

############ More complex experimental design ##################
formulaFullModel = ~ sample + exon + libType:exon + condition:exon
formulaReducedModel = ~ sample + exon + libType:exon
dxd = estimateDispersions( dxd, formula = formulaFullModel )
dxd = testForDEU( dxd,
reducedModel = formulaReducedModel,
fullModel = formulaFullModel )
dxr2 = DEXSeqResults( dxd )
table( dxr2$padj < 0.1 )
table( before = dxr1$padj < 0.1, now = dxr2$padj < 0.1 ) #compare with the simple design
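The comparison of a full and a reduced model can be illustrated with a likelihood-ratio test on toy data. This sketch is a simplification under stated assumptions: it fits one Poisson GLM to a whole invented gene with base R's `glm`, whereas DEXSeq fits negative binomial GLMs per exon with estimated dispersions. Only the two formulas mirror the ones above.

```r
# Invented counts for one gene: 3 exons x 4 samples, in long format.
d <- data.frame(
  count     = c(100, 50, 200,  110, 55, 210,  95, 20, 190,  105, 22, 205),
  sample    = factor(rep(c("ctrl1", "ctrl2", "kd1", "kd2"), each = 3)),
  exon      = factor(rep(c("E001", "E002", "E003"), times = 4)),
  condition = factor(rep(c("control", "knockdown"), each = 6)),
  libType   = factor(rep(c("single", "paired", "single", "paired"), each = 3))
)

# Full model: exon usage may depend on library type and on condition.
full    <- glm(count ~ sample + exon + libType:exon + condition:exon,
               family = poisson, data = d)
# Reduced model: exon usage may depend on library type only.
reduced <- glm(count ~ sample + exon + libType:exon,
               family = poisson, data = d)

# Likelihood-ratio (chi-squared) test: does condition:exon improve the fit?
lrt <- anova(reduced, full, test = "Chisq")
print(lrt)
```

A small p-value here says that the condition changes exon usage beyond what library type explains, which is the question the more complex design asks for every exon.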
5)Visualization
plotDEXSeq( dxr2, "FBgn0010909", legend=TRUE, cex.axis=1.2, cex=1.3,
lwd=2 )
plotDEXSeq( dxr2, "FBgn0010909", displayTranscripts=TRUE, legend=TRUE,
cex.axis=1.2, cex=1.3, lwd=2 )
plotDEXSeq( dxr2, "FBgn0010909", expression=FALSE, norCounts=TRUE,
legend=TRUE, cex.axis=1.2, cex=1.3, lwd=2 )
plotDEXSeq( dxr2, "FBgn0010909", expression=FALSE, splicing=TRUE,
legend=TRUE, cex.axis=1.2, cex=1.3, lwd=2 )
DEXSeqHTML( dxr2, FDR=0.1, color=c("#FF000080", "#0000FF80") )



