Chip-seq peak annontation



Chip-seq peak annontation

PeRl

narrowPeak/boardPeak

narrowPeak/boardPeak 是ENCODE可提供下载的两种 Chip-seq 经过参考人类基因组mapping后的关于peak的数据.
其他类型的seq数据储存个数可以参看FAQformat

narrowPeak

数据按照以下规则储存:

1. string chrom: "Reference sequence chromosome or scaffold"
2. uint chromStart: "Start position in chromosome"
3. uint chromEnd: "End position in chromosome"
4. string name: "Name given to a region (preferably unique). Use . if no name is assigned"
5. uint score: "Indicates how dark the peak will be displayed in the browser (0-1000) "
6. char[1] strand: "+ or - or . for unknown"
7. float signalValue: "Measurement of average enrichment for the region"
8. float pValue: "Statistical significance of signal value (-log10). Set to -1 if not used."
9. float qValue: "Statistical significance with multiple-test correction applied (FDR -log10). Set to -1 if n-ot used."
10. int peak: "Point-source called for this peak; 0-based offset from chromStart. Set to -1 if no point-sour-ce called."

boardPeak

数据按照以下规则储存:

1. string chrom: "Reference sequence chromosome or scaffold"
2. uint chromStart: "Start position in chromosome"
3. uint chromEnd: "End position in chromosome"
4. string name: "Name given to a region (preferably unique). Use . if no name is assigned"
5. uint score: "Indicates how dark the peak will be displayed in the browser (0-1000) "
6. char[1] strand: "+ or - or . for unknown"
7. float signalValue: "Measurement of average enrichment for the region"
8. float pValue: "Statistical significance of signal value (-log10). Set to -1 if<BR> not used."
9. float qValue: "Statistical significance with multiple-test correction applied (FDR -log10). Set to -1 if n-ot used."

在接下去的peak annotation中,只演示narrowPeak.bed格式数据.

示例数据

演示数据来自ENCODEH3K9me3 的Chip-seq,样本ID为ENCFF199BLM.

样本的基本信息:

  • Homo sapiens liver male adult (32 years)

    • Target: H3K9me3
    • Lab: Bing Ren, UCSD
    • Project: Roadmap

narrowPeak 数据基本信息:

##   seqnames     start       end       name score strand signalValue  pValue
## 1 chr10 100134639 100134831 Peak_7974 178 . 4.41261 6.68330
## 2 chr10 100446376 100446664 Peak_4189 244 . 5.13248 8.41215
## 3 chr10 100779568 100779699 Peak_32369 101 . 3.34436 4.50936
## 4 chr10 10088147 10088346 Peak_28509 112 . 4.05830 4.84890
## 5 chr10 101149252 101149594 Peak_6146 211 . 4.89252 7.53305
## 6 chr10 101173156 101173424 Peak_32985 94 . 3.45278 4.33257
## qValue peak
## 1 2.96490 132
## 2 4.37217 165
## 3 1.47374 105
## 4 1.69566 90
## 5 3.67439 174
## 6 1.30993 237

构建注释文件

在peak annotation中需要一个用于参考的注释文件,需要包括一下信息: 1. 染色体名 2. 转录本起始位点 3. 转录本终止位点 4. gene的名字 5. 正反链

注释文件可以直接从外部导入,也可以利用biomaRt 生成

library(ChIPpeakAnno)
library(biomaRt)
mart <- useMart("ensembl")
datasets <- listDatasets(mart)
mart <- useDataset("hsapiens_gene_ensembl",mart)
# 需要筛选的特征
props <- c("ensembl_gene_id", "external_gene_name", "transcript_biotype", "chromosome_name", "start_position", "end_position", "strand") # 筛选染色体号
lincRNA <- subset(
getBM(attributes=props, mart=mart, filters = "chromosome_name", values = c(1:22,"X","Y")),
transcript_biotype == "lincRNA"
)
# 在染色体前加"chr"保持和narrowPeak数据一致
lincRNA[,4] <- paste0("chr", lincRNA[,4])

得到的数据包含以下信息:

##   ensembl_gene_id external_gene_name transcript_biotype chromosome_name
## 1 ENSG00000276255 RP5-881P19.7 lincRNA chr1
## 2 ENSG00000234277 LINC01641 lincRNA chr1
## 3 ENSG00000238107 RP11-495P10.5 lincRNA chr1
## 4 ENSG00000274020 LINC01138 lincRNA chr1
## 5 ENSG00000225620 RP11-569A11.2 lincRNA chr1
## 6 ENSG00000237520 RP11-443B7.2 lincRNA chr1
## start_position end_position strand
## 1 228073909 228076550 -1
## 2 227393591 227431035 1
## 3 148295180 148297556 1
## 4 148290889 148519604 -1
## 5 202632428 202632911 1
## 6 234957231 234959989 1

进行peak annotation

有了这个注释文件以后,我们就可以根据我们想要的筛选规则对peak进行注释,主要用到的包是 ChIPpeakAnno.用到的函数为annotatePeakInBatch.

# 将前面得到的注释文件转换为RangedData对象
library(ChIPpeakAnno)
myCustomAnno <- RangedData(
IRanges(
start=lincRNA[,"start_position"],
end=lincRNA[,"end_position"],
names=lincRNA[,"ensembl_gene_id"]),
space=lincRNA[,"chromosome_name"],
strand=lincRNA[,"strand"])
# 读入需要注释的narrowPeak数据
bed_file <- read.table("ENCFF199BLM.bed", header = T, sep = "\t", stringsAsFactors = F)
# 将peak数据转换为GRanges
peaks <- toGRanges(bed_file, format="narrowPeak", colNames = colnames(bed_file))
# 根据需要进行筛选
anno <- annotatePeakInBatch(peaks, AnnotationData=myCustomAnno,
output="overlapping",
FeatureLocForDistance="TSS",
bindingRegion=c(-2000, 2000))

anno中提取我们需要的lincRNA的ID

result_lincRNA <- anno@elementMetadata$feature
head(result_lincRNA)
## [1] "ENSG00000237579" "ENSG00000235180" "ENSG00000232259" "ENSG00000204365"
## [5] "ENSG00000226578" "ENSG00000260137"

Chip-seq peak annontation的更多相关文章

  1. 测序深度和覆盖度(Sequencing depth and coverage)

    总是跑数据,却对数据一无所知,这说不过去吧. 看几篇文章吧 Sequencing depth and coverage: key considerations in genomic analyses( ...

  2. getopt两个模块getopt 和gun_getopt 的异同

    getopt的两个模块getopt和gun_getopt都可以接收参数,但是又有不同; 先看 getopt.getopt这个模块: import sys import getopt def main( ...

  3. ChIP-seq技术介绍|易基因

    大家好,这里是专注表观组学十余年,多组学科研服务领跑者的易基因. 染色质免疫沉淀后测序(ChIP seq)是一种针对DNA结合蛋白.组蛋白修饰或核小体的全基因组分析技术.由于二代测序技术的巨大进步,C ...

  4. [LeetCode] Find Peak Element 求数组的局部峰值

    A peak element is an element that is greater than its neighbors. Given an input array where num[i] ≠ ...

  5. LeetCode 162 Find Peak Element

    Problem: A peak element is an element that is greater than its neighbors. Given an input array where ...

  6. Find Peak Element

    A peak element is an element that is greater than its neighbors. Given an input array where num[i] ≠ ...

  7. [LintCode] Find Peak Element 求数组的峰值

    There is an integer array which has the following features: The numbers in adjacent positions are di ...

  8. BZOJ1798: [Ahoi2009]Seq 维护序列seq[线段树]

    1798: [Ahoi2009]Seq 维护序列seq Time Limit: 30 Sec  Memory Limit: 64 MBSubmit: 5504  Solved: 1937[Submit ...

  9. a chip multiprocessor

    COMPUTER OR GANIZATION AND ARCHITECTURE DESIGNING FOR PERFORMANCE NINTH EDITION A multicore computer ...

随机推荐

  1. Python学习---重点模块之shelve

    简单示例 import shelve f = shelve.open(r'shelve.txt') f['info'] = {'name':'ftl', 'age':23, 'sex': 'male' ...

  2. Nginx配置虚拟机,url重写,防盗链

    配置目录: ·     虚拟主机 ·     PHP支持 ·     URL重写 ·     防止盗链 ·     持续更新… 一.虚拟主机 1.创建 文件格式:{域名}.conf 具体如下: $ s ...

  3. .NET控件命名规范

    一.基本数据类型前缀 数据类型    数据类型简写 Array    arr Boolean    bln Byte    byt Char    chr DateTime    dtm Decima ...

  4. Future Research Directions in Social Recommendation

    From the tutorial published by Martin Ester in RecSys 2013 Future Research Directions --Recommendati ...

  5. Disruptor

    高性能队列Disruptor系列2--浅析Disruptor   目录 1. Disruptor简单介绍2. 为什么Disruptor如此之快3. Disruptor结构分析 1. Disruptor ...

  6. Android webview 点击超链接打开新的webview

    webview.setWebViewClient(new webViewClient() { HitTestResult hit = view.getHitTestResult(); if (hit ...

  7. MySQL错误问题

    启动Tomcat的时候报错:no suitable driver,MySql更新使用com.mysql.cj.jdbc.Driver,废弃老的com.mysql.jdbc.Driver驱动,需要将D: ...

  8. 2019.1.6 2.8 Spring的AOP事务

    2.8 Spring的AOP事务 xml配置aop事务 先applicationContext.xml 文件 配置事务管管理器 配置通知 织入目标对象

  9. vue+elementUI封装的时间插件(有起始时间不能大于结束时间的验证)

    vue+elementUI封装的时间插件(有起始时间不能大于结束时间的验证): html: <el-form-item label="活动时间" required> & ...

  10. 二. Python WebDriver环境搭建

    1. 安装Selenium 在命令行中输入: 显示安装成功: 2. 测试例子 打开百度页面并在输入框输入搜索内容(默认为firework) # 1. Selenium默认为Firefox.验证 fro ...