ensembl/release91:

cat Homo_sapiens.GRCh38.91.gtf | grep -v "#" | cut -f9 | cut -f1,3,6,8 -d\; | grep gene_biotype | sed -e 's/\"//g' | sed -e 's/\;//g' | cut -f2,6,8 -d" " | sort | uniq > GRCh38.feature.info
ENSG00000000003 TSPAN6 protein_coding
ENSG00000000005 TNMD protein_coding
ENSG00000000419 DPM1 protein_coding
ENSG00000000457 SCYL3 protein_coding
ENSG00000000460 C1orf112 protein_coding
ENSG00000000938 FGR protein_coding
ENSG00000000971 CFH protein_coding
ENSG00000001036 FUCA2 protein_coding
ENSG00000001084 GCLC protein_coding
ENSG00000001167 NFYA protein_coding
ENSG00000001460 STPG1 protein_coding
ENSG00000001461 NIPAL3 protein_coding
ENSG00000001497 LAS1L protein_coding
ENSG00000001561 ENPP4 protein_coding
ENSG00000001617 SEMA3F protein_coding
ENSG00000001626 CFTR protein_coding
ENSG00000001629 ANKIB1 protein_coding
ENSG00000001630 CYP51A1 protein_coding
ENSG00000001631 KRIT1 protein_coding
ENSG00000002016 RAD52 protein_coding
ENSG00000002079 MYH16 transcribed_unitary_pseudogene
ENSG00000002330 BAD protein_coding
ENSG00000002549 LAP3 protein_coding
ENSG00000002586 CD99 protein_coding
ENSG00000002587 HS3ST1 protein_coding
ENSG00000002726 AOC1 protein_coding
ENSG00000002745 WNT16 protein_coding
ENSG00000002746 HECW1 protein_coding
ENSG00000002822 MAD1L1 protein_coding
ENSG00000002834 LASP1 protein_coding
ENSG00000002919 SNX11 protein_coding
ENSG00000002933 TMEM176A protein_coding
ENSG00000003056 M6PR protein_coding
ENSG00000003096 KLHL13 protein_coding
ENSG00000003137 CYP26B1 protein_coding

  

58302个ENSG id

56655个gene name(为什么有将近两千个是重复)

46种类型:

31 3prime_overlapping_ncRNA
5517 antisense_RNA
19 bidirectional_promoter_lncRNA
14 IG_C_gene
9 IG_C_pseudogene
37 IG_D_gene
18 IG_J_gene
3 IG_J_pseudogene
1 IG_pseudogene
144 IG_V_gene
188 IG_V_pseudogene
7493 lincRNA
1 macro_lncRNA
1879 miRNA
2221 misc_RNA
2 Mt_rRNA
22 Mt_tRNA
3 non_coding
63 polymorphic_pseudogene
10235 processed_pseudogene
543 processed_transcript
19847 protein_coding
22 pseudogene
8 ribozyme
549 rRNA
49 scaRNA
1 scRNA
904 sense_intronic
189 sense_overlapping
943 snoRNA
1909 snRNA
5 sRNA
1066 TEC
462 transcribed_processed_pseudogene
111 transcribed_unitary_pseudogene
828 transcribed_unprocessed_pseudogene
2 translated_processed_pseudogene
6 TR_C_gene
4 TR_D_gene
79 TR_J_gene
4 TR_J_pseudogene
108 TR_V_gene
30 TR_V_pseudogene
95 unitary_pseudogene
2637 unprocessed_pseudogene
1 vaultRNA

GENCODE的注释gtf文件:  

##description: evidence-based annotation of the human genome (GRCh37), version 19 (Ensembl 74)
##provider: GENCODE
##contact: gencode@sanger.ac.uk
##format: gtf
##date: 2013-12-05
chr1 HAVANA gene 11869 14412 . + . gene_id "ENSG00000223972.4"; transcript_id "ENSG00000223972.4"; gene_type "pseudogene"; gene_status "KNOWN";gene_name "DDX11L1"; transcript_type "pseudogene"; transcript_status "KNOWN"; transcript_name "DDX11L1"; level 2; havana_gene "OTTHUMG00000000961.2";
chr1 HAVANA transcript 11869 14409 . + . gene_id "ENSG00000223972.4"; transcript_id "ENST00000456328.2"; gene_type "pseudogene"; gene_status "KNOWN"; gene_name "DDX11L1"; transcript_type "processed_transcript"; transcript_status "KNOWN"; transcript_name "DDX11L1-002"; level 2; tag "basic"; havana_gene "OTTHUMG00000000961.2"; havana_transcript "OTTHUMT00000362751.1";
chr1 HAVANA exon 11869 12227 . + . gene_id "ENSG00000223972.4"; transcript_id "ENST00000456328.2"; gene_type "pseudogene"; gene_status "KNOWN";gene_name "DDX11L1"; transcript_type "processed_transcript"; transcript_status "KNOWN"; transcript_name "DDX11L1-002"; exon_number 1; exon_id "ENSE00002234944.1"; level 2; tag "basic"; havana_gene "OTTHUMG00000000961.2"; havana_transcript "OTTHUMT00000362751.1";
chr1 HAVANA exon 12613 12721 . + . gene_id "ENSG00000223972.4"; transcript_id "ENST00000456328.2"; gene_type "pseudogene"; gene_status "KNOWN";gene_name "DDX11L1"; transcript_type "processed_transcript"; transcript_status "KNOWN"; transcript_name "DDX11L1-002"; exon_number 2; exon_id "ENSE00003582793.1"; level 2; tag "basic"; havana_gene "OTTHUMG00000000961.2"; havana_transcript "OTTHUMT00000362751.1";
chr1 HAVANA exon 13221 14409 . + . gene_id "ENSG00000223972.4"; transcript_id "ENST00000456328.2"; gene_type "pseudogene"; gene_status "KNOWN";gene_name "DDX11L1"; transcript_type "processed_transcript"; transcript_status "KNOWN"; transcript_name "DDX11L1-002"; exon_number 3; exon_id "ENSE00002312635.1"; level 2; tag "basic"; havana_gene "OTTHUMG00000000961.2"; havana_transcript "OTTHUMT00000362751.1";

  

ensembl

#!genome-build GRCh38.p10
#!genome-version GRCh38
#!genome-date 2013-12
#!genome-build-accession NCBI:GCA_000001405.25
#!genebuild-last-updated 2017-06
1 havana gene 11869 14409 . + . gene_id "ENSG00000223972"; gene_version "5"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene";
1 havana transcript 11869 14409 . + . gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000456328"; transcript_version "2"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; transcript_name "DDX11L1-202"; transcript_source "havana"; transcript_biotype"processed_transcript"; tag "basic"; transcript_support_level "1";
1 havana exon 11869 12227 . + . gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000456328"; transcript_version "2"; exon_number "1"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; transcript_name "DDX11L1-202"; transcript_source "havana"; transcript_biotype "processed_transcript"; exon_id "ENSE00002234944"; exon_version "1"; tag "basic"; transcript_support_level "1";
1 havana exon 12613 12721 . + . gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000456328"; transcript_version "2"; exon_number "2"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; transcript_name "DDX11L1-202"; transcript_source "havana"; transcript_biotype "processed_transcript"; exon_id "ENSE00003582793"; exon_version "1"; tag "basic"; transcript_support_level "1";
1 havana exon 13221 14409 . + . gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000456328"; transcript_version "2"; exon_number "3"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; transcript_name "DDX11L1-202"; transcript_source "havana"; transcript_biotype "processed_transcript"; exon_id "ENSE00002312635"; exon_version "1"; tag "basic"; transcript_support_level "1";
1 havana transcript 12010 13670 . + . gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000450305"; transcript_version "2"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; transcript_name "DDX11L1-201"; transcript_source "havana"; transcript_biotype"transcribed_unprocessed_pseudogene"; tag "basic"; transcript_support_level "NA";

  

  

  

问题:

1. 为什么用gencode的注释文件做表达定量会出问题?

2. 不同的release之间有什么区别?

3. 不同来源的注释区别在哪里?

4.

GRCh38基因组和注释文件探究的更多相关文章

  1. 关于注释【code templates】,如何导入本地注释文件

    关于如何在eclipse.myeclipse导入本地注释文件 [xxx.xml]   请看操作方式 下面是code templates文件的内容 注意  把文件中的 @@@@@@@@@@@@@@@  ...

  2. 关于基因组注释文件GTF的解释

    GTF文件的全称是gene transfer format,主要是对染色体上的基因进行标注.怎么理解呢,其实所谓的基因名,基因座等,都只是后来人们给一段DNA序列起的名字而已,还原到细胞中就是细胞核里 ...

  3. 【转】ubuntu中没有/etc/inittab文件探究

    原文网址:http://blog.csdn.net/gavinr/article/details/6584582 linux 启动时第一个进程是/sbin/init,其主要功能就是软件执行环境,包括系 ...

  4. 如何用cufflinks 拼出一个理想的注释文件

    后记: cufflinks安装: 下载安装包, 不要下载source code ,直接下载binary.    Source code    Linux x86_64 binary http://cu ...

  5. HTTP上传文件探究

    通常情况下,我们想在网页上上传一个文件的时候,会采用<input type="file">标签,但是你有没有想过,为什么通过这样一个标签,服务器端就能获取到文件数据呢? ...

  6. ubuntu12.10中没有/etc/inittab文件探究

    1. 我们首先来看一下Linux系统开机启动过程: Ubuntu是Linux系统的衍生系统,其开机启动过程与上图相差不大,但是随着系统的不断发展,终究还是有不同的地方,下面,我们来了解一下Ubuntu ...

  7. 初步了解hg19注释文件的内容 | gtf

    hg19有哪些染色体? chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 c ...

  8. idea 注释文件和方法注释

    类注释: 如下图所示

  9. [转] c++加载外部库文件探究

    首先介绍:用#import导入dll和用#pragma comment导入lib还有在程序中LoadLibrary加载dll有什么区别 (1) #import导入的dll是com组建的dll,主要用来 ...

随机推荐

  1. MS11-050安全漏洞

    IE浏览器渗透攻击--MS11050安全漏洞 实验前准备 1.两台虚拟机,其中一台为kali,一台为windows xp sp3(包含IE7). 2.设置虚拟机网络为NAT模式,保证两台虚拟机可以相互 ...

  2. 一、变量.二、过滤器(filter).三、标签(tag).四、条件分支tag.五、迭代器tag.六、自定义过滤器与标签.七、全系统过滤器(了解)

    一.变量 ''' 1.视图函数可以通过两种方式将变量传递给模板页面 -- render(request, 'test_page.html', {'变量key1': '变量值1', ..., '变量ke ...

  3. 【Python54.1--豆瓣登录】

    1.模拟豆瓣登录 ''' |-- 代码解析: |-- 1.登录必须具备的条件:url,cookie,fromData fromData的参数如下: source: index_nav form_ema ...

  4. python&django 常见问题及解决方法

    0.python-dev安装(ubuntu) apt-get install  python-dev 1.Open(filename,mode) 报错实例: f = open('d:\Users\16 ...

  5. 编译openwrt时报错:FMCGenericError.h:34:27: fatal error: libxml/parser.h: No such file or directory

    解决办法: 更新openwrt的feeds,并重新make menuconfig ./script/feeds update -a ./script/feeds install -a

  6. linux任务计划及周期性任务计划

    相关命令:at.batch.cron.mailx 未来某时间执行一次任务:at, batch 周期性运行某任务: cron 一.未来某时间执行一次任务:at命令 at, batch, atq, atr ...

  7. 【开机自启】Linux下设置MySql自动启动

    1.将服务文件拷贝到init.d下,并重命名为mysql cp /usr/local/mysql/support-files/mysql.server /etc/init.d/mysqld 2.赋予可 ...

  8. (转)Introductory guide to Generative Adversarial Networks (GANs) and their promise!

    Introductory guide to Generative Adversarial Networks (GANs) and their promise! Introduction Neural ...

  9. 安装 Linux 内核 4.0

    大家好,今天我们学习一下如何从Elrepo或者源代码来安装最新的Linux内核4.0.代号为‘Hurr durr I'm a sheep’的Linux内核4.0是目前为止最新的主干内核.它是稳定版3. ...

  10. Vue学习二:v-model指令使用方法

    本文为博主原创,未经允许不得转载: <!DOCTYPE html> <html lang="zh"> <head> <script src ...