How TermFinder calculates P-values

Readme: MGI GO Term Finder

The GoTermFinder attempts to determine whether an observed level of annotation for a group of genes is significant within the context of annotation for all genes within the genome.  Suppose that we have a total population of N genes, in which M have a particular annotation.   If we observe x genes with that annotation, in a sample of n genes, then we can calculate the probability of that observation, using the hypergeometric distribution (eg, see http://mathworld.wolfram.com/HypergeometricDistribution.html ) as:

P-value is the probability or chance of seeing at least x number of genes out of the total n genes in the list annotated to a particular GO term, given the proportion of genes in the whole genome that are annotated to that GO Term. That is, the GO terms shared by the genes in the user's list are compared to the background distribution of annotation. The closer the p-value is to zero, the more significant the particular GO term associated with the group of genes is (i.e. the less likely the observed annotation of the particular GO term to a group of genes occurs by chance).

In other words, when searching the process ontology, if all of the genes in a group were associated with "DNA repair", this term would be significant. However, since all genes in the genome (with GO annotations) are indirectly associated with the top level term "biological_process", this would not be significant if all the genes in a group were associated with this very high level term.

Multiple comparisons problem

Gene Ontology analysis in multiple gene clusters under multiple hypothesis testing frameworks.

Hypergeometric distribution的更多相关文章

  1. Study notes for Discrete Probability Distribution

    The Basics of Probability Probability measures the amount of uncertainty of an event: a fact whose o ...

  2. 常见的概率分布类型(二)(Probability Distribution II)

    以下是几种常见的离散型概率分布和连续型概率分布类型: 伯努利分布(Bernoulli Distribution):常称为0-1分布,即它的随机变量只取值0或者1. 伯努利试验是单次随机试验,只有&qu ...

  3. NLP&数据挖掘基础知识

    Basis(基础): SSE(Sum of Squared Error, 平方误差和) SAE(Sum of Absolute Error, 绝对误差和) SRE(Sum of Relative Er ...

  4. 《量化投资:以MATLAB为工具》连载(2)基础篇-N分钟学会MATLAB(中)

    http://www.matlabsky.com/thread-43937-1-1.html   <量化投资:以MATLAB为工具>连载(3)基础篇-N分钟学会MATLAB(下)     ...

  5. 加州大学伯克利分校Stat2.2x Probability 概率初步学习笔记: Final

    Stat2.2x Probability(概率)课程由加州大学伯克利分校(University of California, Berkeley)于2014年在edX平台讲授. PDF笔记下载(Acad ...

  6. 加州大学伯克利分校Stat2.2x Probability 概率初步学习笔记: Midterm

    Stat2.2x Probability(概率)课程由加州大学伯克利分校(University of California, Berkeley)于2014年在edX平台讲授. PDF笔记下载(Acad ...

  7. 加州大学伯克利分校Stat2.2x Probability 概率初步学习笔记: Section 2 Random sampling with and without replacement

    Stat2.2x Probability(概率)课程由加州大学伯克利分校(University of California, Berkeley)于2014年在edX平台讲授. PDF笔记下载(Acad ...

  8. 加州大学伯克利分校Stat2.3x Inference 统计推断学习笔记: Section 2 Testing Statistical Hypotheses

    Stat2.3x Inference(统计推断)课程由加州大学伯克利分校(University of California, Berkeley)于2014年在edX平台讲授. PDF笔记下载(Acad ...

  9. R代码展示各种统计学分布 | 生物信息学举例

    二项分布 | Binomial distribution 泊松分布 | Poisson Distribution 正态分布 | Normal Distribution | Gaussian distr ...

随机推荐

  1. tp5 本地安装和调试的问题

    安装的时候用官方下载的包或者用composer指定版本号,不要用git,会安装最新的包. 本地配置域名的时候出错,要不就是500要不就是找不到文件,原因是目录路径里的反斜杆加字母t被转义了,改成正斜杠 ...

  2. hdu 4366 Successor - CDQ分治 - 线段树 - 树分块

    Sean owns a company and he is the BOSS.The other Staff has one Superior.every staff has a loyalty an ...

  3. 16 级高代 II 思考题九的七种解法

    16 级高代 II 思考题九  设 $V$ 是数域 $\mathbb{K}$ 上的 $n$ 维线性空间, $\varphi$ 是 $V$ 上的线性变换, $f(\lambda),m(\lambda)$ ...

  4. Python3基础 list del 从内存中删除整个列表

             Python : 3.7.0          OS : Ubuntu 18.04.1 LTS         IDE : PyCharm 2018.2.4       Conda ...

  5. Restful framework【第四篇】视图组件

    基本使用 -view的封装过程有空可以看一下 -ViewSetMixin # ViewSetMixin 写在前面,先找ViewSetMixin的as_view方法 # 用了ViewSetMixin , ...

  6. Android O HIDL的实现对接【转】

    本文转载自:https://blog.csdn.net/gh201030460222/article/details/80551897 Android O HIDL的实现对接1. HIDL的定义1.1 ...

  7. 如何在gvim中安装autoproto自动显示函数原型

    cankao: http://www.vim.org/scripts/script.php?script_id=1553 注意, 在gvim中执行的命令, :foo和:!foo 的区别, 跟vim一样 ...

  8. 安装 Linux 内核 4.0

    大家好,今天我们学习一下如何从Elrepo或者源代码来安装最新的Linux内核4.0.代号为‘Hurr durr I'm a sheep’的Linux内核4.0是目前为止最新的主干内核.它是稳定版3. ...

  9. Latex: 添加IEEE会议论文作者信息

    参考: Multiple Authors with common affiliations in IEEEtran conference template Latex: 添加IEEE会议论文作者信息 ...

  10. git Bush应用崩溃If no other git process is currently running, this probably means a git process crashed

    问题: 用git Bush提交的时候遇到一个问题,不论做什么操作都遇到下面的错误信息: fatal: Unable to create 'XXXXXXXXX' : File exists. If no ...