The Basics of Probability

  • Probability measures the amount of uncertainty of an event: a fact whose occurrence is uncertain.
  • The sample space is the set of all possible outcomes (elementary events), denoted Ω.
  • Some properties:
    • Sum rule: p(A ∪ B) = p(A) + p(B) - p(A ∩ B)
    • Union bound: p(A_1 ∪ ... ∪ A_n) ≤ p(A_1) + ... + p(A_n)
  • Conditional probability: p(B|A) = p(A, B)/p(A), defined when p(A) > 0. To emphasize that p(A) is unconditional, p(A) is called the "marginal probability" and p(A, B) the "joint probability"; the identity p(A, B) = p(B|A) p(A) is called the "multiplication rule" or "factorization rule".
  • Total probability theorem: p(B) = p(B|A)p(A) + p(B|~A)p(~A), where ~A denotes the complement of A.
  • Bayes' theorem: p(A|B) = p(B|A) p(A) / p(B), for p(B) > 0.

    Bayes' theorem can be regarded as a rule for updating a prior probability p(A) into a posterior probability p(A|B) once the evidence (event) B has been observed.

  • Conditional independence: two events A and B, with p(A)>0 and p(B)>0, are conditionally independent given C if p(A, B|C)=p(A|C) p(B|C).
  • Probability mass function (p.m.f) of a random variable X is the function p(x) = Pr[X = x].
  • Joint probability mass function of X and Y is the function p(x, y) = Pr[X = x, Y = y].
  • Cumulative distribution function (c.d.f) of a random variable X is the function F(x) = Pr[X ≤ x].
  • The c.d.f gives the probability that X falls in the interval (-∞, x], whereas the p.m.f gives the probability that X takes one specific value x.
  • Expectation: the expectation of a random variable X is E[X] = Σ_x x p(x).
    • linearity: E[aX+bY] = aE[X] + bE[Y]
    • if X and Y are independent: E[XY] = E[X]*E[Y]
    • Markov's inequality: let X be a nonnegative random variable with E[X] < ∞; then for all t > 0, Pr[X ≥ t] ≤ E[X]/t.
  • Variance: the variance of a random variable X is Var[X] = E[(X - E[X])²] = E[X²] - (E[X])², where σ_X = √Var[X] is called the standard deviation of X.
    • Var[aX] = a²Var[X]
    • if X and Y are independent, Var[X+Y] = Var[X] + Var[Y]
    • Chebyshev's inequality: let X be a random variable with Var[X] < ∞; then for all t > 0, Pr[|X - E[X]| ≥ t] ≤ Var[X]/t².
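To make these formulas concrete, here is a minimal Python sketch, assuming a p.m.f is represented as a dict mapping each value to its probability (the names `bayes_posterior`, `mean_and_variance` and `tail_prob` are illustrative, not from any library). It obtains p(B) from the total probability theorem before applying Bayes' theorem, computes E[X] and Var[X] = E[(X - E[X])²] directly from the p.m.f, and checks Markov's bound Pr[X ≥ t] ≤ E[X]/t and Chebyshev's bound Pr[|X - E[X]| ≥ t] ≤ Var[X]/t² on a fair die. `Fraction` keeps the arithmetic exact; the diagnostic-test numbers in the last line are made up for illustration.

```python
from fractions import Fraction


def bayes_posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """Return p(A|B) from p(A), p(B|A) and p(B|~A)."""
    # Total probability theorem: p(B) = p(B|A)p(A) + p(B|~A)p(~A)
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1 - prior_a)
    # Bayes' theorem: p(A|B) = p(B|A)p(A) / p(B)
    return p_b_given_a * prior_a / p_b


def mean_and_variance(pmf):
    """pmf maps each value x to p(x); the probabilities must sum to 1."""
    mean = sum(x * p for x, p in pmf.items())  # E[X] = sum_x x p(x)
    var = sum((x - mean) ** 2 * p for x, p in pmf.items())  # E[(X - E[X])^2]
    return mean, var


def tail_prob(pmf, predicate):
    """Sum p(x) over the values x for which predicate(x) is true."""
    return sum(p for x, p in pmf.items() if predicate(x))


# A fair six-sided die
die = {x: Fraction(1, 6) for x in range(1, 7)}
mu, var = mean_and_variance(die)
print(mu, var)  # 7/2 35/12

# Markov with t = 5: Pr[X >= 5] <= E[X]/5
print(tail_prob(die, lambda x: x >= 5), mu / 5)  # 1/3 7/10

# Chebyshev with t = 5/2: Pr[|X - E[X]| >= t] <= Var[X]/t^2
t = Fraction(5, 2)
print(tail_prob(die, lambda x: abs(x - mu) >= t), var / t**2)  # 1/3 7/15

# Made-up test: p(A) = 1%, p(B|A) = 90%, p(B|~A) = 5%
print(bayes_posterior(Fraction(1, 100), Fraction(9, 10), Fraction(1, 20)))  # 2/13
```

With a 1% prior, the posterior is only 2/13 (about 0.15) despite the 90% true-positive rate, because false positives from the 99% of cases where A does not hold dominate p(B).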

Bernoulli Distribution

  • A (single) Bernoulli trial is a random experiment with exactly two possible outcomes, "success" and "failure" (or "yes" and "no"). Examples of Bernoulli trials include flipping a coin, a political opinion poll, etc.
  • The Bernoulli distribution is the discrete probability distribution of a single random variable X that takes value 1 with success probability p, Pr(X=1)=p, and value 0 with failure probability Pr(X=0)=q=1-p. More formally, the Bernoulli distribution is summarized as follows:
    • notation: Bern(p), where 0<p<1 is the probability of success.
    • support: X ∈ {0, 1}
    • p.m.f: Pr[X=0]=q=1-p, Pr[X=1]=p
    • mean: E[X]=p
    • variance: Var[X]=p(1-p)
    • It is a special case of the Binomial distribution B(n, p): the Bernoulli distribution is B(1, p).
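A minimal Python sketch of the summary above, assuming exact `Fraction` arithmetic for the moments; `bernoulli_pmf` and `bernoulli_sample` are hypothetical helper names. The sampler uses the standard trick of comparing one uniform draw on [0, 1) with p, and the mean and variance computed from the p.m.f reproduce p and p(1-p).

```python
import random
from fractions import Fraction


def bernoulli_pmf(k, p):
    """Pr[X = k] for X ~ Bern(p)."""
    if k == 1:
        return p
    if k == 0:
        return 1 - p
    return 0  # k is outside the support {0, 1}


def bernoulli_sample(p, rng=random):
    """One Bernoulli trial: success (1) if a uniform draw on [0, 1) is below p."""
    return 1 if rng.random() < p else 0


# Exact moments from the p.m.f, with p = 3/10
p = Fraction(3, 10)
mean = sum(k * bernoulli_pmf(k, p) for k in (0, 1))
var = sum((k - mean) ** 2 * bernoulli_pmf(k, p) for k in (0, 1))
print(mean, var, p * (1 - p))  # 3/10 21/100 21/100

# Empirical check with a seeded generator
rng = random.Random(42)
draws = [bernoulli_sample(0.3, rng) for _ in range(100_000)]
print(sum(draws) / len(draws))  # close to 0.3
```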

Binomial Distribution

  • The Binomial distribution is the discrete probability distribution of the number of successes in a sequence of n independent Bernoulli trials with success probability p, denoted as X~B(n, p).
  • The Binomial distribution is often used to model the number of successes in a sample of size n drawn with replacement from a population of size N. If the sampling is carried out without replacement, the draws are not independent, so the resulting distribution is a hypergeometric distribution, not a binomial one.
  • The Binomial distribution is summarized as follows:
    • notation: B(n, p), where n is the number of trials and p is the success probability in each trial
    • support: k ∈ {0, 1, ..., n}, the number of successes
    • p.m.f: Pr[X = k] = C(n, k) p^k (1-p)^(n-k), where C(n, k) = n!/(k!(n-k)!) is the binomial coefficient
    • mean: np
    • variance: np(1-p)
  • If n is large enough, the skew of the distribution is not too great, and B(n, p) is reasonably approximated by the normal distribution N(np, np(1-p)). This is useful because for large n the binomial p.m.f becomes cumbersome to compute directly.
    • One rule to decide whether this approximation is reasonable, i.e. whether n is large enough, is that both np and n(1-p) must be greater than 5. If both are greater than 15, the approximation should be good.
    • A second rule is that for n > 5, the normal approximation is adequate if |(√((1-p)/p) - √(p/(1-p)))/√n| < 0.3.
    • Another commonly used rule holds that the normal approximation is appropriate only if everything within 3 standard deviations of the mean lies within the range of possible values, that is, if np - 3√(np(1-p)) > 0 and np + 3√(np(1-p)) < n.
    • To improve the accuracy of the approximation, we usually apply a continuity correction to account for the fact that the binomial random variable is discrete while the normal random variable is continuous. The basic idea is to treat the discrete value k as the continuous interval from k-0.5 to k+0.5; for example, Pr[X ≤ k] ≈ Φ((k + 0.5 - np)/√(np(1-p))), where Φ is the standard normal c.d.f.
  • In addition, the Poisson distribution can be used to approximate the Binomial distribution when n is large and p is small: B(n, p) is approximated by a Poisson distribution with mean λ = np, i.e. Pr[X = k] ≈ (np)^k e^(-np)/k!. A rule of thumb states that the Poisson distribution is a good approximation of the binomial distribution if n is at least 20 and p ≤ 0.05, and an excellent approximation if n ≥ 100 and np ≤ 10.
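The approximations above can be compared against the exact p.m.f with a short Python sketch (standard library only; `math.comb` needs Python 3.8+, the function names are illustrative, and 0 < p < 1 is assumed). Φ is computed from `math.erf`, the normal approximation uses the continuity correction Pr[X ≤ k] ≈ Φ((k + 0.5 - np)/√(np(1-p))), the Poisson approximation uses mean λ = np, and `normal_approx_checks` evaluates the three rules of thumb, with the first rule coded in its usual form np > 5 and n(1-p) > 5.

```python
import math


def binom_pmf(k, n, p):
    """Exact Pr[X = k] = C(n, k) p^k (1 - p)^(n - k) for X ~ B(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k, n, p):
    """Exact Pr[X <= k], summing the p.m.f."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


def normal_approx_cdf(k, n, p):
    """Pr[X <= k] ~ Phi((k + 0.5 - np) / sqrt(np(1 - p))), i.e. with continuity correction."""
    mu, sigma = n * p, math.sqrt(n * p * (1 - p))
    z = (k + 0.5 - mu) / sigma
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal c.d.f Phi(z)


def poisson_approx_pmf(k, n, p):
    """Pr[X = k] ~ lam^k e^(-lam) / k! with lam = np."""
    lam = n * p
    return lam**k * math.exp(-lam) / math.factorial(k)


def normal_approx_checks(n, p):
    """Evaluate the three rules of thumb for the normal approximation."""
    mu, sigma = n * p, math.sqrt(n * p * (1 - p))
    rule1 = n * p > 5 and n * (1 - p) > 5
    skew_term = (math.sqrt((1 - p) / p) - math.sqrt(p / (1 - p))) / math.sqrt(n)
    rule2 = n > 5 and abs(skew_term) < 0.3
    rule3 = mu - 3 * sigma > 0 and mu + 3 * sigma < n
    return rule1, rule2, rule3


print(binom_cdf(55, 100, 0.5), normal_approx_cdf(55, 100, 0.5))
print(normal_approx_checks(100, 0.5))  # (True, True, True)
print(normal_approx_checks(10, 0.1))   # (False, False, False)

# n >= 100 and np = 2 <= 10: the "excellent" regime for the Poisson approximation
for k in range(5):
    print(k, round(binom_pmf(k, 100, 0.02), 4), round(poisson_approx_pmf(k, 100, 0.02), 4))
```

For B(100, 0.5) the corrected normal value and the exact c.d.f should be very close, while B(10, 0.1) fails all three rules, which is the skewed case the rules are meant to flag.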

Poisson Distribution

  • Poisson distribution: let X be a discrete random variable taking values in the set of nonnegative integers {0, 1, 2, ...} with probability Pr[X = k] = λ^k e^(-λ)/k!, where λ > 0.

    My understanding: the Poisson distribution describes the fact that the probability of drawing a specific integer from a set of integers is not uniform. For example, it is well known that if someone is asked to pick a random integer from 1-10, some integers occur with greater probability and others with lower probability. Although it seems that all possible integers have an equal chance of being picked, this is not true in practice. I think this may be due to people's subjectivity, i.e., some people prefer larger values while others tend to pick smaller ones. This point needs to be verified, as the impression comes purely from intuition.
  • The Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time and/or space, if these events occur with a known average rate and independently of the time since the last event.
  • The Poisson distribution is summarized as follows.
    • notation: Pois(λ), where λ > 0 is a real number giving the expected (average) number of events observed in the interval.
    • support: k ∈ {0, 1, 2, 3, ...}
    • mean: λ
    • variance: λ
  • Applications of Poisson distribution
    • Telecommunication: telephone calls arriving in a system
    • Management: customers arriving at a counter or call center
    • Civil engineering: cars arriving at a traffic light
  • Generating Poisson random variables (with mean λ)
    algorithm poisson_random_number:
    init:
        Let L = e^(-λ), k = 0 and p = 1.
    do:
        k = k + 1.
        Generate a uniform random number u in [0, 1], and let p = p × u.
    while p > L.
    return k - 1.
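The pseudocode can be turned into runnable Python as sketched below, together with the p.m.f Pr[X = k] = λ^k e^(-λ)/k!. It multiplies uniform draws until the running product p drops to L = e^(-λ) or below, then returns k - 1; this appears to be the classic multiplication method usually attributed to Knuth. The names `poisson_pmf` and `poisson_random_number`, the seed and λ = 3 are illustrative assumptions.

```python
import math
import random


def poisson_pmf(k, lam):
    """Pr[X = k] = lam^k e^(-lam) / k! for X ~ Pois(lam)."""
    return lam**k * math.exp(-lam) / math.factorial(k)


def poisson_random_number(lam, rng=random):
    """Multiplication method: multiply uniforms until the product drops to e^(-lam)."""
    limit = math.exp(-lam)  # L = e^(-lambda)
    k, p = 0, 1.0
    while True:  # "do ... while p > L"
        k += 1
        p *= rng.random()  # u ~ Uniform[0, 1)
        if p <= limit:
            return k - 1


rng = random.Random(7)  # seeded so the run is reproducible
draws = [poisson_random_number(3.0, rng) for _ in range(50_000)]
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
print(mean, var)  # both should be close to lambda = 3.0
```

The loop runs λ + 1 times on average, and e^(-λ) underflows to 0.0 for λ above roughly 745, so this method suits small λ; for large λ other generators are normally used.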
