Notes on the Dirichlet Distribution and Dirichlet Process
%matplotlib inline
Note: I wrote this post in an IPython notebook. It might be rendered better on NBViewer.
Dirichlet Distribution
The symmetric Dirichlet distribution (DD) can be considered a distribution of distributions. Each sample from the DD is a categorical distribution over K categories. It is parameterized by G0, a distribution over K categories, and α, a positive scale factor.
The expected value of the DD is G0. The variance of the DD is a function of the scale factor: element-wise, Var[Xi] = G0,i(1−G0,i)/(α+1). When α is large, samples from DD(α⋅G0) will be very close to G0. When α is small, samples will vary more widely.
We demonstrate below by setting G0=[.2,.2,.6] and varying α from 0.1 to 1000. In each case, the mean of the samples is roughly G0, but the standard deviation decreases as α increases.
import numpy as np
from scipy.stats import dirichlet

np.set_printoptions(precision=2)

def stats(scale_factor, G0=[.2, .2, .6], N=10000):
    samples = dirichlet(alpha=scale_factor * np.array(G0)).rvs(N)
    print("                          alpha:", scale_factor)
    print("              element-wise mean:", samples.mean(axis=0))
    print("element-wise standard deviation:", samples.std(axis=0))
    print()

for scale in [0.1, 1, 10, 100, 1000]:
    stats(scale)
                          alpha: 0.1
              element-wise mean: [ 0.2 0.2 0.6]
element-wise standard deviation: [ 0.38 0.38 0.47]

                          alpha: 1
              element-wise mean: [ 0.2 0.2 0.6]
element-wise standard deviation: [ 0.28 0.28 0.35]

                          alpha: 10
              element-wise mean: [ 0.2 0.2 0.6]
element-wise standard deviation: [ 0.12 0.12 0.15]

                          alpha: 100
              element-wise mean: [ 0.2 0.2 0.6]
element-wise standard deviation: [ 0.04 0.04 0.05]

                          alpha: 1000
              element-wise mean: [ 0.2 0.2 0.6]
element-wise standard deviation: [ 0.01 0.01 0.02]
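As a cross-check (this closed-form expression is a standard Dirichlet fact, not computed in the original post): for Dirichlet(α⋅G0), the element-wise standard deviation is √(G0,i(1−G0,i)/(α+1)), which reproduces the simulated values above.

```python
import numpy as np

# Closed-form element-wise standard deviation of Dirichlet(alpha * G0):
# sd_i = sqrt(G0_i * (1 - G0_i) / (alpha + 1))
G0 = np.array([.2, .2, .6])
sds = {alpha: np.sqrt(G0 * (1 - G0) / (alpha + 1))
       for alpha in [0.1, 1, 10, 100, 1000]}
for alpha, sd in sds.items():
    print(alpha, sd.round(2))
```

For instance, α = 10 gives [0.12 0.12 0.15], matching the simulation.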
Dirichlet Process
The Dirichlet process can be considered a way to generalize the Dirichlet distribution. While the Dirichlet distribution is parameterized by a discrete distribution G0 and generates samples that are similar discrete distributions, the Dirichlet process is parameterized by a generic distribution H0 and generates samples which are distributions similar to H0. The Dirichlet process also has a parameter α that determines how widely samples vary from H0.
We can construct a sample H (recall that H is a probability distribution) from a Dirichlet process DP(αH0) by drawing a countably infinite number of samples θk from H0 and setting:

H = ∑k=1..∞ πk δθk

where the πk are carefully chosen weights (more on this later) that sum to 1. (δθk is the Dirac delta function centered at θk.)
H, a sample from DP(αH0), is a probability distribution that looks similar to H0 (also a distribution). In particular, H is a discrete distribution that takes the value θk with probability πk. This sampled distribution H is discrete even if H0 has continuous support; the support of H is a countably infinite subset of the support of H0.
The weights (the πk values) of a Dirichlet process sample relate the Dirichlet process back to the Dirichlet distribution.
Gregor Heinrich writes:
The defining property of the DP is that its samples have weights πk and locations θk distributed in such a way that when partitioning S(H) into finitely many arbitrary disjoint subsets S1,…,SJ, J<∞, the sums of the weights πk in each of these J subsets are distributed according to a Dirichlet distribution that is parameterized by α and a discrete base distribution (like G0) whose weights are equal to the integrals of the base distribution H0 over the subsets Sn.
As an example, Heinrich imagines a DP with a standard normal base measure, H0 = N(0,1). Let H be a sample from DP(αH0) and partition the real line (the support of a normal distribution) as S1=(−∞,−1], S2=(−1,1], and S3=(1,∞); then

(H(S1), H(S2), H(S3)) ∼ Dir(α⋅(Φ(−1), Φ(1)−Φ(−1), 1−Φ(1))),

where Φ is the standard normal CDF and H(Sn) is the sum of the πk values whose θk lie in Sn.
These Sn subsets were chosen for convenience; similar results would hold for any choice of Sn. For any sample from a Dirichlet process, we can construct a sample from a Dirichlet distribution by partitioning the support of the sample into a finite number of bins.
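This partition property can be checked numerically. The sketch below (my illustration, not code from the post) draws truncated stick-breaking samples with an N(0,1) base measure and verifies that the average mass landing in S1 = (−∞,−1] is close to Φ(−1) ≈ 0.159, as the property predicts.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

def dp_sample_weights(alpha, tol=1e-4):
    # Truncated stick-breaking: keep breaking the stick until the
    # unassigned remainder is below tol.
    pis, remaining = [], 1.0
    while remaining > tol:
        b = rng.beta(1, alpha)
        pis.append(b * remaining)
        remaining *= (1 - b)
    return np.array(pis)

alpha, n_reps = 10.0, 2000
mass_S1 = []  # total weight whose atom lands in S1 = (-inf, -1]
for _ in range(n_reps):
    pis = dp_sample_weights(alpha)
    thetas = rng.standard_normal(len(pis))  # atoms from the N(0,1) base measure
    mass_S1.append(pis[thetas <= -1].sum())

# E[H(S1)] should equal the base measure's mass on S1, i.e. Phi(-1).
print(np.mean(mass_S1), norm.cdf(-1))
```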
There are several equivalent ways to choose the πk so that this property is satisfied: the Chinese restaurant process, the stick-breaking process, and the Pólya urn scheme.
To generate {πk} according to a stick-breaking process, we define βk to be a sample from Beta(1,α). π1 is equal to β1. Successive values are defined recursively as

πk = βk ∏i=1..k−1 (1−βi),

i.e., each πk is the fraction βk of whatever length of the unit "stick" remains after the first k−1 breaks.
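The stick-breaking recursion can be written in a couple of vectorized lines; the truncation level K below is my illustrative choice, not part of the original construction.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, K = 5.0, 1000  # K is an illustrative truncation level

betas = rng.beta(1, alpha, size=K)
# pi_k = beta_k * prod_{i<k} (1 - beta_i): break off a Beta(1, alpha)
# fraction of whatever stick length remains after the first k-1 breaks.
pis = betas * np.concatenate(([1.0], np.cumprod(1 - betas)[:-1]))

print(pis[:5], pis.sum())  # weights decay and the truncated sum is close to 1
```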
Thus, if we want to draw a sample from a Dirichlet process, we could, in theory, sample an infinite number of θk values from the base distribution H0 and an infinite number of βk values from the Beta distribution. Of course, sampling an infinite number of values is easier in theory than in practice.
However, since the πk values are positive and sum to 1, they must, in expectation, get increasingly small as k→∞. Thus, we can reasonably approximate a sample H∼DP(αH0) by drawing enough samples such that ∑k=1..K πk ≈ 1.
We use this method below to draw approximate samples from several Dirichlet processes with a standard normal (N(0,1)) base distribution but varying α values.
Recall that a single sample from a Dirichlet process is a probability distribution over a countably infinite subset of the support of the base measure.
The blue line is the PDF for a standard normal. The black lines represent the θk and πk values; θk is indicated by the position of the black line on the x-axis; πk is proportional to the height of each line.
We generate enough πk values so that their sum is greater than 0.99. When α is small, very few θk's will have corresponding πk values larger than 0.01. However, as α grows large, the sample becomes a more accurate (though still discrete) approximation of N(0,1).
import matplotlib.pyplot as plt
from scipy.stats import beta, norm

def dirichlet_sample_approximation(base_measure, alpha, tol=0.01):
    betas = []
    pis = []
    betas.append(beta(1, alpha).rvs())
    pis.append(betas[0])
    while sum(pis) < (1. - tol):
        s = np.sum([np.log(1 - b) for b in betas])
        new_beta = beta(1, alpha).rvs()
        betas.append(new_beta)
        pis.append(new_beta * np.exp(s))
    pis = np.array(pis)
    thetas = np.array([base_measure() for _ in pis])
    return pis, thetas

def plot_normal_dp_approximation(alpha):
    plt.figure()
    plt.title("Dirichlet Process Sample with N(0,1) Base Measure")
    plt.suptitle("alpha: %s" % alpha)
    pis, thetas = dirichlet_sample_approximation(lambda: norm().rvs(), alpha)
    pis = pis * (norm.pdf(0) / pis.max())
    plt.vlines(thetas, 0, pis)
    X = np.linspace(-4, 4, 100)
    plt.plot(X, norm.pdf(X))

plot_normal_dp_approximation(.1)
plot_normal_dp_approximation(1)
plot_normal_dp_approximation(10)
plot_normal_dp_approximation(1000)
Often we want to draw samples from a distribution sampled from a Dirichlet process instead of from the Dirichlet process itself. Much of the literature on the topic unhelpfully refers to this as sampling from a Dirichlet process.
Fortunately, we don't have to draw an infinite number of samples from the base distribution and stick breaking process to do this. Instead, we can draw these samples as they are needed.
Suppose, for example, we know a finite number of the θk and πk values for a sample H∼DP(αH0). For example, we know π1=0.5 at θ1=0.1 and π2=0.3 at θ2=−0.5.

To sample from H, we can generate a uniform random number u between 0 and 1. If u is less than 0.5, our sample is 0.1. If 0.5 ≤ u < 0.8, our sample is −0.5. If u ≥ 0.8, our sample from H will be a new sample θ3 from H0. At the same time, we should also sample and store π3. When we draw our next sample, we will again draw u ∼ Uniform(0,1) but will compare it against π1, π2, AND π3.
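This lazy sampling scheme can be sketched directly (the π1, π2, θ1, θ2 values are from the example above; α = 1 and the N(0,1) base measure H0 are my illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)

weights = [0.5, 0.3]   # pi_1, pi_2 from the worked example
atoms = [0.1, -0.5]    # theta_1, theta_2 from the worked example
alpha = 1.0            # illustrative concentration parameter

def draw():
    u = rng.random()
    acc = 0.0
    for w, a in zip(weights, atoms):
        acc += w
        if u < acc:       # u fell in an existing atom's interval
            return a
    # u fell in the unassigned tail: draw a new atom from H0 = N(0,1)
    # and break off a new piece of the remaining stick for it.
    remaining = 1.0 - sum(weights)
    weights.append(rng.beta(1, alpha) * remaining)
    atoms.append(rng.standard_normal())
    return atoms[-1]

samples = [draw() for _ in range(10)]
print(samples)
```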
The class below will take a base distribution H0 and α as arguments to its constructor. The class instance can then be called to generate samples from H∼DP(αH0).
from numpy.random import choice

class DirichletProcessSample():
    def __init__(self, base_measure, alpha):
        self.base_measure = base_measure
        self.alpha = alpha
        self.cache = []
        self.weights = []
        self.total_stick_used = 0.

    def __call__(self):
        remaining = 1.0 - self.total_stick_used
        i = DirichletProcessSample.roll_die(self.weights + [remaining])
        if i is not None and i < len(self.weights):
            return self.cache[i]
        else:
            stick_piece = beta(1, self.alpha).rvs() * remaining
            self.total_stick_used += stick_piece
            self.weights.append(stick_piece)
            new_value = self.base_measure()
            self.cache.append(new_value)
            return new_value

    @staticmethod
    def roll_die(weights):
        if weights:
            return choice(range(len(weights)), p=weights)
        else:
            return None
This Dirichlet process class could be called a form of stochastic memoization. This idea was first articulated in somewhat abstruse terms by Daniel Roy et al.
Below are histograms of 10,000 samples drawn from distributions sampled from Dirichlet processes with a standard normal base distribution and varying α values.
import pandas as pd

base_measure = lambda: norm().rvs()
n_samples = 10000
samples = {}
for alpha in [1, 10, 100, 1000]:
    dirichlet_norm = DirichletProcessSample(base_measure=base_measure, alpha=alpha)
    samples["Alpha: %s" % alpha] = [dirichlet_norm() for _ in range(n_samples)]

_ = pd.DataFrame(samples).hist()
Note that these histograms look very similar to the corresponding plots of sampled distributions above. However, these histograms show points sampled from a distribution sampled from a Dirichlet process, while the plots above show approximate distributions sampled from the Dirichlet process. Of course, as the number of samples from each H grows large, we would expect the histogram to be a very good empirical approximation of H.
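The discreteness of H can also be verified directly (an illustrative sketch, not code from the post): 1,000 draws from a single truncated approximation of H ∼ DP(α⋅N(0,1)) contain far fewer than 1,000 distinct values, because the draws repeat H's atoms.

```python
import numpy as np

rng = np.random.default_rng(2)

def truncated_dp_draw(alpha, tol=1e-4):
    # Approximate H ~ DP(alpha * N(0,1)) by truncated stick-breaking.
    pis, remaining = [], 1.0
    while remaining > tol:
        b = rng.beta(1, alpha)
        pis.append(b * remaining)
        remaining *= 1 - b
    pis = np.array(pis)
    thetas = rng.standard_normal(len(pis))
    return pis / pis.sum(), thetas

pis, thetas = truncated_dp_draw(alpha=1.0)
samples = rng.choice(thetas, size=1000, p=pis)
n_unique = len(np.unique(samples))
print(n_unique)  # far fewer than 1000 distinct values: H is discrete
```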
In a future post, I will look at how this DirichletProcessSample class can be used to draw samples from a hierarchical Dirichlet process.