How to generate a sample from $p(x)$?

Let's first see how Matlab samples from $p(x)$. Matlab provides functions for several common probability distributions.

Try the univariate Gaussian distribution

p = normpdf(linspace(xmin, xmax, num_of_points), mean, standard_deviation); % PDF
c = normcdf(linspace(xmin, xmax, num_of_points), mean, standard_deviation); % CDF
y = normrnd(mean, standard_deviation, number_of_samples, 1); % random number generation

Try PDF:

x=linspace(-1,1,1000);
p=normpdf(x ,0, 1);
plot(x,p);

Note: linspace returns a row vector, and Matlab indexing starts at 1:

x(1)   % the first element (not x(0))
x(:)   % all elements, as a column vector
x(1,:) % the first row, i.e. the whole vector here

Try random number generation:

y=normrnd(0, 1,100, 1);% try drawing 10000 samples as well
hist( y , 20 );% 20 bins

Try the univariate uniform distribution

p = unifpdf(linspace(xmin, xmax, num_of_points), a, b); % PDF
c = unifcdf(linspace(xmin, xmax, num_of_points), a, b); % CDF
y = unifrnd(a, b, number_of_samples, 1); % random number generation

Try PDF:

x=linspace(-10,10,1000);
p= unifpdf(x ,-5,5);
plot(x,p);

Try random number generation:

y=unifrnd(-5, 5,100, 1);% try drawing 10000 samples as well
hist( y , 20 );% 20 bins

Matlab provides random number generators for some standard distributions $p(x)$, but not for a general $p(x)$. Below are some common sampling methods for that case.

Inverse Transform Sampling (ITS)

with discrete variables

This method generates random numbers from any probability distribution, given the inverse of its cumulative distribution function (CDF). The idea is to sample uniformly distributed random numbers between 0 and 1 and then transform them with the inverse CDF (InvCDF), which can be discrete or continuous. If the InvCDF is discrete, ITS only requires a table lookup, as in Table 1.

Table 1. Probability of digits observed in human random digit generation experiment

Matlab's randsample function can carry out this sampling process with the probabilities in Table 1. See the code below.

%Note: randsample is not part of the Octave core package; install the statistics package from http://octave.sourceforge.net/statistics/ before using it.

% probabilities for each digit
theta=[0.000; ... % digit 0
0.100; ... % digit 1
0.090; ... % digit 2
0.095; ... % digit 3
0.200; ... % digit 4
0.175; ... % digit 5
0.190; ... % digit 6
0.050; ... % digit 7
0.100; ... % digit 8
0.000];     % digit 9
seed = 1;
rand( 'state' , seed );% fix the random number generator
K = 10000;% let's say we draw K random values
digitset = 0:9;
Y = randsample(digitset,K,true,theta);
figure( 1 ); clf;
counts = hist( Y , digitset );
bar( digitset , counts , 'k' );
xlim([-0.5 9.5]);
xlabel( 'Digit' );
ylabel( 'Frequency' );
title( 'Distribution of simulated draws of human digit generator' );
pause;

Instead of using the built-in functions such as randsample or mnrnd, it is helpful to consider how to implement the underlying sampling algorithm using the inverse transform method which is:

(1) Calculate $F(X)$.

(2) Sample u from Uniform(0,1).

(3) Get a sample $x^{i}$ of $P(X)$, which is $F^{-1}(u)$.

(4) Repeat (2) and (3) until we get enough samples.

Note: For discrete distributions, $F^{-1}$ is a step function, so the sample $x^{i}$ is the smallest value whose cumulative probability reaches $u$. With the probabilities in Table 1, for example, $u=0.8$ gives $x^{i}=6$.
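The four steps above can be sketched without randsample, using only rand and cumsum. This is a minimal sketch rather than a description of how randsample works internally; the variable names F, u and idx are illustrative, and theta repeats the digit probabilities used in the code above.

```matlab
% Inverse transform sampling for a discrete distribution (illustrative sketch)
theta = [0.000 0.100 0.090 0.095 0.200 0.175 0.190 0.050 0.100 0.000];
digitset = 0:9;
K = 10000;                 % number of draws
F = cumsum(theta);         % step (1): cumulative distribution F(x)
F(end) = 1;                % guard against floating-point round-off
u = rand(K, 1);            % step (2): u ~ Uniform(0,1)
Y = zeros(K, 1);
for i = 1:K
    % step (3): smallest digit whose cumulative probability reaches u(i)
    idx = find(F >= u(i), 1, 'first');
    Y(i) = digitset(idx);
end
counts = hist(Y, digitset);
disp(counts / K);          % empirical frequencies, to compare with theta
```

Digits 0 and 9 have probability zero, so they should never appear in Y, and the printed frequencies should be close to theta.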

with continuous variables

This can be done with the following procedure:

(1) Draw U ∼ Uniform(0, 1).

(2) Set $X=F^{-1}(U)$.

(3) Repeat

For example, suppose we want to sample random numbers from the exponential distribution with CDF $F(x|\lambda) = 1 - \exp(-x/\lambda)$. Solving $u = 1 - \exp(-x/\lambda)$ for $x$ gives $F^{-1}(u|\lambda) = -\lambda\log(1-u)$, which takes the place of the inverse CDF in step (2). The code below uses $\lambda = 2$.

x = -2 * log(1 - unifrnd(0, 1, 10000, 1));   % inverse CDF with lambda = 2
hist(x, 30);
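A quick way to check the draws is to compare them with the target distribution. The sketch below assumes $\lambda = 2$ as in the code above, and uses the fact that this exponential distribution has mean $\lambda$. The evaluation points xs are arbitrary, and only the base function rand is used so that no extra package is needed.

```matlab
% Sanity check for inverse transform sampling of the exponential distribution
lambda = 2;
N = 10000;
u = rand(N, 1);                        % u ~ Uniform(0,1)
x = -lambda * log(1 - u);              % x = F^{-1}(u | lambda)
fprintf('sample mean = %.3f (theory: %.3f)\n', mean(x), lambda);

xs = [0.5 1 2 4];                      % points at which to compare the CDFs
F_emp = arrayfun(@(t) mean(x <= t), xs);   % empirical CDF of the samples
F_true = 1 - exp(-xs / lambda);            % exact CDF
disp([xs; F_emp; F_true]);             % rows: x, empirical CDF, exact CDF
```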

Rejection Sampling

When to use: the CDF of $P(X)$ is impossible or difficult to compute.

Advantage: unlike MCMC, it does not require any "burn-in" period, i.e., all samples obtained during sampling can immediately be used as samples from the target distribution $p(\theta)$.

Based on the figure above, the method is:

(1) Choose a proposal distribution q(θ) that is easy to sample from.

(2) Find a constant c such that cq(θ) ≥ p(θ) for all θ.

(3) Draw a proposal θ from q(θ).

(4) Draw a u from Uniform[0, cq(θ)].

(5) Reject the proposal if u > p(θ), accept otherwise. Since u is sampled from Uniform[0, cq(θ)], this is equivalent to: reject if $u\in(p(\theta),cq(\theta)]$, accept otherwise.

(6) Repeat steps 3, 4, and 5 until desired number of samples is reached; each accepted sample $\theta$ is a draw from p(θ).

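For example, take the target density p(θ) = 2θ on 0 ≤ θ ≤ 1, the proposal q(θ) = Uniform(0, 1), and c = 2, so that cq(θ) = 2 ≥ p(θ) for all θ. Then the code is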

k = 100000;                          % number of proposals to draw
c = 2;
theta_vec = unifrnd(0, 1, k, 1);     % proposals drawn from q(theta) = Uniform(0,1)
cq_vec = c * unifpdf(theta_vec, 0, 1);   % c*q(theta) for each proposal
p_vec = 2 * theta_vec;               % p(theta) for each proposal
u_vec = unifrnd(0, cq_vec);          % u ~ Uniform[0, c*q(theta)], one per proposal
r = theta_vec(u_vec < p_vec);        % keep only the accepted proposals
hist(r, 20);
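It can help to check how many proposals survive. When p and q are both normalized densities, the probability of accepting a proposal is 1/c, so about half of the proposals should be kept here. The sketch below repeats the same example, p(θ) = 2θ with a Uniform(0,1) proposal and c = 2, using only rand; the names accepted and acc_rate are illustrative.

```matlab
% Acceptance rate of rejection sampling for p(theta) = 2*theta on [0,1]
k = 100000;                        % number of proposals
c = 2;
theta = rand(k, 1);                % proposals from q = Uniform(0,1), so q(theta) = 1
u = c * rand(k, 1);                % u ~ Uniform[0, c*q(theta)]
accepted = theta(u < 2 * theta);   % accept when u < p(theta)
acc_rate = numel(accepted) / k;
fprintf('acceptance rate = %.3f (expected about %.3f)\n', acc_rate, 1 / c);
fprintf('sample mean = %.3f (theory: 2/3)\n', mean(accepted));
```

A tighter bound c wastes fewer proposals, which is why the choice of q and c matters in practice.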

MCMC Sampling

Before getting to MCMC sampling, we first look at Monte Carlo integration and Markov chains.
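Monte Carlo integration approximates an expectation $E[g(x)]=\int g(x)p(x)dx$ by the average of $g$ over samples drawn from $p(x)$. A minimal sketch, assuming $p(x)$ is the standard normal and $g(x)=x^2$; both are chosen here only for illustration, and the exact answer in this case is 1.

```matlab
% Monte Carlo integration: estimate E[g(x)] with x ~ N(0,1) and g(x) = x^2
N = 100000;                % number of samples
x = randn(N, 1);           % samples from p(x) = N(0,1)
g = @(x) x.^2;             % function whose expectation we want
est = mean(g(x));          % Monte Carlo estimate of the integral
fprintf('estimate = %.4f (exact value: 1)\n', est);
```

The error of the estimate shrinks roughly like $1/\sqrt{N}$, so more samples give a better approximation.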

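As an example of a Markov chain, the code below runs four chains in which each new state is drawn from a Beta distribution whose parameters depend on the previous state: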

% Markov chain with transition x(t) ~ Beta(200*(0.9*x(t-1)+0.05), 200*(1-0.9*x(t-1)-0.05))

fa = @(x) 200*(0.9*x + 0.05);       % parameter a of the Beta distribution
fb = @(x) 200*(1 - 0.9*x - 0.05);   % parameter b of the Beta distribution
no4mc = 4;                          % 4 Markov chains
states = unifrnd(0, 1, 1, no4mc);   % initial states
N = 1000;                           % 1000 transitions per chain
X = zeros(N + 1, no4mc);            % one row per time step, one column per chain
X(1, :) = states;
for i = 1:N
    states = betarnd(fa(states), fb(states));
    X(i + 1, :) = states;
end
plot(X);
pause;
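One way to see where the chains in the plot should settle is to use the fact that a Beta(a, b) variable has mean $a/(a+b)$. Here $a+b=200$ for every state, so

$$E[x^{(t)} \mid x^{(t-1)}] = \frac{200(0.9x^{(t-1)}+0.05)}{200} = 0.9x^{(t-1)}+0.05$$

Taking expectations on both sides gives $E[x^{(t)}] = 0.9E[x^{(t-1)}]+0.05$, whose fixed point satisfies $m = 0.9m + 0.05$, i.e. $m=0.5$. The gap to 0.5 shrinks by a factor of 0.9 per step, so all four chains should fluctuate around 0.5 after a few dozen steps, regardless of their initial states.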

Metropolis Sampling
