library('ggplot2')

library('dplyr')

Lecture 6 Methods for generating random numbers

Goal: Use U(0, 1) numbers to generate observations (variates) from other distributions, and even stochastic processes.

Discrete: Bernoulli, Binomial, Poisson, empirical
Continuous: Exponential, Normal (many ways), empirical
Multivariate normal
Auto-regressive moving average time series
Waiting times

Linear congruential generator LCG (线性同余法)

\[X_{n+1} = (aX_n+c) \text{ mod } m, n \geq 0
\]

sample(0:1, size=10,replace=TRUE)

sample(letters)

sample(1:4, size=100, replace=TRUE, prob=1:4/10) %>%

  table()

Drawbacks: 1. There may be a loop, so not all possible values will be generated.

uniform pseudo-random generator

先验

u <- runif(10)

Inverse Transform Method

Theorem

If $X$ is a continuous random variable with cdf $F(x)$,i.e. $X\sim F$,then $$F(X) \sim \operatorname{Uniform}(0,1)$$ Proof. Let $F(X) = U$

Because $X$ is continuous, then $F$ is an increasing function, and thus

\[F(X) \leq u \Longleftrightarrow X \leq F^{-1}(u)
\]

Since $X \sim F$, it follows that for any $0<u<1$.

\[P(U \leq u)=P(F(X) \leq u)=P\left(X \leq F^{-1}(u)\right)=F\left(F^{-1}(u)\right)=u .
\]

Def. Inverse transformation of F

\[F^{-1}(u)=\operatorname{inf}\{x:F(x) \geq u\},\quad 0 < u < 1
\]

证明看老师ppt。还有一些题目可以看"Random Variate Generation"

Continuous Case

逆一般比较好求。

求密度分布为$F_X(x)$的X的随机样本。

生成$U\sim \mathcal{U}(0,1)$
推导出逆函数$F^{− 1}(u)$
写一个命令或函数来计算$F^{-1}(u)$
对每一随机变量都要求：

（1）生成的随机数$u$来自于Uniform(0, 1)

（2）得到的$x = F^{-1}(u)$, 此处的x就是服从$F(x)$

Eg.服从密度分布为$f(x)=3x^2 (0<x<1)$的random sample。

set.seed(1)

u <- runif(1000)

x <- u^(1/3)

hist(x, prob = TRUE)

y <- seq(0, 1, .01)

lines(y, 3 * y^2)

f <- function(x) 3 * x^2

ggplot(as.data.frame(x), aes(x = x)) +

geom_histogram(aes(y = ..density..), color = "black", alpha = 0.7) +

geom_density(color = "red") +

stat_function(fun = f, color = "blue")

结论：生成的随机数直方图与理论密度分布曲线保持一致。符合预期。

Discrete Case

注意一下求逆的过程。

生成的随机样本$u$服从$\mathcal{U}(0,1)$
计算$x = F^{-1}(u)$
当$F(x_{i-1})<u<F(x_{i})$，输出$x_i$

Eg.(Two point distribution) Generate a random sample of Bernoulli variables with p = 0.4.

u <- runif(1000)

x <- as.integer(u > 0.6)

table(x)

Eg.(几何分布) the probability of dropping the dice n times succeed 1 time.

n <- 1000

p <- 0.25

set.seed(308)

u <- runif(n)

x <- (log(1 - u)/log(1 - p)) %>% ceiling()

hist(x,probability = TRUE)

Sol2:generate Fk, then calculate u is larger than how many Fk. That figure is F-1(u).

[For distributions that have no analytical notation, Use this Method!!]

n <- 1000

p <- 0.25

x <- numeric(n) #def a numeric vecor of length n

set.seed(308)

u <- runif(1000)

K <- 100 ## set the initial length of cdf vector,in fact this could be infinity

k <- 1:K

Fk <- 1 - (1 - p)^(k - 1)

for(i in 1:n){

  x[i] = sum(u[i] > Fk) # u[i]>Fk is a list of [T,T,T,...,F,F,F], we calculate the number of k's that u is greater than Fk.

}

mean(x)

\text{Note} that not all X are [1,2,3,...], THEY MAY BE [102,304,-3].

Another dict is needed.

g <- c(102,304,03)

#g(x) # (mapping) y=g(x) is what we needed Y~F

\text{Note} that HERE we use k <- 100, however, practically it may be larger then 100, so we need to use max(x) to check whether K <- 100

is exceeded. For distributions that does not have the analytical cdf, we can use Fk <- cumsum(pk) to calculate.

The Acceptance-Rejection Method

Motivation: The majority of cdf's cannot be inverted efficiently. A-R samples form a distribution that is "almost" the one we want, and then

adjusts by "accepting" only a certain proportion of those samples.

假定X与Y是服从密度函数 f 和 g

的随机变量，存在一个常数$c$使$$ \frac{f(t)}{g(t)}\le c $$对所有 t

都满足$f(t)>0$，则接受拒绝法可以用来生成X的随机样本。

目标：生成服从$f(x)$的随机样本X

找到一个随机样本Y，该随机变量的密度函数为g，且满足$\frac{f(y)}{g(y)} \le c$,同时对所有y，都满足$f(y)>0$，生成Y的随机样本。
生成随机样本U来自于$U(0,1)$
如果$u<\frac{f(y)}{cg(y)}$，则接受y并传递给x，令$x =y$,

否则:拒绝y,重复操作2(a).

Note:

1.每成功一次的概率是1/c,所以成功n次的概率是nc，所以上界c越小越好。

g怎么选：可以根据f的support来选。

证明看老师ppt。

Eg. Generate Beta(2, 2) random variables. 即$f(x) = 6x(1-x)$, $(0<x<1)$

so let $g(x) \sim \operatorname{Uniform}(0,1)$, then $g(x) = 1 $, for$0<x<1$, $c=6$

n <- 1000

k <- 0 # counter for acceptance

count <- 0 # count the number of iterations

y <- numeric(n)

while(k < n){

  u <- runif(1)

  count <- count + 1

  x <- runif(1) # random variable from g

  if (x * (1 - x) > u){

    # we accept x

    k <- k + 1

    y[k] <- x

  }

}

j

Transformation Methods

(0) If $Z \sim N(0,1)$, then $Z^2 \sim \chi^2(1)$.

If $U \sim \chi^2(m)$ and $V \sim \chi^2(n)$ are independent, then

\[F=\frac{U / m}{V / n} \sim F(m, n) .
\]

If $Z \sim N(0,1)$ and $V \sim \chi^2(n)$ are independent, then

\[T=\frac{Z}{\sqrt{V / n}} \sim t(n).
\]

If $U, V \sim$ Uniform $(0,1)$ are independent, then

\[Z_1=\sqrt{-2 \log U} \cos (2 \pi V), \quad Z_2=\sqrt{-2 \log U} \sin (2 \pi V)
\]

are independent $N(0,1)$ random variables.

If $U \sim \Gamma(r, \lambda)$ and $V \sim \Gamma(s, \lambda)$ are independent, then

\[X=\frac{U}{U+V} \sim \operatorname{Beta}(r, s) .
\]

Sums and Mixtures

Convolutions

Let $X_1, \ldots, X_n$ be i.i.d. with distribution $X_i \sim X$ (the distribution function is $F_X$ ), and let

\[S=X_1+\cdots+X_n .
\]

The distribution function of the sum $S$ is called the $n$-fold convolution of $X$ and denoted by $F_X^{*(n)}$.

Example

$Z_1^2+Z_2^2 \cdots+Z_m^2 \sim \chi_m^2$
$X_1+\cdots+X_m \sim$ Negative Binomial $(m, p)$, where $X \sim \operatorname{Geometric~}(p)$.
$X_1+\cdots+X_m \sim \Gamma(m, \lambda)$, where $X \sim \operatorname{Exp}(\lambda)$.

Sums (Composition Method)

Eg: Chi-squared distribution $\chi^2(m)$

n <- 1000

m <- 3

X <- matrix(rnorm(n * m), ncol = m)^2

y <- rowSums(X)

Mixtures

Using theta

$X ~ \frac{1}{3} N (\mu_1,\sigma_1^2) + \frac{2}{3} N (\mu_2,\sigma_2^2)$的意思是X有1/3的概率取第一个，有2/3的概率取第二个。不能直接用两个分布的相加。!!!!

对比下图x,y有着明显的差异 WHY？？？？

n <- 1000

x1 <- rnorm(n, 0, 1)

x2 <- rnorm(n, 3, 1)

u <- runif(n)

k <- as.integer(u < 1/3)

x <- k * x1 + (1 - k) * x2

y <- 1/3*x1 + 2/3*x2

hist(x)

hist(y)

Continuous mixture

A random variable $X$ is a continuous mixture if the distribution of $X$ is

\[F_X(x)=\int_{-\infty}^{\infty} F_{X \mid Y=y}(x) f_Y(y) d y
\]

for a family $X \mid Y=y$ indexed by the real numbers $y$ and weighting function $f_Y$ such that $\int f_Y(y) d y=1$

Example: The Poisson-Gamma mixture

The negative binomial distribution is a mixture of Poisson $(\Lambda)$ distribution, where $\Lambda$ has a gamma distribution $\Gamma(r, \beta)$. That is, if

\[(X \mid \Lambda=\lambda) \sim \operatorname{Poission}(\lambda) \text { and } \Lambda \sim \Gamma(r, \beta),
\]

then

\[X \sim \operatorname{Negative} \operatorname{Binomial}\left(r, \frac{\beta}{1+\beta}\right) .
\]

Ex:负二项分布==Poisson-Gamma mixture

n <- 1000

r <- 4

beta <- 3

lambda <- rgamma(n, r, beta)

x <- rpois(n, lambda) # 每一个x相当于（对每一个lambda生成一个x）

print(lambda)

print(x)

Multivariate Distribution

如何生成多元正态分布 $AZ+b ~ N_p(b, AA^T)$

Theorem

Let $Z=\left(Z_1, \ldots, Z_p\right)^T$ be a vector of i.i.d. $N(0,1)$ random variables. Then,

\[Z \sim N_p\left(0, I_p\right) .
\]

Assume that $Z \sim N_p(\mu, \Sigma)$. For any $A \in \mathbb{R}^{q \times p}$ and $b \in \mathbb{R}^q$, then

\[A Z+b \sim N_q\left(A \mu+b, A \Sigma A^T\right) .
\]

SO, in order to generate $X$, let's start with a vector $Z=\left(Z_1, \ldots, Z_k\right)$ of iid $\operatorname{Nor}(0,1)$ RV's. That is, suppose $Z \sim \operatorname{Nor}_k(0, I)$, where $I$ is the $k \times k$ identity matrix, and 0 is simply a vector of 0 's.

Suppose we can find a (lower triangular) matrix $C$ such that $\Sigma=C C^{\mathrm{T}}$.

Then it can be shown that $X=\mu+C Z$ is multivariate normal with mean $\mathrm{E}[X]$ and covariance matrix

\[\Sigma \equiv \mathrm{E}\left[(C Z)(C Z)^T\right]=\mathrm{E}\left[C Z Z^T C^T\right]=C\left(\mathrm{E}\left[\boldsymbol{Z} \boldsymbol{Z}^T\right]\right) C^T=C C^T . \]

That is, $X \sim \operatorname{Nor}_k(\boldsymbol{\mu}, \Sigma)$.

want to find: $A = \Sigma^{\frac{1}{2}}$

Choleski factorization method

The Choleski factorization of a real symmetric positive-definite matrix is $\Sigma = Q^T Q$, where Q is an upper triangular matrix.

Sigma <- matrix(c(4, 12, 12, 37), nrow = 2)

chol(Sigma)

Spectral decomposition method

$Q = U\Lambda^{\frac{1}{2}}U^T = \Sigma$

library(knitr)

library(dplyr)

eigen_Sigma <- eigen(Sigma)

lam <- eigen_Sigma$values

U <- eigen_Sigma$vectors

Lam <- diag(lam)

Lam

U %*% Lam %*% t(U) %>% round()

U %*% sqrt(Lam) %*% t(U) %>% round(digits = 3)

Singular value decomposition (SVD) method

参考：https://blog.csdn.net/Yeeyi_max?type=blog 的一些文章

https://www2.isye.gatech.edu/~sman/courses/Mexico2010/Module07-RandomVariateGeneration.pdf

Simulation-计算统计-随机数生成的更多相关文章

基于ACCESS和ASP的SQL多个表查询与计算统计代码（一）
近期在写几个关于"Project - Subitem - Task"的管理系统,说是系统还是有点夸大了,基本就是一个多表查询调用和insert.update的数据库操作.仅仅是出现 ...
casio计算器计算统计数据
http://blog.csdn.net/pipisorry/article/details/50257319 使用casio计算器计算输入数据均值.标准差和相关系数的方法,lz使用casio fx8 ...
excel常用公式--计算统计类
Count/Countif/Countifs:条件计数. 注:count只能对数值进行统计 sum/sumif/sumifs:条件求和. Average/Averageifs: 返回参数的平均值 ...
通过statCounter计算给定的RDD[Double]的统计信息的方法
需求1:给定一个RDD[Double],进行计算,该RDD的统计信息(count,mean,stdev,max,min) 代码: def main(args: Array[String]): Unit ...
在网页中运用统计Web Service接口
(2017-02-10 银河统计) 在"统计随机数及临界值Web Service接口"一文中介绍了常用统计分布四类Web Service接口(随机数.分位数.密度函数和累积分布函数 ...
Atitit sql计划任务与查询优化器--统计信息模块
Atitit sql计划任务与查询优化器--统计信息模块每一个统计信息的内容都包含以上三部分的内容. 我们依次来分析下,通过这三部分内容SQL Server如何了解该列数据的内容分布的. a.统计信 ...
R语言 ETL+统计+可视化
这篇文章...还是看文章吧导入QQ群信息,进行ETL,将其规范化计算哪些QQ发言较多计算一天中哪些时段发言较多计算统计内所有天的日发言量 setwd("C:/Users/liyi/D ...
SQL Server调优系列进阶篇（深入剖析统计信息）
前言经过前几篇的分析,其实大体已经初窥到SQL Server统计信息的重要性了,所以本篇就要祭出这个神器了. 该篇内容会很长,坐好板凳,瓜子零食之类... 不废话,进正题技术准备数据库版本为SQ ...
使用DBMS_STATS来收集统计信息【转】
overview Oracle's cost-based optimizer (COB) uses statistics to calculate the selectivity (the fract ...
SQL Server调优系列进阶篇 - 深入剖析统计信息
前言经过前几篇的分析,其实大体已经初窥到SQL Server统计信息的重要性了,所以本篇就要祭出这个神器了. 该篇内容会很长,坐好板凳,瓜子零食之类... 不废话,进正题技术准备数据库版本为SQ ...

随机推荐

2.20 Q_Learning 和Sarsa 的区别
二者都是基于Qtable的算法,其中Qlearning属于off-policy,Sarsa属于on-policy. 算法伪代码: 二者主要区别是更新Qtable的方式不同:
golang sync.Once用法
sync.Once能确保实例化对象Do方法在多线程环境只运行一次,内部通过互斥锁实现 Do方法 Once类型的Do方法只接收一个参数,参数的类型必须是func(),即无参数无返回的函数.该方法的功能并 ...
requests使用socks代理
requests在2.10.0版本开始支持socks代理自己搭了个服务器所以就想顺便用一下. import requests url = 'xxx' my_proxies={"http&q ...
[Python]判断字符串是否是数值，可以采用正则表达式技术来进行输入检测处理
判断字符串是否是数值的传统代码参考: sInput=input("请输入数值[整数或小数]:") sInput2="" bErr=False cSign=&qu ...
R 曲线拐点
x = seq(1,15) y = c(4,5,6,5,5,6,7,8,7,7,6,6,7,8,9) plot(x,y,type="l",ylim=c(3,10)) lo < ...
dosbox debugger
通过中断看程序运行过程最终目的是要找到数据保存位置,如何保存到文件的.
功能测试--APP专项
APP测试重点 APP测试与web测试的区别 APP测试常见问题 APP日志分析 APP压力稳定性测试
CentOS 7 时区设置 EST和CST设置
1. https://blog.csdn.net/allway2/article/details/102995747 CentOS 7 时区设置# timedatectl status Lo ...
adb命令1
adb是什么 adb的全称为Android Debug Bridge,就是起到调试桥的作用.它就是一个命令行窗口,用于通过电脑端与模拟器或者是设备之间的交互. adb有什么用借助adb工具,我们可以 ...
vsftpd2.3.4 后门笑脸漏洞
漏洞概要在vsftpd 2.3.4版本中,在登录输入用户名时输入:)类似于笑脸的符号,会导致服务处理开启6200后门端口,直接执行系统命令漏洞利用攻击机:kali 2022 msfconsole ...

Simulation-计算统计-随机数生成