Simulation - Computational Statistics - Random Number Generation
library('ggplot2')
library('dplyr')
Lecture 6 Methods for generating random numbers
Goal: Use U(0, 1) numbers to generate observations (variates) from other distributions, and even stochastic processes.
- Discrete: Bernoulli, Binomial, Poisson, empirical
- Continuous: Exponential, Normal (many ways), empirical
- Multivariate normal
- Auto-regressive moving average time series
- Waiting times
Linear congruential generator (LCG)
\[ X_{n+1} = (a X_n + c) \bmod m, \qquad U_n = \frac{X_n}{m} \]
sample(0:1, size=10,replace=TRUE)
sample(letters)
sample(1:4, size=100, replace=TRUE, prob=1:4/10) %>%
table()
Drawbacks: 1. The sequence is periodic (it eventually loops), so if the period is shorter than \(m\), not all possible values will be generated.
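A minimal sketch of the looping drawback, assuming the standard recurrence \(X_{n+1} = (aX_n + c) \bmod m\). The function name `lcg` and the tiny parameter values are illustrative choices, not taken from the lecture; real generators use a much larger modulus.

```r
# Toy linear congruential generator (hypothetical helper, for illustration only)
lcg <- function(n, seed, a, c, m) {
  x <- numeric(n)
  state <- seed
  for (i in seq_len(n)) {
    state <- (a * state + c) %% m   # X_{n+1} = (a * X_n + c) mod m
    x[i] <- state
  }
  x / m                             # scale to [0, 1)
}

u_full  <- lcg(16, seed = 1, a = 5, c = 3, m = 16)  # visits all 16 states
u_short <- lcg(16, seed = 1, a = 3, c = 0, m = 16)  # loops after 4 states

length(unique(u_full))   # 16
length(unique(u_short))  # 4
```

With the second parameter set the generator cycles through only 4 of the 16 possible values, which is exactly the loop described above.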
uniform pseudo-random generator
Prerequisite
u <- runif(10)
Inverse Transform Method
Theorem
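If \(X\) is a continuous random variable with cdf \(F(x)\), i.e. \(X\sim F\), then $$F(X) \sim \operatorname{Uniform}(0,1)$$
Proof. Let \(U = F(X)\).
Because \(X\) is continuous, \(F\) is continuous; assume in addition that \(F\) is strictly increasing, so that \(F^{-1}\) exists. Thus
\[ P(U \le u) = P(F(X) \le u) = P(X \le F^{-1}(u)). \]
Since \(X \sim F\), it follows that for any \(0<u<1\),
\[ P(U \le u) = F(F^{-1}(u)) = u, \]
which is the cdf of Uniform(0, 1).
Def. Inverse transformation of \(F\)
\[ F^{-1}(u) = \inf\{x : F(x) \ge u\}, \quad 0<u<1. \]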
See the lecture slides for the proof. Further exercises can be found in "Random Variate Generation".
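The theorem can be checked numerically. The sketch below assumes an arbitrary continuous example, \(X \sim \operatorname{Exp}(2)\), and applies its own cdf to the sample; the sample size and rate are illustrative choices.

```r
set.seed(1)
x <- rexp(10000, rate = 2)   # X ~ Exp(2), an arbitrary continuous example
u <- pexp(x, rate = 2)       # U = F(X)

# Uniform(0, 1) has mean 1/2 and variance 1/12
c(mean = mean(u), var = var(u))
hist(u, prob = TRUE)         # should look flat
```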
Continuous Case
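The inverse is usually easy to derive.
Goal: generate a random sample of \(X\) with cdf \(F_X(x)\).
Start from \(U\sim \mathcal{U}(0,1)\).
Derive the inverse function \(F^{-1}(u)\).
Write a command or function to compute \(F^{-1}(u)\).
For each random variate required:
(1) generate a random number \(u\) from Uniform(0, 1);
(2) deliver \(x = F^{-1}(u)\); this \(x\) follows \(F(x)\).
Eg. Generate a random sample from the density \(f(x)=3x^2\), \(0<x<1\). Here \(F(x)=x^3\), so \(F^{-1}(u)=u^{1/3}\).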
set.seed(1)
u <- runif(1000)
x <- u^(1/3)
hist(x, prob = TRUE)
y <- seq(0, 1, .01)
lines(y, 3 * y^2)
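f <- function(x) 3 * x^2 # theoretical density
ggplot(data.frame(x = x), aes(x = x)) +
geom_histogram(aes(y = after_stat(density)), bins = 30, color = "black", alpha = 0.7) + # ..density.. is deprecated since ggplot2 3.4.0
geom_density(color = "red") + # kernel density estimate of the sample
stat_function(fun = f, color = "blue") # theoretical density curve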
Conclusion: the histogram of the generated values agrees with the theoretical density curve, as expected.
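As a second illustration of the continuous case, here is a sketch for the exponential distribution, assuming rate \(\lambda = 2\) (an arbitrary choice). Since \(F(x) = 1 - e^{-\lambda x}\), the inverse is \(F^{-1}(u) = -\log(1-u)/\lambda\).

```r
set.seed(1)
lambda <- 2                    # assumed rate, chosen for illustration
u <- runif(10000)
x <- -log(1 - u) / lambda      # F^{-1}(u) for F(x) = 1 - exp(-lambda * x)

mean(x)                        # theoretical mean is 1 / lambda = 0.5
hist(x, prob = TRUE)
curve(dexp(x, rate = lambda), add = TRUE)  # theoretical density for comparison
```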
Discrete Case
Pay attention to how the inverse is computed.
- Generate a random \(u\) from \(\mathcal{U}(0,1)\)
- Compute \(x = F^{-1}(u)\)
- That is, when \(F(x_{i-1})<u\le F(x_{i})\), deliver \(x_i\)
Eg.(Two point distribution) Generate a random sample of Bernoulli variables with p = 0.4.
u <- runif(1000)
x <- as.integer(u > 0.6)
table(x)
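Eg. (Geometric distribution) \(X\) is the number of trials needed to obtain the first success, where each trial succeeds with probability \(p\). Here \(F(x) = 1-(1-p)^x\) for \(x = 1, 2, \ldots\), so \(F^{-1}(u) = \lceil \log(1-u)/\log(1-p) \rceil\).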
n <- 1000
p <- 0.25
set.seed(308)
u <- runif(n)
x <- (log(1 - u)/log(1 - p)) %>% ceiling()
hist(x,probability = TRUE)
Sol2: tabulate the cdf values Fk, then count how many Fk are smaller than u. That count is \(F^{-1}(u)\).
[Use this method for distributions whose cdf has no closed-form inverse.]
n <- 1000
p <- 0.25
x <- numeric(n) # numeric vector of length n to hold the sample
set.seed(308)
u <- runif(n)
K <- 100 # truncation point of the tabulated cdf (the support is infinite in theory)
k <- 1:K
Fk <- 1 - (1 - p)^(k - 1) # Fk[k] = F(k - 1) = P(X <= k - 1), so Fk[1] = 0
for (i in 1:n) {
  x[i] <- sum(u[i] > Fk) # u[i] > Fk is (TRUE, ..., TRUE, FALSE, ...); the number of TRUEs is the smallest x with u[i] <= F(x)
}
mean(x) # should be close to 1/p = 4
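Note that the support of \(X\) need not be \(\{1,2,3,\ldots\}\); it may be, for example, \(\{102, 304, -3\}\).
In that case a lookup table that maps the index to the actual value is needed.
g <- c(102, 304, -3)
# g[x] maps the index x to the required value, so that Y = g[x] ~ F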
Note that here we use K <- 100; in practice the required value may be larger than 100, so check max(x): if it reaches K, the table was truncated too early. For distributions that do not have an analytical cdf, compute Fk <- cumsum(pk) from the probabilities pk.
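A sketch that combines the two notes: the cdf is tabulated with `cumsum()` and the index is mapped to the actual support through a lookup vector. The probabilities `pk` are hypothetical, chosen only for illustration, and `findInterval()` replaces the explicit loop.

```r
set.seed(1)
values <- c(102, 304, -3)      # support of the target distribution
pk     <- c(0.2, 0.5, 0.3)     # hypothetical probabilities, for illustration
Fk     <- cumsum(pk)           # tabulated cdf
Fk[length(Fk)] <- 1            # guard against floating-point round-off in the last entry

u   <- runif(10000)
# left.open = TRUE counts the Fk strictly below u, so idx is the smallest i with u <= Fk[i]
idx <- findInterval(u, Fk, left.open = TRUE) + 1
x   <- values[idx]             # map the index to the actual value

table(x) / length(x)           # relative frequencies, to compare with pk
```

For a finite support like this one, `sample(values, size, replace = TRUE, prob = pk)` gives the same distribution directly; the explicit version only shows what happens underneath.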
The Acceptance-Rejection Method
Motivation: The majority of cdf's cannot be inverted efficiently. A-R samples from a distribution that is "almost" the one we want, and then adjusts by "accepting" only a certain proportion of those samples.
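Suppose \(X\) and \(Y\) are random variables with densities \(f\) and \(g\), and there exists a constant \(c\) such that $$ \frac{f(t)}{g(t)}\le c $$ for all \(t\) with \(f(t)>0\). Then the acceptance-rejection method can be used to generate random samples of \(X\).
Goal: generate a random sample of \(X\) with density \(f(x)\).
1. Find a random variable \(Y\) with density \(g\) that is easy to sample from and satisfies \(\frac{f(y)}{g(y)} \le c\) for all \(y\) with \(f(y)>0\).
2. For each random variate required:
(a) generate a random \(y\) from \(g\);
(b) generate a random \(u\) from \(U(0,1)\);
(c) if \(u<\frac{f(y)}{cg(y)}\), accept \(y\) and set \(x = y\); otherwise reject \(y\) and repeat from step 2(a).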
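Note:
- Each iteration accepts with probability \(1/c\), so obtaining \(n\) accepted values takes \(nc\) iterations on average; the smaller the bound \(c\), the better.
- How to choose \(g\): it can be chosen according to the support of \(f\).
See the lecture slides for the proof.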
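For reference, the acceptance probability of a single iteration follows in two lines (a short derivation under the assumptions stated above, with \(U\) independent of \(Y\)):

\[ P(\text{accept} \mid Y = y) = P\left(U < \frac{f(y)}{c\,g(y)}\right) = \frac{f(y)}{c\,g(y)}, \]
\[ P(\text{accept}) = \int \frac{f(y)}{c\,g(y)}\, g(y)\, dy = \frac{1}{c}\int f(y)\, dy = \frac{1}{c}. \]

The number of iterations needed for one accepted value is therefore geometric with mean \(c\).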
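Eg. Generate Beta(2, 2) random variables, i.e. \(f(x) = 6x(1-x)\), \(0<x<1\).
Let \(g\) be the Uniform(0, 1) density, so \(g(x) = 1\) for \(0<x<1\), and take \(c=6\). Then \(f(x)/(c\,g(x)) = x(1-x)\). (Any \(c \ge \max f = 1.5\) is valid; \(c=6\) is simple but accepts only about 1 in 6 proposals.)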
n <- 1000
k <- 0 # counter for acceptance
count <- 0 # count the number of iterations
y <- numeric(n)
while(k < n){
u <- runif(1)
count <- count + 1
x <- runif(1) # random variable from g
if (x * (1 - x) > u){
# we accept x
k <- k + 1
y[k] <- x
}
}
count # total number of iterations; about c * n = 6000 is expected
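The loop above uses the bound \(c = 6\). As a sketch of how the bound affects efficiency, the vectorized variant below assumes the tighter bound \(c = 1.5\) (the maximum of \(f\) on \((0,1)\)) and a fixed number of proposals; the names `c_bound`, `n_prop`, and `accepted` are illustrative.

```r
set.seed(1)
f <- function(x) 6 * x * (1 - x)   # Beta(2, 2) density
c_bound <- 1.5                     # assumed bound: max of f on (0, 1), attained at x = 1/2
n_prop  <- 30000                   # number of proposals (not the final sample size)

y <- runif(n_prop)                 # proposals from g = Uniform(0, 1), so g(y) = 1
u <- runif(n_prop)
accepted <- u < f(y) / (c_bound * 1)
x <- y[accepted]                   # accepted values follow Beta(2, 2)

rate <- mean(accepted)             # should be close to 1 / c_bound = 2/3
rate
```

With \(c = 6\) the same number of proposals would yield only about one sixth as accepted values.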
Transformation Methods
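- If \(Z \sim N(0,1)\), then \(Z^2 \sim \chi^2(1)\).
- If \(U \sim \chi^2(m)\) and \(V \sim \chi^2(n)\) are independent, then
\[ F = \frac{U/m}{V/n} \sim F(m, n). \]
- If \(Z \sim N(0,1)\) and \(V \sim \chi^2(n)\) are independent, then
\[ T = \frac{Z}{\sqrt{V/n}} \sim t(n). \]
- If \(U, V \sim\) Uniform \((0,1)\) are independent, then
\[ Z_1 = \sqrt{-2\log U}\,\cos(2\pi V), \qquad Z_2 = \sqrt{-2\log U}\,\sin(2\pi V) \]
are independent \(N(0,1)\) random variables.
- If \(U \sim \Gamma(r, \lambda)\) and \(V \sim \Gamma(s, \lambda)\) are independent, then
\[ X = \frac{U}{U+V} \sim \operatorname{Beta}(r, s). \]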
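A sketch of the uniform-to-normal transformation in the list above (the Box-Muller transform), assuming the standard form \(Z_1 = \sqrt{-2\log U}\cos(2\pi V)\), \(Z_2 = \sqrt{-2\log U}\sin(2\pi V)\). The sample size is an arbitrary choice.

```r
set.seed(1)
n <- 10000
u <- runif(n)
v <- runif(n)

r  <- sqrt(-2 * log(u))        # radius
z1 <- r * cos(2 * pi * v)
z2 <- r * sin(2 * pi * v)

# Both should have mean near 0 and variance near 1, with correlation near 0
c(mean(z1), var(z1), mean(z2), var(z2), cor(z1, z2))
```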
Sums and Mixtures
Convolutions
Let \(X_1, \ldots, X_n\) be i.i.d. with distribution \(X_i \sim X\) (the distribution function is \(F_X\) ), and let
\[ S = X_1 + \cdots + X_n. \]
The distribution function of the sum \(S\) is called the \(n\)-fold convolution of \(X\) and denoted by \(F_X^{*(n)}\).
Example
- \(Z_1^2+Z_2^2+\cdots+Z_m^2 \sim \chi_m^2\), where the \(Z_i\) are i.i.d. \(N(0,1)\).
- \(X_1+\cdots+X_m \sim\) Negative Binomial \((m, p)\), where the \(X_i\) are i.i.d. \(\operatorname{Geometric}(p)\).
- \(X_1+\cdots+X_m \sim \Gamma(m, \lambda)\), where the \(X_i\) are i.i.d. \(\operatorname{Exp}(\lambda)\).
Sums (Composition Method)
Eg: Chi-squared distribution \(\chi^2(m)\)
n <- 1000
m <- 3
X <- matrix(rnorm(n * m), ncol = m)^2
y <- rowSums(X)
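The same row-sum pattern works for the other convolution examples. A sketch for the gamma case, assuming \(m = 3\) and \(\lambda = 2\) (illustrative values): the sum of \(m\) i.i.d. \(\operatorname{Exp}(\lambda)\) variables is \(\Gamma(m, \lambda)\), with mean \(m/\lambda\) and variance \(m/\lambda^2\).

```r
set.seed(1)
n <- 10000
m <- 3
lambda <- 2                                   # assumed rate, for illustration

E <- matrix(rexp(n * m, rate = lambda), ncol = m)
s <- rowSums(E)                               # each row sum ~ Gamma(m, lambda)

c(mean = mean(s), var = var(s))               # theory: m / lambda = 1.5, m / lambda^2 = 0.75
hist(s, prob = TRUE)
curve(dgamma(x, shape = m, rate = lambda), add = TRUE)
```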
Mixtures
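Using mixing weights \(\theta\)
\(X \sim \frac{1}{3} N (\mu_1,\sigma_1^2) + \frac{2}{3} N (\mu_2,\sigma_2^2)\) means that \(X\) is drawn from the first distribution with probability 1/3 and from the second with probability 2/3. It is NOT the weighted sum of two random variables.
Compare x and y in the plots below: they differ markedly. The mixture x is bimodal, whereas y = 1/3*x1 + 2/3*x2 is a linear combination of independent normals and is therefore itself normal, here \(N(2, 5/9)\).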
n <- 1000
x1 <- rnorm(n, 0, 1)
x2 <- rnorm(n, 3, 1)
u <- runif(n)
k <- as.integer(u < 1/3)
x <- k * x1 + (1 - k) * x2
y <- 1/3*x1 + 2/3*x2
hist(x)
hist(y)
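The difference also shows up in the moments. The sketch below repeats the simulation with a larger sample (the names `x_mix` and `y_sum` are illustrative). Both variables have mean 2, but the mixture has variance \(\frac{1}{3}(1) + \frac{2}{3}(1 + 9) - 2^2 = 3\), while the weighted sum has variance \(\frac{1}{9} + \frac{4}{9} = \frac{5}{9}\).

```r
set.seed(1)
n  <- 10000
x1 <- rnorm(n, 0, 1)
x2 <- rnorm(n, 3, 1)
k  <- as.integer(runif(n) < 1/3)   # 1 with probability 1/3: pick the first component

x_mix <- k * x1 + (1 - k) * x2     # mixture: each draw comes from ONE component
y_sum <- 1/3 * x1 + 2/3 * x2       # weighted sum: a single normal, N(2, 5/9)

rbind(mixture = c(mean = mean(x_mix), var = var(x_mix)),
      sum     = c(mean = mean(y_sum), var = var(y_sum)))
```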
Continuous mixture
A random variable \(X\) is a continuous mixture if the distribution of \(X\) is
\[ F_X(x) = \int_{-\infty}^{\infty} F_{X \mid Y=y}(x)\, f_Y(y)\, dy \]
for a family \(X \mid Y=y\) indexed by the real numbers \(y\) and weighting function \(f_Y\) such that \(\int f_Y(y) d y=1\)
Example: The Poisson-Gamma mixture
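The negative binomial distribution is a mixture of Poisson\((\Lambda)\) distributions, where \(\Lambda\) has a gamma distribution \(\Gamma(r, \beta)\). That is, if
\[ (X \mid \Lambda=\lambda) \sim \operatorname{Poisson}(\lambda) \quad \text{and} \quad \Lambda \sim \Gamma(r, \beta), \]
then
\[ X \sim \operatorname{NegBin}\left(r, \frac{\beta}{1+\beta}\right). \]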
Ex: negative binomial distribution = Poisson-Gamma mixture
n <- 1000
r <- 4
beta <- 3
lambda <- rgamma(n, r, beta)
x <- rpois(n, lambda) # one x is generated for each element of lambda
print(lambda)
print(x)
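Rather than printing the raw values, the mixture can be compared with R's built-in negative binomial generator. This sketch assumes the parameterization `rnbinom(size = r, prob = beta / (1 + beta))`, whose mean is \(r/\beta\) and variance \(r(1+\beta)/\beta^2\); the names `x_mix` and `x_nb` are illustrative.

```r
set.seed(1)
n    <- 10000
r    <- 4
beta <- 3

lambda <- rgamma(n, shape = r, rate = beta)   # Lambda ~ Gamma(r, beta)
x_mix  <- rpois(n, lambda)                    # X | Lambda ~ Poisson(Lambda)
x_nb   <- rnbinom(n, size = r, prob = beta / (1 + beta))

# Theory: mean = r / beta = 4/3, variance = r * (1 + beta) / beta^2 = 16/9
rbind(mixture = c(mean = mean(x_mix), var = var(x_mix)),
      rnbinom = c(mean = mean(x_nb),  var = var(x_nb)))
```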
Multivariate Distribution
How to generate a multivariate normal distribution: \(AZ+b \sim N_p(b, AA^T)\)
Theorem
Let \(Z=\left(Z_1, \ldots, Z_p\right)^T\) be a vector of i.i.d. \(N(0,1)\) random variables. Then,
\[ Z \sim N_p(0, I_p). \]
Assume that \(Z \sim N_p(\mu, \Sigma)\). For any \(A \in \mathbb{R}^{q \times p}\) and \(b \in \mathbb{R}^q\), then
\[ AZ + b \sim N_q(A\mu + b, A \Sigma A^T). \]
SO, in order to generate \(X\), let's start with a vector \(Z=\left(Z_1, \ldots, Z_k\right)\) of iid \(\operatorname{Nor}(0,1)\) RV's. That is, suppose \(Z \sim \operatorname{Nor}_k(0, I)\), where \(I\) is the \(k \times k\) identity matrix, and 0 is simply a vector of 0 's.
Suppose we can find a (lower triangular) matrix \(C\) such that \(\Sigma=C C^{\mathrm{T}}\).
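Then it can be shown that \(X=\mu+C Z\) is multivariate normal with mean \(\mathrm{E}[X] = \mu\) and covariance matrix
\[ \operatorname{Cov}(X) = \mathrm{E}\left[(CZ)(CZ)^{\mathrm{T}}\right] = C\,\mathrm{E}\left[ZZ^{\mathrm{T}}\right] C^{\mathrm{T}} = C C^{\mathrm{T}} = \Sigma. \]
That is, \(X \sim \operatorname{Nor}_k(\boldsymbol{\mu}, \Sigma)\).
So we want to find a matrix \(A\) with \(AA^{\mathrm{T}} = \Sigma\), for example \(A = \Sigma^{\frac{1}{2}}\).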
Choleski factorization method
The Choleski factorization of a real symmetric positive-definite matrix is \(\Sigma = Q^T Q\), where Q is an upper triangular matrix.
Sigma <- matrix(c(4, 12, 12, 37), nrow = 2)
chol(Sigma)
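A sketch of how the Choleski factor is used to generate the sample. It assumes observations are stored as rows, so the transformation reads \(X = ZQ + \mu\) with \(Q\) the upper triangular factor returned by `chol()`; the mean vector `mu` is hypothetical.

```r
set.seed(1)
Sigma <- matrix(c(4, 12, 12, 37), nrow = 2)
mu    <- c(1, -2)                            # hypothetical mean vector
n     <- 20000

Q <- chol(Sigma)                             # upper triangular, t(Q) %*% Q equals Sigma
Z <- matrix(rnorm(n * 2), ncol = 2)          # each row is N_2(0, I)
X <- Z %*% Q + matrix(mu, n, 2, byrow = TRUE)  # each row is N_2(mu, Sigma)

colMeans(X)   # should be close to mu
cov(X)        # should be close to Sigma
```

Because each observation is a row, the covariance of a row of `Z %*% Q` is \(Q^T Q = \Sigma\); with column vectors one would use the lower triangular factor `t(Q)` instead.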
Spectral decomposition method
\(\Sigma = U\Lambda U^T\), so \(\Sigma^{\frac{1}{2}} = U\Lambda^{\frac{1}{2}}U^T\)
library(knitr)
library(dplyr)
eigen_Sigma <- eigen(Sigma)
lam <- eigen_Sigma$values
U <- eigen_Sigma$vectors
Lam <- diag(lam)
Lam
U %*% Lam %*% t(U) %>% round()
U %*% sqrt(Lam) %*% t(U) %>% round(digits = 3)
Singular value decomposition (SVD) method
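A minimal sketch of the SVD route, assuming the usual construction \(\Sigma^{\frac{1}{2}} = U D^{\frac{1}{2}} V^T\) from \(\Sigma = U D V^T\). For a symmetric positive-definite \(\Sigma\) we have \(U = V\), so this coincides with the spectral decomposition method. It reuses the same `Sigma` as in the sections above.

```r
set.seed(1)
Sigma <- matrix(c(4, 12, 12, 37), nrow = 2)

s <- svd(Sigma)                                  # Sigma = U D V^T
A <- s$u %*% diag(sqrt(s$d)) %*% t(s$v)          # square root of Sigma

max(abs(A %*% A - Sigma))                        # numerically zero

# Generate rows from N_2(0, Sigma); A is symmetric, so Z %*% A has covariance A %*% A
Z <- matrix(rnorm(20000 * 2), ncol = 2)
X <- Z %*% A
cov(X)                                           # should be close to Sigma
```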
References: some articles from https://blog.csdn.net/Yeeyi_max?type=blog
https://www2.isye.gatech.edu/~sman/courses/Mexico2010/Module07-RandomVariateGeneration.pdf