原文:Introduction todnorm,pnorm,qnorm, andrnormfor new biostatisticians

Today I was in Dan’s office hours and someone asked, “what is the equivalent in R of the back of the stats textbook table of probabilities and their corresponding Z-scores?” (This is an example of the kind of table the student was talking about.) This question indicated to me that although we’ve been asked to use some of the distribution functions in past homeworks, there may be some misunderstanding about how these functions work.

Right now I’m going to focus on the functions for the normal distribution, but you can find a list of all distribution functions by typing help(Distributions) into your R console.


dnorm

As we all know the probability density for the normal distribution is:

f(x|μ,σ)=1σ2π−−√e−(x−μ)22σ2f(x|μ,σ)=1σ2πe−(x−μ)22σ2

The function dnorm returns the value of the probability density function for the normal distribution given parameters for xx, μμ, and σσ. Some examples of using dnorm are below:

# This is a comment. Anything I write after the octothorpe is not executed.

# This is the same as computing the pdf of the normal with x = 0, mu = 0 and
# sigma = 0. The dnorm function takes three main arguments, as do all of the
# *norm functions in R. dnorm(0, mean = 0, sd = 1)
## [1] 0.3989423
# The line of code below does the same thing as the same as the line of code
# above, since mean = 0 and sd = 0 are the default arguments for the dnorm
# function. dnorm(0)
## [1] 0.3989423
# Another exmaple of dnorm where parameters have been changed.

dnorm(2, mean = 5, sd = 3)
## [1] 0.08065691

Although xx represents the independent variable of the pdf for the normal distribution, it’s also useful to think of xx as a Z-score. Let me show you what I mean by graphing the pdf of the normal distribution with dnorm.

# First I'll make a vector of Z-scores
z_scores <- seq(-3, 3, by = .1) # Let's print the vector
z_scores
##  [1] -3.0 -2.9 -2.8 -2.7 -2.6 -2.5 -2.4 -2.3 -2.2 -2.1 -2.0 -1.9 -1.8 -1.7
## [15] -1.6 -1.5 -1.4 -1.3 -1.2 -1.1 -1.0 -0.9 -0.8 -0.7 -0.6 -0.5 -0.4 -0.3
## [29] -0.2 -0.1 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1
## [43] 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0 2.1 2.2 2.3 2.4 2.5
## [57] 2.6 2.7 2.8 2.9 3.0
# Let's make a vector of the values the function takes given those Z-scores.
# Remember for dnorm the default value for mean is 0 and for sd is 1.
dvalues <- dnorm(z_scores) # Let's examine those values
dvalues
##  [1] 0.004431848 0.005952532 0.007915452 0.010420935 0.013582969
## [6] 0.017528300 0.022394530 0.028327038 0.035474593 0.043983596
## [11] 0.053990967 0.065615815 0.078950158 0.094049077 0.110920835
## [16] 0.129517596 0.149727466 0.171368592 0.194186055 0.217852177
## [21] 0.241970725 0.266085250 0.289691553 0.312253933 0.333224603
## [26] 0.352065327 0.368270140 0.381387815 0.391042694 0.396952547
## [31] 0.398942280 0.396952547 0.391042694 0.381387815 0.368270140
## [36] 0.352065327 0.333224603 0.312253933 0.289691553 0.266085250
## [41] 0.241970725 0.217852177 0.194186055 0.171368592 0.149727466
## [46] 0.129517596 0.110920835 0.094049077 0.078950158 0.065615815
## [51] 0.053990967 0.043983596 0.035474593 0.028327038 0.022394530
## [56] 0.017528300 0.013582969 0.010420935 0.007915452 0.005952532
## [61] 0.004431848
# Now we'll plot these values
plot(dvalues, # Plot where y = values and x = index of the value in the vector
xaxt = "n", # Don't label the x-axis
type = "l", # Make it a line plot
main = "pdf of the Standard Normal",
xlab= "Z-score") # These commands label the x-axis
axis(1, at=which(dvalues == dnorm(0)), labels=c(0))
axis(1, at=which(dvalues == dnorm(1)), labels=c(-1, 1))
axis(1, at=which(dvalues == dnorm(2)), labels=c(-2, 2))

As you can see, dnorm will give us the “height” of the pdf of the normal distribution at whatever Z-score we provide as an argument to dnorm.


pnorm

The function pnorm returns the integral from −∞−∞ to qq of the pdf of the normal distribution where qq is a Z-score. Try to guess the value of pnorm(0). (pnorm has the same default mean and sd arguments as dnorm).

# To be clear about the arguments in this example:
# q = 0, mean = 0, sd = 1
pnorm(0)
## [1] 0.5

The pnorm function also takes the argument lower.tail. If lower.tail is set equal to FALSE then pnorm returns the integral from qq to ∞∞ of the pdf of the normal distribution. Note that pnorm(q) is the same as 1-pnorm(q, lower.tail = FALSE)

pnorm(2)
## [1] 0.9772499
pnorm(2, mean = 5, sd = 3)
## [1] 0.1586553
pnorm(2, mean = 5, sd = 3, lower.tail = FALSE)
## [1] 0.8413447
1 - pnorm(2, mean = 5, sd = 3, lower.tail = FALSE)
## [1] 0.1586553

pnorm is the function that replaces the table of probabilites and Z-scores at the back of the statistics textbook. Let’s take our vector of Z-scores from before (z_scores) and compute a new vector of “probability masses” using pnorm. Any guesses about what this plot will look like?

pvalues <- pnorm(z_scores)

# Now we'll plot these values
plot(pvalues, # Plot where y = values and x = index of the value in the vector
xaxt = "n", # Don't label the x-axis
type = "l", # Make it a line plot
main = "cdf of the Standard Normal",
xlab= "Quantiles",
ylab="Probability Density") # These commands label the x-axis
axis(1, at=which(pvalues == pnorm(-2)), labels=round(pnorm(-2), 2))
axis(1, at=which(pvalues == pnorm(-1)), labels=round(pnorm(-1), 2))
axis(1, at=which(pvalues == pnorm(0)), labels=c(.5))
axis(1, at=which(pvalues == pnorm(1)), labels=round(pnorm(1), 2))
axis(1, at=which(pvalues == pnorm(2)), labels=round(pnorm(2), 2))

It’s the plot of the cumulative distribution function of the normal distribution! Isn’t that neat?


qnorm

The qnorm function is simply the inverse of the cdf, which you can also think of as the inverse of pnorm! You can use qnorm to determine the answer to the question: What is the Z-score of the pthpth quantile of the normal distribution?

# What is the Z-score of the 50th quantile of the normal distribution?
qnorm(.5)
## [1] 0
# What is the Z-score of the 96th quantile of the normal distribution?
qnorm(.96)
## [1] 1.750686
# What is the Z-score of the 99th quantile of the normal distribution?
qnorm(.99)
## [1] 2.326348
# They're truly inverses!
pnorm(qnorm(0))
## [1] 0
qnorm(pnorm(0))
## [1] 0

Let’s plot qnorm and pnorm next to each other to further illustrate the fact they they are inverses.

# This is for getting two graphs next to each other
oldpar <- par()
par(mfrow=c(1,2)) # Let's make a vector of quantiles: from 0 to 1 by increments of .05
quantiles <- seq(0, 1, by = .05)
quantiles
##  [1] 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 0.55 0.60 0.65
## [15] 0.70 0.75 0.80 0.85 0.90 0.95 1.00
# Now we'll find the Z-score at each quantile
qvalues <- qnorm(quantiles)
qvalues
##  [1]       -Inf -1.6448536 -1.2815516 -1.0364334 -0.8416212 -0.6744898
## [7] -0.5244005 -0.3853205 -0.2533471 -0.1256613 0.0000000 0.1256613
## [13] 0.2533471 0.3853205 0.5244005 0.6744898 0.8416212 1.0364334
## [19] 1.2815516 1.6448536 Inf
# Plot the z_scores
plot(qvalues,
type = "l", # We want a line graph
xaxt = "n", # No x-axis
xlab="Probability Density",
ylab="Z-scores") # Same pnorm plot from before
plot(pvalues, # Plot where y = values and x = index of the value in the vector
xaxt = "n", # Don't label the x-axis
type = "l", # Make it a line plot
main = "cdf of the Standard Normal",
xlab= "Quantiles",
ylab="Probability Density") # These commands label the x-axis
axis(1, at=which(pvalues == pnorm(-2)), labels=round(pnorm(-2), 2))
axis(1, at=which(pvalues == pnorm(-1)), labels=round(pnorm(-1), 2))
axis(1, at=which(pvalues == pnorm(0)), labels=c(.5))
axis(1, at=which(pvalues == pnorm(1)), labels=round(pnorm(1), 2))
axis(1, at=which(pvalues == pnorm(2)), labels=round(pnorm(2), 2))

# Restore old plotting settings
par(oldpar)

rnorm

If you want to generate a vector of normally distributed random numbers, rnorm is the function you should use. The first argument n is the number of numbers you want to generate, followed by the standard mean and sd arguments. Let’s illustrate the weak law of large numbers using rnorm.

# set.seed is a function that takes a number as an argument and sets a seed from
# which random numbers are generated. It's important to set a seed so that your
# code is reproduceable. If you wanted to you could always set your seed to the
# same number. I like to set seeds to the "date" which is really just
# the arithmetic equation "month minus day minus year". So today's seed
# is -2006.
set.seed(10-1-2015)
rnorm(5)
## [1] -0.7197035 -1.4442137 -1.0120381  1.4577066 -0.1212466
# If I set the seed to the same seed again, I'll generate the same vector of
# numbers.
set.seed(10-1-2015)
rnorm(5)
## [1] -0.7197035 -1.4442137 -1.0120381  1.4577066 -0.1212466
# Now onto using rnorm

# Let's generate three different vectors of random numbers from a normal
# distribution
n10 <- rnorm(10, mean = 70, sd = 5)
n100 <- rnorm(100, mean = 70, sd = 5)
n10000 <- rnorm(10000, mean = 70, sd = 5) # Let's just look at one of the vectors
n10
##  [1] 54.70832 72.89000 70.27049 69.16508 72.97937 67.91004 67.77183
## [8] 72.29231 74.33411 63.57151

Which historgram do you think will be most centered around the true mean of 70?

# This is for getting two graphs next to each other
oldpar <- par()
par(mfrow=c(1,3)) # The breaks argument specifies how many bars are in the histogram
hist(n10, breaks = 5)
hist(n100, breaks = 20)
hist(n10000, breaks = 100)

# Restore old plotting settings
par(oldpar)

Closing thoughts

These concepts generally hold true for all the distribution functions built into R. You can learn more about all of the distribution functions by typing help(Distributions) into the R console. If you have any questions about this demonstration or about R programming please send me an email. If you’d like to change or contribute to this document I welcome pull requests on GitHub. This document and all code contained within is licensed CC0.

Introduction to dnorm, pnorm, qnorm, and rnorm for new biostatisticians的更多相关文章

  1. 关于R语言中dnorm,pnorm,qnorm,rnorm的用法

    dnorm,pnorm,qnorm,rnorm的表达式: 其中x和q是由数值型变量构成的向量,p是由概率构成的向量,n是随机产生的个数 mean是要计算正态分布的均值,缺省值为0,sd是计算正态分布的 ...

  2. (main)贝叶斯统计 | 贝叶斯定理 | 贝叶斯推断 | 贝叶斯线性回归 | Bayes' Theorem

    2019年08月31日更新 看了一篇发在NM上的文章才又明白了贝叶斯方法的重要性和普适性,结合目前最火的DL,会有意想不到的结果. 目前一些最直觉性的理解: 概率的核心就是可能性空间一定,三体世界不会 ...

  3. R概率分布函数使用小结

    记要 今天在计算分类模型自行区间时,用到了R中正太分布的qnorm函数,这里做简单记要,作为备忘. R中自带了很多概率分布的函数,如正太分布,二次分布,卡放分布,t分布等,这些分布的函数都有一个共性, ...

  4. R语言常用命令集合

    help.start()//打开帮助文档 q()//推出函数 ls()//返回处于现在名空间的对象名称 rm()//清楚对象:rm(list=ls())清除所有内存数据 gc()//垃圾回收数据 sq ...

  5. R中矩阵运算

    # 数据产生 # rnorm(n, mean = 0, sd = 1) 正态分布的随机数(r 代表随机,可以替换成dnorm, pnorm, qnorm 作不同计算.r= random = 随机, d ...

  6. R1-5天

    R语言笔记文档 2019.11.24 R语言的安装 工作目录查看与更改 变量的三种赋值 如何查看R语言帮助 ? args 基础数据类型 基本数据类型 因子.数据框.数组.矩阵.列表.向量 2019.1 ...

  7. R语言:常用函数【转】

    数据结构 一.数据管理vector:向量 numeric:数值型向量 logical:逻辑型向量 character:字符型向量list:列表 data.frame:数据框 c:连接为向量或列表len ...

  8. 简单介绍一下R中的几种统计分布及常用模型

    统计学上分布有很多,在R中基本都有描述.因能力有限,我们就挑选几个常用的.比较重要的简单介绍一下每种分布的定义,公式,以及在R中的展示. 统计分布每一种分布有四个函数:d――density(密度函数) ...

  9. R9—R常用函数分类汇总

    数据结构 一.数据管理 vector:向量 numeric:数值型向量 logical:逻辑型向量 character:字符型向量 list:列表 data.frame:数据框 c:连接为向量或列表 ...

随机推荐

  1. 启动Activiti项目报错:org.activiti.engine.ActivitiObjectNotFoundException: no deployed process definition found with id '22501'

    背景 启动activiti项目时,出现错误org.activiti.engine.ActivitiObjectNotFoundException: no deployed process defini ...

  2. uniGUI试用笔记(二)

    前几天做的demo今天启动后,浏览器打开页面后死活不显示窗体,找了半天原因才发现是360浏览器启动了兼容模式,改成极速模式后就正常了.有点晕.... 今天简单测试了TUniGUIServerModul ...

  3. Bootstrap3基础 pagination 分页按钮 简单示例

      内容 参数   OS   Windows 10 x64   browser   Firefox 65.0.2   framework     Bootstrap 3.3.7   editor    ...

  4. tp剩余未验证内容-2

    如何设置一个 "资源" (文件/图片/zip/视频等)在点击时, 自动开始下载? 通常只要在这些地方, 设置 一个链接a, 让href等于这个资源就行了. 这样当点击这个资源时, ...

  5. 从0开始安装fedora23的笔记-- 以及使用fedora的常规问题-2

    在shell中, 你是可以连续输入多个语句的, 中间用分号; 连接 也可以把这些多个语句放到一个函数中, 函数的话,便于多次引用. 而且 "封装" 为函数后, 可以用set查看到这 ...

  6. P4592 [TJOI2018]异或

    吐槽 睡起来写道模板清醒一下 貌似有两个log的树剖写法,还有一个log的Trie树的差分做法(类似于count on a tree),然后一个log的要把两个询问分开写,一个dfs序一个差分,然后我 ...

  7. shiro 前后端分离 seseeionId 问题

    http://www.cnblogs.com/cshhs/p/9269411.html https://www.w3cschool.cn/shiro/rmvk1if1.html http://www. ...

  8. Jenkins-job迁移

    摘自:http://www.cnblogs.com/topplay/p/3899330.html Jenkins迁移job 说明:从一个Jenkins服务器A将现有job迁移到另外一个Jenkins服 ...

  9. 使用 jenkins 搭建CI/CD流水线 (MAC)

    如何搭建持续集成/持续交付平台?? 如何使用jenkins搭建持续交付流水线,以及和其他工具(如artifactory)集成?如何使用元数据,记录软件发布过程的构建信息,测试结果,并用rest Api ...

  10. Codeforces 510 E. Fox And Dinner

    题目链接:http://codeforces.com/problemset/problem/510/E 乍一看和那啥魔术球问题有点神似啊/XD 其实是不一样的. 解决这道问题的关键在于发现若是相邻的两 ...