[Math Review] Statistics Basic: Estimation
Two Types of Estimation
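- Point Estimate: a single value of a sample statistic used to estimate a population parameter (e.g., the sample mean as an estimate of the population mean).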

Point estimates of average height with multiple samples (Source: Zhihu)
- Confidence Interval: an interval constructed by a method that, over repeated sampling, contains the population parameter a specified proportion of the time.

95% confidence interval of average height with multiple samples (Source: Zhihu)
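To make the two definitions concrete, here is a small simulation sketch. It assumes a hypothetical normal population of heights (mean 170 cm, standard deviation 6 cm; both values are invented for illustration), draws many samples, records each sample mean as a point estimate, and checks how often the known-σ interval M ± 1.96σ/√N contains the true mean. The names `MU`, `SIGMA`, `N`, `TRIALS` and `coverage` are illustrative choices, not from the original.

```python
import math
import random
from statistics import NormalDist, fmean

# Hypothetical population of heights (assumed values): normal, mean 170 cm, sd 6 cm.
MU, SIGMA = 170.0, 6.0
N = 25            # size of each sample
TRIALS = 10_000   # number of repeated samples
Z95 = NormalDist().inv_cdf(0.975)  # about 1.96

def coverage(seed=0):
    """Draw TRIALS samples; return their means and the fraction of 95% intervals covering MU."""
    rng = random.Random(seed)
    point_estimates = []
    hits = 0
    for _ in range(TRIALS):
        sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
        m = fmean(sample)                  # point estimate of MU
        point_estimates.append(m)
        half = Z95 * SIGMA / math.sqrt(N)  # half-width of the known-sigma interval
        if m - half <= MU <= m + half:     # did this interval capture MU?
            hits += 1
    return point_estimates, hits / TRIALS

estimates, rate = coverage()
print(f"first five point estimates: {[round(m, 2) for m in estimates[:5]]}")
print(f"fraction of intervals containing MU: {rate:.3f}")  # should be close to 0.95
```

Each point estimate differs from sample to sample, while the interval procedure captures the population mean in roughly 95% of the repetitions, which is exactly what the confidence level describes.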
Confidence Interval for the Mean
Population Variance is Known
Suppose that M is the mean of N sample values X1, X2, ..., XN, i.e.,

$$M = \frac{1}{N}\sum_{i=1}^{N} X_i$$
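According to the Central Limit Theorem, the sampling distribution of the mean M is approximately (exactly, if the population itself is normal)

$$M \sim N\left(\mu, \frac{\sigma^2}{N}\right)$$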
where μ and σ² are the mean and variance of the population, respectively. If repeated samples were taken and a 95% confidence interval computed for each one, 95% of those intervals would contain the population mean. To build such an interval, we first find the interval, symmetric about the population mean μ, that contains 95% of the area under the normal sampling distribution of M.

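That is,

$$P\left(\mu - 1.96\frac{\sigma}{\sqrt{N}} \le M \le \mu + 1.96\frac{\sigma}{\sqrt{N}}\right) = 0.95$$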
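Since the population mean is unknown, we center the interval on the sample mean M instead, which gives the 95% confidence interval for μ:

$$M - 1.96\frac{\sigma}{\sqrt{N}} \le \mu \le M + 1.96\frac{\sigma}{\sqrt{N}}$$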
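The known-variance interval can be computed directly. This is a minimal sketch using only the Python standard library; the function name `z_interval` and the sample heights with an assumed σ of 5 cm are hypothetical, chosen for illustration.

```python
import math
from statistics import NormalDist, fmean

def z_interval(sample, sigma, confidence=0.95):
    """Confidence interval for the population mean when the population
    standard deviation sigma is known: M +/- z * sigma / sqrt(N)."""
    n = len(sample)
    m = fmean(sample)                                # sample mean M
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    half_width = z * sigma / math.sqrt(n)           # z times the standard error of the mean
    return m - half_width, m + half_width

# Hypothetical heights (cm) with an assumed known sigma of 5 cm.
heights = [170.2, 165.8, 172.5, 168.0, 174.1, 169.4]
low, high = z_interval(heights, sigma=5.0)
print(f"95% CI: ({low:.2f}, {high:.2f})")
```

Passing a different `confidence` (e.g. 0.90) only changes the critical value z; the interval is always centered on M.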
Population Variance is Unknown
Degrees of Freedom
The degrees of freedom (df) of an estimate is the number of independent pieces of information on which the estimate is based. In general, the degrees of freedom for an estimate is equal to the number of values minus the number of parameters estimated en route to the estimate in question.
When the sample variance is used to estimate the population variance, we should not compute it as

$$\frac{1}{N}\sum_{i=1}^{N}(X_i - M)^2$$
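That's because the population mean must itself be estimated (by the sample mean M) before the variance can be computed, which uses up one degree of freedom. The degrees of freedom are therefore N - 1, and dividing by N underestimates the population variance on average. Instead, we should use the following formula:

$$s^2 = \frac{1}{N-1}\sum_{i=1}^{N}(X_i - M)^2$$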
where s² is the estimate of the population variance and M is the sample mean. The denominator of this formula, N - 1, is the degrees of freedom.
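The effect of the denominator can be checked empirically. The sketch below is a simulation under an assumed setup (samples of size 5 from a normal population with variance 4; the helper names are hypothetical): averaged over many samples, dividing by N lands near σ²(N - 1)/N = 3.2, while dividing by N - 1 lands near the true value 4. It also shows that Python's `statistics.pvariance` and `statistics.variance` correspond to the two denominators.

```python
import random
from statistics import fmean, pvariance, variance

def var_divide_by_n(xs):
    """Sum of squared deviations from M divided by N (biased)."""
    m = fmean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def var_divide_by_df(xs):
    """Sum of squared deviations from M divided by N - 1, the degrees of freedom."""
    m = fmean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# Illustrative setup: many samples of size N from a normal population with variance 4.
rng = random.Random(1)
N, TRIALS, TRUE_VAR = 5, 20_000, 4.0
biased, unbiased = [], []
for _ in range(TRIALS):
    xs = [rng.gauss(0.0, TRUE_VAR ** 0.5) for _ in range(N)]
    biased.append(var_divide_by_n(xs))
    unbiased.append(var_divide_by_df(xs))

print(f"average of /N estimates:     {fmean(biased):.3f}")    # theoretical expectation 4 * (N - 1) / N = 3.2
print(f"average of /(N-1) estimates: {fmean(unbiased):.3f}")  # theoretical expectation 4.0

# The standard library provides both: pvariance divides by N, variance by N - 1.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(pvariance(sample), variance(sample))
```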
Student's t-Distribution
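Suppose that X is a normally distributed random variable, i.e., X ~ N(μ, σ²), and that X1, X2, ..., XN are N independent samples of X. Then

$$M = \frac{1}{N}\sum_{i=1}^{N} X_i$$

is the sample mean and

$$s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(X_i - M)^2}$$

is the sample standard deviation. The standardized mean

$$Z = \frac{M - \mu}{\sigma / \sqrt{N}}$$

is a random variable with the standard normal distribution N(0, 1), while

$$T = \frac{M - \mu}{s / \sqrt{N}}$$

is a random variable with Student's t distribution.
The probability density function of T is

$$f(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)}\left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}$$

where ν = N - 1 is the degrees of freedom and
Γ is the gamma function.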
The t distribution is very similar to the normal distribution when the estimate of variance is based on many degrees of freedom, but it has relatively more probability in its tails when there are fewer degrees of freedom. Comparing t distributions with 2, 4, and 10 degrees of freedom against the standard normal distribution, the normal distribution has relatively more of its mass in the center and the t distribution relatively more in the tails.

The t distribution is therefore leptokurtic. The t distribution approaches the normal distribution as the degrees of freedom increase.
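The density formula above can be evaluated directly to see the heavier tails. This sketch implements f(t) with `math.lgamma` (the log of the gamma function) rather than `math.gamma`, an implementation choice to avoid overflow for large ν, and compares it with the standard normal density from `statistics.NormalDist` at the center (x = 0) and in the tail (x = 3). The function name `t_pdf` is illustrative.

```python
import math
from statistics import NormalDist

def t_pdf(t, nu):
    """Density of Student's t distribution with nu degrees of freedom.
    Gamma ratios are computed in log space to avoid overflow for large nu."""
    log_ratio = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
    coef = math.exp(log_ratio) / math.sqrt(nu * math.pi)
    return coef * (1 + t * t / nu) ** (-(nu + 1) / 2)

normal = NormalDist()  # standard normal N(0, 1)
for x in (0.0, 3.0):
    row = ", ".join(f"df={nu}: {t_pdf(x, nu):.4f}" for nu in (2, 4, 10))
    print(f"x={x}: {row}, normal: {normal.pdf(x):.4f}")
```

At x = 0 every t density sits below the normal density, at x = 3 every t density sits above it, and the gap shrinks as the degrees of freedom grow.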
Confidence Interval Using the t Distribution
Now consider the case in which you have a normal distribution but you do not know the standard deviation. You sample N values, compute the sample mean (M), and estimate the standard error of the mean, σM = σ/√N, with sM = s/√N. What is the probability that M will be within 1.96 sM of the population mean (μ)? This is a difficult problem because there are two ways in which M could be more than 1.96 sM from μ: (1) M could, by chance, be either very high or very low, and (2) sM could, by chance, be very low. Intuitively, it makes sense that the probability of being within 1.96 standard errors of the mean should be smaller than in the case when the standard deviation is known (and cannot be underestimated).
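The intuition can be checked with a quick Monte Carlo sketch. Under an assumed setup (samples of size N = 5 from a standard normal population; all constants are illustrative), it counts how often M falls within 1.96 standard errors of μ, first using the true σM and then using the estimated sM.

```python
import math
import random
from statistics import fmean, stdev

# Assumed setup: samples of size N from N(MU, SIGMA^2); values are illustrative.
MU, SIGMA, N, TRIALS = 0.0, 1.0, 5, 20_000
rng = random.Random(42)

def within(m, se):
    """True if the sample mean m lies within 1.96 standard errors of MU."""
    return abs(m - MU) <= 1.96 * se

hits_known = 0
hits_estimated = 0
for _ in range(TRIALS):
    xs = [rng.gauss(MU, SIGMA) for _ in range(N)]
    m = fmean(xs)
    if within(m, SIGMA / math.sqrt(N)):      # true standard error sigma_M
        hits_known += 1
    if within(m, stdev(xs) / math.sqrt(N)):  # estimated standard error s_M
        hits_estimated += 1

print(f"known sigma_M:     {hits_known / TRIALS:.3f}")      # close to 0.95
print(f"estimated s_M:     {hits_estimated / TRIALS:.3f}")  # noticeably below 0.95
```

With only 4 degrees of freedom the estimated-standard-error version covers μ only about 88% of the time, which is why a wider critical value than 1.96 is needed.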
Fortunately, it can be proved that the random variable T follows Student's t distribution with N - 1 degrees of freedom. We can therefore use the t distribution to estimate the mean of a normally distributed population when the sample size is small and the population standard deviation is unknown. The 90% confidence interval is

$$M - A \cdot s_M \le \mu \le M + A \cdot s_M, \qquad s_M = \frac{s}{\sqrt{N}}$$

where A is the value of T such that the interval from -A to A contains 90% of the area of the t distribution with N - 1 degrees of freedom. A can be looked up in a t table.
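Following the t-table approach described above, here is a minimal sketch of the 90% interval. The critical values A (the 0.95 quantile of t, i.e. two-sided 90%) for 1 to 10 degrees of freedom are the standard published table values; the dictionary `T_TABLE_90` and function `t_interval_90` are hypothetical names, and the table would need extending (or a library quantile function) for larger samples.

```python
import math
from statistics import fmean, stdev

# Two-sided 90% critical values A (0.95 quantile of t) for df = 1..10, from a standard t table.
T_TABLE_90 = {1: 6.314, 2: 2.920, 3: 2.353, 4: 2.132, 5: 2.015,
              6: 1.943, 7: 1.895, 8: 1.860, 9: 1.833, 10: 1.812}

def t_interval_90(sample):
    """90% confidence interval for the mean with sigma unknown: M +/- A * s_M."""
    n = len(sample)
    df = n - 1
    if df not in T_TABLE_90:
        raise ValueError(f"no t-table entry for df={df}")
    m = fmean(sample)                   # sample mean M
    s_m = stdev(sample) / math.sqrt(n)  # estimated standard error s_M (stdev divides by N - 1)
    a = T_TABLE_90[df]
    return m - a * s_m, m + a * s_m

low, high = t_interval_90([1.0, 2.0, 3.0, 4.0, 5.0])
print(f"90% CI: ({low:.3f}, {high:.3f})")
```

For this five-value sample, M = 3, s² = 2.5 and sM = √0.5, so the interval is 3 ± 2.132·√0.5.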