[Math Review] Statistics Basic: Estimation
Two Types of Estimation
- Point Estimate: the value of sample statistics

Point estimates of average height with multiple samples (Source: Zhihu)
- Confidence Intervals: intervals constructed using a method that contains the population parameter a specified proportion of the time.

95% confidence interval of average height with multiple samples (Source: Zhihu)
Confidence Interval for the Mean
Population Variance is known
Suppose that M is the mean of N samples X1, X2, ......, Xn, i.e.

According to Central Limit Theorem, the the sampling distribution of the mean M is

where μ and σ2 are the mean and variance of the population respectively. If repeated samples were taken and the 95% confidence interval computed for each sample, 95% of the intervals would contain the population mean. So the 95% confidence interval for M is the inverval that is symetric about the point estimate μ so that the area under normal distribution is 0.95.

That is,

Since we don't know the mean of population, we could use the sample mean instead.
Population Variance is Unknown
Dregree of Freedom
The degrees of freedom (df) of an estimate is the number of independent pieces of information on which the estimate is based. In general, the degrees of freedom for an estimate is equal to the number of values minus the number of parameters estimated en route to the estimate in question.
If the variance in a sample is used to estimate the variance in a population, we couldn't calculate the sample variace as

That's because we have two parameters to estimate (i.e., sample mean and sample variance). The degree of freedom should be N-1, so the previous formula underestimates the variance. Instead, we should use the following formula

where s2 is the estimate of the variance and M is the sample mean. The denominator of this formula is the degree of freedom.
Student's t-Distribution
Suppose that X is a random variable of normal distribution, i.e., X ~ N(μ, σ2)

is sample mean and

is sample deviation.

is a random variable of normal distribution.

is a random variable of student's t distribution.
The probability density function of T is

where is the degree of freedom,
is a gamma function.
The t distribution is very similar to the normal distribution when the estimate of variance is based on many degrees of freedom, but has relatively more scores in its tails when there are fewer degrees of freedom. Here are t distributions with 2, 4, and 10 degrees of freedom and the standard normal distribution. Notice that the normal distribution has relatively more scores in the center of the distribution and the t distribution has relatively more in the tails.

The t distribution is therefore leptokurtic. The t distribution approaches the normal distribution as the degrees of freedom increase.
Confidence Interval of t Distribution
Now consider the case in which you have a normal distribution but you do not know the standard deviation. You sample N values and compute the sample mean (M) and estimate the standard error of the mean (σM) with sM. What is the probability that M will be within 1.96 sM of the population mean (μ)? This is a difficult problem because there are two ways in which M could be more than 1.96 sM from μ: (1) M could, by chance, be either very high or very low and (2) sM could, by chance, be very low. Intuitively, it makes sense that the probability of being within 1.96 standard errors of the mean should be smaller than in the case when the standard deviation is known (and cannot be underestimated).
Luckily, however, we can prove that random variable T will be student's t distribution. So we can use t distribution to estimate the mean of a normal distribution population in situations where the sample size is small and population standard deviation is unknown. For 90% confidence interval, it can be calculated as

where A is value of T that contains 90% of the area of the t distribution for n-1 degree of freedom. We can calculate A through the t table.
[Math Review] Statistics Basic: Estimation的更多相关文章
- [Math Review] Statistics Basic: Sampling Distribution
Inferential Statistics Generalizing from a sample to a population that involves determining how far ...
- [Math Review] Statistics Basics: Main Concepts in Hypothesis Testing
Case Study The case study Physicians' Reactions sought to determine whether physicians spend less ti ...
- [Math Review] Linear Algebra for Singular Value Decomposition (SVD)
Matrix and Determinant Let C be an M × N matrix with real-valued entries, i.e. C={cij}mxn Determinan ...
- 统计处理包Statsmodels: statistics in python
http://blog.csdn.net/pipisorry/article/details/52227580 Statsmodels Statsmodels is a Python package ...
- FAQ: Automatic Statistics Collection (文档 ID 1233203.1)
In this Document Purpose Questions and Answers What kind of statistics do the Automated tasks ...
- Machine and Deep Learning with Python
Machine and Deep Learning with Python Education Tutorials and courses Supervised learning superstiti ...
- How do I learn machine learning?
https://www.quora.com/How-do-I-learn-machine-learning-1?redirected_qid=6578644 How Can I Learn X? ...
- 本人AI知识体系导航 - AI menu
Relevant Readable Links Name Interesting topic Comment Edwin Chen 非参贝叶斯 徐亦达老板 Dirichlet Process 学习 ...
- [book]awesome-machine-learning books
https://github.com/josephmisiti/awesome-machine-learning/blob/master/books.md Machine-Learning / Dat ...
随机推荐
- USACO Section2.3 Zero Sum 解题报告 【icedream61】
zerosum解题报告----------------------------------------------------------------------------------------- ...
- c语言字符串内存分配小记
一.疑问 有这样一道题: #include "stdio.h" int main() { ]; ]; scanf("%s", word1); scanf(&qu ...
- [转]赵桐正thinkphp教程笔记
原文:赵桐正thinkphp教程笔记 ,有修改 常用配置 常用配置config.php: <?php return array( //'配置项'=>'配置值' 'URL_PATHINFO_ ...
- 【tmux环境配置】在centos6.4上配置tmux
我学习tmux的动力如下: (1)tmux大法好.原因是被同学安利过tmux. (2)多个terminal下ssh到开发机太麻烦.还是之前实习的时候,总要开N个terminal去ssh开发机,这种东西 ...
- Windows Server 2008 R2 集群(OpenService “RemoteRegistry” 失败)笔记
OpenService “RemoteRegistry” 失败. 我在创建验证域控服务器[系统]类别中 看到错误日志 我在域控服务器去看,在 computers 里面 是有这台 计算机,但是为什么不行 ...
- 【NOIP 2017 提高组】列队
题目 有一个 \(n\times m\) 的方阵,每次出来一个人后向左看齐,向前看齐,询问每次出来的人的编号. \(n\le 3\times 10^5\) 分析 我们考虑离队本质上只有两种操作: 删除 ...
- Python学习-django-ModelForm组件
ModelForm a. class Meta: model, # 对应Model的 fields=None, # 字段 exclude=None, # 排除字段 labels=None, # 提示信 ...
- Android Service完全解析
Service的基本用法 1.新建一个Android项目,新建一个MyService继承自Service,并重写父类的onCreate(),onStartCommand()方法和onDestory() ...
- Linux 网卡特性配置ethtool详解
近期遇到一个自定义报文传输性能问题,解决过程中借助了ethtool这个工具,因此发掘一下与此工具相关的网卡的一些特性. ethtool 常用命令如下,比如对eth0的操作: ethtool eth0 ...
- oracle定时job粗解
其中一篇随笔我写了oracle的存储过程大概的介绍,存储过程除了自身有in的param,来进行程序调用处理之外,还可以通过定时任务的方式调用来执行. 应用场景: 数据同步:有两个显示菜单,“信息编辑” ...