[Math Review] Statistics Basic: Estimation
Two Types of Estimation
- Point Estimate: the value of sample statistics

Point estimates of average height with multiple samples (Source: Zhihu)
- Confidence Intervals: intervals constructed using a method that contains the population parameter a specified proportion of the time.

95% confidence interval of average height with multiple samples (Source: Zhihu)
Confidence Interval for the Mean
Population Variance is known
Suppose that M is the mean of N samples X1, X2, ......, Xn, i.e.

According to Central Limit Theorem, the the sampling distribution of the mean M is

where μ and σ2 are the mean and variance of the population respectively. If repeated samples were taken and the 95% confidence interval computed for each sample, 95% of the intervals would contain the population mean. So the 95% confidence interval for M is the inverval that is symetric about the point estimate μ so that the area under normal distribution is 0.95.

That is,

Since we don't know the mean of population, we could use the sample mean instead.
Population Variance is Unknown
Dregree of Freedom
The degrees of freedom (df) of an estimate is the number of independent pieces of information on which the estimate is based. In general, the degrees of freedom for an estimate is equal to the number of values minus the number of parameters estimated en route to the estimate in question.
If the variance in a sample is used to estimate the variance in a population, we couldn't calculate the sample variace as

That's because we have two parameters to estimate (i.e., sample mean and sample variance). The degree of freedom should be N-1, so the previous formula underestimates the variance. Instead, we should use the following formula

where s2 is the estimate of the variance and M is the sample mean. The denominator of this formula is the degree of freedom.
Student's t-Distribution
Suppose that X is a random variable of normal distribution, i.e., X ~ N(μ, σ2)

is sample mean and

is sample deviation.

is a random variable of normal distribution.

is a random variable of student's t distribution.
The probability density function of T is

where is the degree of freedom,
is a gamma function.
The t distribution is very similar to the normal distribution when the estimate of variance is based on many degrees of freedom, but has relatively more scores in its tails when there are fewer degrees of freedom. Here are t distributions with 2, 4, and 10 degrees of freedom and the standard normal distribution. Notice that the normal distribution has relatively more scores in the center of the distribution and the t distribution has relatively more in the tails.

The t distribution is therefore leptokurtic. The t distribution approaches the normal distribution as the degrees of freedom increase.
Confidence Interval of t Distribution
Now consider the case in which you have a normal distribution but you do not know the standard deviation. You sample N values and compute the sample mean (M) and estimate the standard error of the mean (σM) with sM. What is the probability that M will be within 1.96 sM of the population mean (μ)? This is a difficult problem because there are two ways in which M could be more than 1.96 sM from μ: (1) M could, by chance, be either very high or very low and (2) sM could, by chance, be very low. Intuitively, it makes sense that the probability of being within 1.96 standard errors of the mean should be smaller than in the case when the standard deviation is known (and cannot be underestimated).
Luckily, however, we can prove that random variable T will be student's t distribution. So we can use t distribution to estimate the mean of a normal distribution population in situations where the sample size is small and population standard deviation is unknown. For 90% confidence interval, it can be calculated as

where A is value of T that contains 90% of the area of the t distribution for n-1 degree of freedom. We can calculate A through the t table.
[Math Review] Statistics Basic: Estimation的更多相关文章
- [Math Review] Statistics Basic: Sampling Distribution
Inferential Statistics Generalizing from a sample to a population that involves determining how far ...
- [Math Review] Statistics Basics: Main Concepts in Hypothesis Testing
Case Study The case study Physicians' Reactions sought to determine whether physicians spend less ti ...
- [Math Review] Linear Algebra for Singular Value Decomposition (SVD)
Matrix and Determinant Let C be an M × N matrix with real-valued entries, i.e. C={cij}mxn Determinan ...
- 统计处理包Statsmodels: statistics in python
http://blog.csdn.net/pipisorry/article/details/52227580 Statsmodels Statsmodels is a Python package ...
- FAQ: Automatic Statistics Collection (文档 ID 1233203.1)
In this Document Purpose Questions and Answers What kind of statistics do the Automated tasks ...
- Machine and Deep Learning with Python
Machine and Deep Learning with Python Education Tutorials and courses Supervised learning superstiti ...
- How do I learn machine learning?
https://www.quora.com/How-do-I-learn-machine-learning-1?redirected_qid=6578644 How Can I Learn X? ...
- 本人AI知识体系导航 - AI menu
Relevant Readable Links Name Interesting topic Comment Edwin Chen 非参贝叶斯 徐亦达老板 Dirichlet Process 学习 ...
- [book]awesome-machine-learning books
https://github.com/josephmisiti/awesome-machine-learning/blob/master/books.md Machine-Learning / Dat ...
随机推荐
- 《Cracking the Coding Interview》——第3章:栈和队列——题目4
2014-03-18 05:28 题目:你肯定听过汉诺威塔的故事:三个柱子和N个从小到大的盘子.既然每次你只能移动放在顶上的盘子,这不就是栈操作吗?所以,请用三个栈来模拟N级汉诺威塔的玩法.放心,N不 ...
- tomcat运行solr
https://blog.csdn.net/u010346953/article/details/67640036
- Python3基本语法
#编码 ''' 默认情况下,Python 3 源码文件以 UTF-8 编码,所有字符串都是 unicode 字符串. 当然你也可以为源码文件指定不同的编码: # -*- coding: cp-1252 ...
- HDU 4116 Fruit Ninja ( 计算几何 + 扫描线 )
给你最多1000个圆,问画一条直线最多能与几个圆相交,相切也算. 显然临界条件是这条线是某两圆的公切线,最容易想到的就是每两两圆求出所有公切线,暴力判断一下. 可惜圆有1000个,时间复杂度太高. 网 ...
- Struts2+DAO层实现实例01——搭建Struts2基本框架
实例内容 利用Strust2实现一个登陆+注册功能的登陆系统. 实现基础流程:
- 【转】通过制作Flappy Bird了解Native 2D中的RigidBody2D和Collider
作者:王选易,出处:http://www.cnblogs.com/neverdie/ 欢迎转载,也请保留这段声明.如果你喜欢这篇文章,请点[推荐].谢谢! 引子 在第一篇文章[Unity3D基础教程] ...
- 华东交通大学2017年ACM双基程序设计大赛题解
简单题 Time Limit : 3000/1000ms (Java/Other) Memory Limit : 65535/32768K (Java/Other) Total Submissio ...
- 7月13号day5总结
今天学习过程和小结 使用伪分布式进行大数据计算,计算气象站记录气温的平均值 weather map()方法,key值数据多所以用LongWritable,value值是string类型,string类 ...
- MyBatis原理分析
MyBatis原理分析 参考博客: 深入理解mybatis原理: http://blog.csdn.net/luanlouis/article/details/40422941 一 . JDBC的 ...
- 获得touch事件,jquery绑定事件监听器,ios设备上打开touch状态以响应focus,active等伪类
2. 默认的监听方式 document.addEventListener('touchstart', function(){ alert('hello'); }, false); 使用jquery时 ...