[Math Review] Statistics Basic: Estimation
Two Types of Estimation
- Point Estimate: the value of a sample statistic (for example, the sample mean), used as a single-number estimate of a population parameter.

Point estimates of average height with multiple samples (Source: Zhihu)
- Confidence Interval: an interval constructed by a method that captures the population parameter a specified proportion of the time.

95% confidence interval of average height with multiple samples (Source: Zhihu)
Confidence Interval for the Mean
Population Variance is known
Suppose that M is the mean of N observations X_1, X_2, ..., X_N, i.e.

$$M = \frac{1}{N}\sum_{i=1}^{N} X_i$$
According to the Central Limit Theorem, the sampling distribution of the mean M is approximately

$$M \sim \mathcal{N}\left(\mu, \frac{\sigma^2}{N}\right)$$
where μ and σ² are the mean and variance of the population, respectively. If repeated samples were taken and the 95% confidence interval computed for each sample, 95% of the intervals would contain the population mean. To build such an interval, start from the interval that is symmetric about the population mean μ and covers an area of 0.95 under the sampling distribution of M.

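That is,

$$P\left(\mu - 1.96\frac{\sigma}{\sqrt{N}} \le M \le \mu + 1.96\frac{\sigma}{\sqrt{N}}\right) = 0.95$$

Since we don't know the population mean μ, we rearrange the inequality so that the interval is centered on the sample mean M instead. The 95% confidence interval for μ is

$$M - 1.96\frac{\sigma}{\sqrt{N}} \le \mu \le M + 1.96\frac{\sigma}{\sqrt{N}}$$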
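As a rough illustration of the known-variance case, the sketch below computes the interval "sample mean plus or minus 1.96 standard errors" and then checks by simulation how often such intervals contain the true mean. The function names `z_interval` and `coverage`, the height values, and the population parameters (mean 170, standard deviation 6) are made up for this example and are not from the original post.

```python
import math
import random
from statistics import NormalDist, fmean


def z_interval(sample, sigma, confidence=0.95):
    """Confidence interval for the mean when the population sigma is known."""
    n = len(sample)
    m = fmean(sample)
    # Two-sided critical value of the standard normal, about 1.96 for 95%.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sigma / math.sqrt(n)
    return m - half_width, m + half_width


def coverage(mu, sigma, n, trials, seed=0):
    """Fraction of simulated 95% intervals that contain the true mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        low, high = z_interval(sample, sigma)
        hits += low <= mu <= high
    return hits / trials


# Hypothetical heights in cm, with an assumed known population sigma of 6.
print(z_interval([168.0, 172.5, 181.0, 175.5, 170.0], sigma=6.0))
# The simulated coverage should come out close to 0.95.
print(coverage(mu=170.0, sigma=6.0, n=25, trials=10_000))
```

The simulation mirrors the "repeated samples" interpretation: each individual interval either contains the mean or not, and the 95% figure describes the long-run share of intervals that do.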
Population Variance is Unknown
Degrees of Freedom
The degrees of freedom (df) of an estimate is the number of independent pieces of information on which the estimate is based. In general, the degrees of freedom for an estimate is equal to the number of values minus the number of parameters estimated en route to the estimate in question.
If the variance in a sample is used to estimate the variance in a population, we should not calculate the sample variance as

$$s^2 = \frac{\sum_{i=1}^{N}(X_i - M)^2}{N}$$
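That's because the sample mean M is itself estimated from the same data, which uses up one degree of freedom: once M is fixed, only N-1 of the deviations X_i - M can vary freely. The degrees of freedom are therefore N-1, and the previous formula underestimates the population variance on average. Instead, we should use the following formula

$$s^2 = \frac{\sum_{i=1}^{N}(X_i - M)^2}{N-1}$$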
where s² is the estimate of the variance and M is the sample mean. The denominator of this formula is the degrees of freedom.
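The underestimation can be checked numerically. The sketch below draws many small samples from a normal population with an assumed variance of 4.0 and averages the two estimators; the function names and the simulation settings are illustrative choices, not part of the original post.

```python
import random
from statistics import fmean


def variance_divide_by_n(sample):
    """Sum of squared deviations from the sample mean, divided by N."""
    m = fmean(sample)
    return sum((x - m) ** 2 for x in sample) / len(sample)


def variance_divide_by_n_minus_1(sample):
    """Sum of squared deviations from the sample mean, divided by N - 1."""
    m = fmean(sample)
    return sum((x - m) ** 2 for x in sample) / (len(sample) - 1)


def average_estimates(n, trials, sigma=2.0, seed=0):
    """Average both estimators over many samples of size n."""
    rng = random.Random(seed)
    by_n, by_n_minus_1 = [], []
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        by_n.append(variance_divide_by_n(sample))
        by_n_minus_1.append(variance_divide_by_n_minus_1(sample))
    return fmean(by_n), fmean(by_n_minus_1)


avg_n, avg_n_minus_1 = average_estimates(n=5, trials=20_000)
print("true variance:          4.0")
print(f"average, divide by N:   {avg_n:.3f}")
print(f"average, divide by N-1: {avg_n_minus_1:.3f}")
```

With N = 5 the divide-by-N average should land near (N-1)/N times the true variance, that is near 3.2, while the divide-by-(N-1) average should land near 4.0.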
Student's t-Distribution
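Suppose that X is a normally distributed random variable, i.e., X ~ N(μ, σ²), and that X_1, X_2, ..., X_N are N independent observations of X.

$$M = \frac{1}{N}\sum_{i=1}^{N} X_i$$

is the sample mean and

$$s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(X_i - M)^2}$$

is the sample standard deviation.

$$Z = \frac{M - \mu}{\sigma/\sqrt{N}}$$

is a random variable with the standard normal distribution.

$$T = \frac{M - \mu}{s/\sqrt{N}}$$

is a random variable with Student's t distribution with N-1 degrees of freedom.
The probability density function of T is

$$f(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)}\left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}$$

where ν = N-1 is the degrees of freedom and
Γ is the gamma function.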
The t distribution is very similar to the normal distribution when the estimate of variance is based on many degrees of freedom, but has relatively more scores in its tails when there are fewer degrees of freedom. Here are t distributions with 2, 4, and 10 degrees of freedom and the standard normal distribution. Notice that the normal distribution has relatively more scores in the center of the distribution and the t distribution has relatively more in the tails.

The t distribution is therefore leptokurtic. The t distribution approaches the normal distribution as the degrees of freedom increase.
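To make the comparison concrete, here is a minimal sketch that evaluates the standard formula for the Student's t density and prints it next to the standard normal density at the center (x = 0) and in the tail (x = 3). The helper name `t_pdf` is an assumption of this example; `math.lgamma` is used instead of `math.gamma` only so that large degrees of freedom do not overflow.

```python
import math
from statistics import NormalDist


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    # log of Gamma((df+1)/2) / (sqrt(df*pi) * Gamma(df/2))
    log_coef = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
    )
    return math.exp(log_coef) * (1 + t * t / df) ** (-(df + 1) / 2)


standard_normal = NormalDist()
for x in (0.0, 3.0):
    print(f"x = {x}")
    for df in (2, 4, 10):
        print(f"  t, df={df:>2}: {t_pdf(x, df):.4f}")
    print(f"  normal  : {standard_normal.pdf(x):.4f}")
```

At x = 0 the normal density is the largest of the four, and at x = 3 it is the smallest, which is the "more in the center, less in the tails" pattern described above.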
Confidence Interval of t Distribution
Now consider the case in which you have a normal distribution but you do not know the standard deviation. You sample N values and compute the sample mean (M) and estimate the standard error of the mean (σ_M) with s_M. What is the probability that M will be within 1.96 s_M of the population mean (μ)? This is a difficult problem because there are two ways in which M could be more than 1.96 s_M from μ: (1) M could, by chance, be either very high or very low and (2) s_M could, by chance, be very low. Intuitively, it makes sense that the probability of being within 1.96 standard errors of the mean should be smaller than in the case when the standard deviation is known (and cannot be underestimated).
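This intuition can be checked with a small simulation. The sketch below repeatedly samples from a normal population, estimates the standard error from each sample, and counts how often the sample mean falls within 1.96 estimated standard errors of the true mean. The function name, the population (mean 0, standard deviation 1), and the trial count are arbitrary choices for illustration.

```python
import math
import random
from statistics import fmean


def coverage_with_estimated_se(n, multiplier=1.96, trials=10_000, seed=0):
    """Fraction of trials in which |M - mu| <= multiplier * s_M."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # arbitrary normal population for the simulation
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        m = fmean(sample)
        # Sample standard deviation with N - 1 in the denominator.
        s = math.sqrt(sum((x - m) ** 2 for x in sample) / (n - 1))
        s_m = s / math.sqrt(n)  # estimated standard error of the mean
        hits += abs(m - mu) <= multiplier * s_m
    return hits / trials


for n in (5, 10, 30):
    print(f"N = {n:>2}: coverage = {coverage_with_estimated_se(n):.3f}")
```

For small N the fraction should fall noticeably below 0.95, and it should move toward 0.95 as N grows.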
Luckily, however, we can prove that the random variable T follows Student's t distribution. So we can use the t distribution to estimate the mean of a normally distributed population when the sample size is small and the population standard deviation is unknown. The 90% confidence interval, for example, can be calculated as

$$M - A\frac{s}{\sqrt{N}} \le \mu \le M + A\frac{s}{\sqrt{N}}$$

where s is the sample standard deviation and A is the value such that 90% of the area of the t distribution with N-1 degrees of freedom lies between -A and A. We can look up A in a t table.
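A minimal sketch of this calculation follows, assuming the critical value A is read from a t table. The dictionary `T_CRITICAL_90` holds a few standard two-sided 90% table entries, and the function name and height data are made up for this example.

```python
import math
from statistics import fmean, stdev

# Excerpt of a t table: A such that 90% of the area lies between -A and +A,
# keyed by degrees of freedom.
T_CRITICAL_90 = {4: 2.132, 9: 1.833, 19: 1.729, 29: 1.699}


def t_interval_90(sample):
    """90% confidence interval for the mean when sigma is unknown."""
    n = len(sample)
    df = n - 1
    if df not in T_CRITICAL_90:
        raise ValueError(f"no table entry for {df} degrees of freedom")
    m = fmean(sample)
    s_m = stdev(sample) / math.sqrt(n)  # stdev divides by N - 1
    a = T_CRITICAL_90[df]
    return m - a * s_m, m + a * s_m


heights = [168.0, 172.5, 181.0, 175.5, 170.0]  # hypothetical heights in cm
low, high = t_interval_90(heights)
print(f"90% CI for the mean: ({low:.2f}, {high:.2f})")
```

A statistics library with a t quantile function could replace the lookup table; the table is used here only to keep the example dependency-free and close to the "t table" step in the text.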