Two Types of Estimation

One of the major applications of statistics is estimating population parameters from sample statistics. There are two types of estimation:
  • Point Estimate: a single value of a sample statistic (for example, the sample mean) used as the estimate of the population parameter.

Point estimates of average height with multiple samples (Source: Zhihu)

  • Confidence Interval: an interval constructed by a method that captures the population parameter a specified proportion of the time (for example, in 95% of repeated samples).

95% confidence interval of average height with multiple samples (Source: Zhihu)
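The idea behind the first figure, that each sample yields its own point estimate, can be sketched with a small simulation. The population values here (a mean height of 170 cm with standard deviation 10 cm), the sample size, and the function name `sample_means` are assumptions made for illustration only.

```python
import random
import statistics


def sample_means(population_mean, population_sd, sample_size, num_samples, seed=0):
    """Draw several samples and return the point estimate (sample mean) of each."""
    rng = random.Random(seed)
    means = []
    for _ in range(num_samples):
        sample = [rng.gauss(population_mean, population_sd) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return means


# Hypothetical population: heights with mean 170 cm and standard deviation 10 cm.
estimates = sample_means(170.0, 10.0, sample_size=25, num_samples=5)
for i, m in enumerate(estimates, start=1):
    print(f"sample {i}: point estimate of the mean = {m:.2f} cm")
```

Each run of the loop gives a slightly different estimate, which is why a point estimate alone says nothing about how far it may be from the population mean.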

Confidence Interval for the Mean

Population Variance is known

Suppose that M is the mean of N samples X_1, X_2, ..., X_N, i.e.

$$M = \frac{1}{N} \sum_{i=1}^{N} X_i$$

According to the Central Limit Theorem, the sampling distribution of the mean M is approximately

$$M \sim N\left(\mu, \frac{\sigma^2}{N}\right)$$

where μ and σ² are the mean and variance of the population respectively. If repeated samples were taken and the 95% confidence interval computed for each sample, 95% of the intervals would contain the population mean. To construct such an interval, start from the interval that is symmetric about μ and covers an area of 0.95 under the normal sampling distribution of M.

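That is,

$$P\left(\mu - 1.96\frac{\sigma}{\sqrt{N}} \le M \le \mu + 1.96\frac{\sigma}{\sqrt{N}}\right) = 0.95$$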

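Since we don't know the population mean μ, we center the interval on the sample mean M instead. M lies within 1.96 σ/√N of μ exactly when μ lies within 1.96 σ/√N of M, so the 95% confidence interval for μ is

$$M - 1.96\frac{\sigma}{\sqrt{N}} \le \mu \le M + 1.96\frac{\sigma}{\sqrt{N}}$$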
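As a minimal sketch of this known-variance case, the code below builds the interval M ± z·σ/√N and then checks by simulation how often it contains the true mean. The population parameters (μ = 170, σ = 10), the sample size, and the function names `z_confidence_interval` and `coverage` are hypothetical choices for illustration.

```python
import math
import random
import statistics
from statistics import NormalDist


def z_confidence_interval(sample, sigma, confidence=0.95):
    """Confidence interval for the mean when the population sd `sigma` is known."""
    n = len(sample)
    m = statistics.fmean(sample)
    # Two-sided critical value of the standard normal, about 1.96 for 95%.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sigma / math.sqrt(n)
    return m - half_width, m + half_width


def coverage(mu, sigma, n, trials, seed=0):
    """Fraction of simulated intervals that contain the true mean `mu`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        low, high = z_confidence_interval(sample, sigma)
        hits += low <= mu <= high
    return hits / trials


rate = coverage(mu=170.0, sigma=10.0, n=25, trials=10_000)
print(f"fraction of intervals containing mu: {rate:.3f}")
```

The printed fraction should land close to 0.95, which is the "95% of the intervals would contain the population mean" statement in simulated form.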

Population Variance is Unknown

Degrees of Freedom

The degrees of freedom (df) of an estimate is the number of independent pieces of information on which the estimate is based. In general, the degrees of freedom for an estimate is equal to the number of values minus the number of parameters estimated en route to the estimate in question. 

If the variance in a sample is used to estimate the variance in a population, we should not calculate the sample variance as

$$\frac{\sum_{i=1}^{N} (X_i - M)^2}{N}$$

That's because the sample mean M is itself estimated from the same data, which uses up one degree of freedom: once M is fixed, only N-1 of the deviations X_i - M are free to vary. The degrees of freedom are therefore N-1, and dividing by N underestimates the population variance on average. Instead, we should use the following formula

$$s^2 = \frac{\sum_{i=1}^{N} (X_i - M)^2}{N - 1}$$

where s^2 is the estimate of the variance and M is the sample mean. The denominator of this formula is the degrees of freedom.
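The underestimation can be seen in a short simulation, sketched here with Python's standard library: `statistics.pvariance` divides by N and `statistics.variance` divides by N-1. The true variance of 4.0, the sample size of 5, and the function name `average_variance_estimates` are assumptions chosen for illustration.

```python
import random
import statistics


def average_variance_estimates(sigma, n, trials, seed=0):
    """Average the divide-by-N and divide-by-(N-1) estimates over many samples."""
    rng = random.Random(seed)
    biased_total = 0.0
    unbiased_total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        biased_total += statistics.pvariance(sample)   # divides by N
        unbiased_total += statistics.variance(sample)  # divides by N - 1
    return biased_total / trials, unbiased_total / trials


# Hypothetical population with standard deviation 2, so the true variance is 4.
biased, unbiased = average_variance_estimates(sigma=2.0, n=5, trials=20_000)
print(f"average estimate dividing by N:     {biased:.3f}")
print(f"average estimate dividing by N - 1: {unbiased:.3f}")
```

With N = 5 the divide-by-N average should sit near 4 × (N-1)/N = 3.2, while the divide-by-(N-1) average should sit near the true value 4.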

Student's t-Distribution

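Suppose that X_1, X_2, ..., X_N are independent samples of a normally distributed random variable X, i.e., X ~ N(μ, σ²). Then

$$M = \frac{1}{N} \sum_{i=1}^{N} X_i$$

is the sample mean and

$$s = \sqrt{\frac{1}{N - 1} \sum_{i=1}^{N} (X_i - M)^2}$$

is the sample standard deviation.

$$Z = \frac{M - \mu}{\sigma / \sqrt{N}}$$

is a random variable with a standard normal distribution.

$$T = \frac{M - \mu}{s / \sqrt{N}}$$

is a random variable with Student's t distribution with N-1 degrees of freedom.

The probability density function of T is

$$f(t) = \frac{\Gamma\left(\frac{\nu + 1}{2}\right)}{\sqrt{\nu \pi}\, \Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu + 1}{2}}$$

where ν = N-1 is the degrees of freedom and Γ is the gamma function.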

The t distribution is very similar to the normal distribution when the estimate of variance is based on many degrees of freedom, but has relatively more probability in its tails when there are fewer degrees of freedom. Comparing t distributions with 2, 4, and 10 degrees of freedom against the standard normal distribution shows that the normal distribution has relatively more probability in the center of the distribution and the t distribution has relatively more in the tails.

The t distribution is therefore leptokurtic. The t distribution approaches the normal distribution as the degrees of freedom increase. 
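As a rough numerical check of the heavier tails, the sketch below implements the standard Student's t density with `math.gamma` and integrates it with Simpson's rule, using only Python's standard library. The function names `t_pdf` and `t_two_sided_tail`, the cutoff of 2.0, and the step count are illustrative choices, not part of the original text.

```python
import math
from statistics import NormalDist


def t_pdf(t, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def t_two_sided_tail(x, df, steps=2000):
    """P(|T| > x) for x > 0, using Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * total * h / 3  # area between -x and x, by symmetry
    return 1 - central_area


# Probability of falling more than 2 units from the center.
normal_tail = 2 * (1 - NormalDist().cdf(2.0))
tails = {df: t_two_sided_tail(2.0, df) for df in (2, 4, 10)}

for df, tail in tails.items():
    print(f"t with {df:>2} df: P(|T| > 2) = {tail:.4f}")
print(f"standard normal: P(|Z| > 2) = {normal_tail:.4f}")
```

The tail probability shrinks toward the normal value as the degrees of freedom grow, which matches the statement that the t distribution approaches the normal distribution.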

Confidence Interval of t Distribution

Now consider the case in which you have a normal distribution but you do not know the standard deviation. You sample N values and compute the sample mean (M) and estimate the standard error of the mean (σ_M) with s_M. What is the probability that M will be within 1.96 s_M of the population mean (μ)? This is a difficult problem because there are two ways in which M could be more than 1.96 s_M from μ: (1) M could, by chance, be either very high or very low and (2) s_M could, by chance, be very low. Intuitively, it makes sense that the probability of being within 1.96 standard errors of the mean should be smaller than in the case when the standard deviation is known (and cannot be underestimated).
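This intuition can be checked with a simulation, sketched below under the assumption of a standard normal population (μ = 0, σ = 1). The function name `within_fraction` and the sample sizes 5 and 30 are hypothetical choices for illustration.

```python
import math
import random
import statistics


def within_fraction(n, trials, multiplier=1.96, seed=0):
    """Fraction of samples whose mean M lies within `multiplier` * s_M of mu = 0."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        m = statistics.fmean(sample)
        s_m = statistics.stdev(sample) / math.sqrt(n)  # estimated standard error
        hits += abs(m) <= multiplier * s_m
    return hits / trials


small = within_fraction(5, 20_000)
large = within_fraction(30, 20_000)
print(f"N = 5:  fraction within 1.96 s_M of mu = {small:.3f}")
print(f"N = 30: fraction within 1.96 s_M of mu = {large:.3f}")
```

For the small sample the fraction should fall clearly below 0.95, and it should move closer to 0.95 as N grows and s_M becomes a more reliable estimate.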

Luckily, however, we can prove that the random variable T = (M - μ) / s_M follows Student's t distribution with N-1 degrees of freedom. So we can use the t distribution to estimate the mean of a normally distributed population in situations where the sample size is small and the population standard deviation is unknown. The 90% confidence interval can be calculated as

$$M - A\frac{s}{\sqrt{N}} \le \mu \le M + A\frac{s}{\sqrt{N}}$$

where s is the sample standard deviation and A is the value such that the interval from -A to A contains 90% of the area of the t distribution with N-1 degrees of freedom. We can look up A in a t table.
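Instead of a printed t table, A can also be approximated numerically. The following is a minimal sketch using only Python's standard library: it integrates the t density with Simpson's rule and finds A by bisection. The helper names (`t_pdf`, `t_cdf`, `t_critical`, `t_confidence_interval`) and the ten height values are hypothetical and serve only as an illustration.

```python
import math
import statistics


def t_pdf(t, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0, using symmetry and Simpson's rule on [0, x]."""
    if x == 0:
        return 0.5
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(confidence, df):
    """Value A with P(-A <= T <= A) = confidence, found by bisection."""
    target = 0.5 + confidence / 2
    low, high = 0.0, 1.0
    while t_cdf(high, df) < target:  # widen the bracket until it contains A
        high *= 2
    for _ in range(50):
        mid = (low + high) / 2
        if t_cdf(mid, df) < target:
            low = mid
        else:
            high = mid
    return (low + high) / 2


def t_confidence_interval(sample, confidence=0.90):
    """Confidence interval for the mean when the population sd is unknown."""
    n = len(sample)
    m = statistics.fmean(sample)
    s = statistics.stdev(sample)  # uses the N - 1 denominator
    a = t_critical(confidence, n - 1)
    half_width = a * s / math.sqrt(n)
    return m - half_width, m + half_width


# Hypothetical sample of ten heights in cm.
heights = [168.2, 171.5, 165.0, 174.3, 169.8, 172.1, 166.7, 170.4, 173.0, 167.9]
low, high = t_confidence_interval(heights, confidence=0.90)
print(f"A for 9 degrees of freedom: {t_critical(0.90, 9):.3f}")
print(f"90% confidence interval for the mean: ({low:.2f}, {high:.2f})")
```

For 9 degrees of freedom the computed A should agree with the t-table entry of about 1.833. In practice a statistics library would normally supply this value; the numerical approach here only keeps the sketch self-contained.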
