Two Types of Estimation

One of the major applications of statistics is estimating population parameters from sample statistics. There are types of estimation:
  • Point Estimate: the value of sample statistics

Point estimates of average height with multiple samples (Source: Zhihu)

  • Confidence Intervals: intervals constructed using a method that contains the population parameter a specified proportion of the time.

95% confidence interval of average height with multiple samples (Source: Zhihu)

Confidence Interval for the Mean

Population Variance is known

Suppose that M is the mean of N samples X1, X2, ......, Xn, i.e.

According to Central Limit Theorem, the the sampling distribution of the mean M is

where μ and σ2 are the mean and variance of the population respectively. If repeated samples were taken and the 95% confidence interval computed for each sample, 95% of the intervals would contain the population mean. So the 95% confidence interval for M is the inverval that is symetric about the point estimate μ so that the area under normal distribution is 0.95.

That is,

Since we don't know the mean of population, we could use the sample mean  instead.

Population Variance is Unknown

Dregree of Freedom

The degrees of freedom (df) of an estimate is the number of independent pieces of information on which the estimate is based. In general, the degrees of freedom for an estimate is equal to the number of values minus the number of parameters estimated en route to the estimate in question. 

If the variance in a sample is used to estimate the variance in a population, we couldn't calculate the sample variace as

That's because we have two parameters to estimate (i.e., sample mean and sample variance). The degree of freedom should be N-1, so the previous formula underestimates the variance. Instead, we should use the following formula

where s2 is the estimate of the variance and M is the sample mean. The denominator of this formula is the degree of freedom.

Student's t-Distribution

Suppose that X is a random variable of normal distribution, i.e., X ~ N(μ, σ2)

is sample mean and

is sample deviation.

is a random variable of normal distribution.

is a random variable of student's t distribution.

The probability density function of T is

where  is the degree of freedom,  is a gamma function.

The t distribution is very similar to the normal distribution when the estimate of variance is based on many degrees of freedom, but has relatively more scores in its tails when there are fewer degrees of freedom. Here are t distributions with 2, 4, and 10 degrees of freedom and the standard normal distribution. Notice that the normal distribution has relatively more scores in the center of the distribution and the t distribution has relatively more in the tails.

The t distribution is therefore leptokurtic. The t distribution approaches the normal distribution as the degrees of freedom increase. 

Confidence Interval of t Distribution

Now consider the case in which you have a normal distribution but you do not know the standard deviation. You sample N values and compute the sample mean (M) and estimate the standard error of the mean (σM) with sM. What is the probability that M will be within 1.96 sM of the population mean (μ)? This is a difficult problem because there are two ways in which M could be more than 1.96 sM from μ: (1) M could, by chance, be either very high or very low and (2) sM could, by chance, be very low. Intuitively, it makes sense that the probability of being within 1.96 standard errors of the mean should be smaller than in the case when the standard deviation is known (and cannot be underestimated).

Luckily, however, we can prove that random variable T will be student's t distribution. So we can use t distribution to estimate the mean of a normal distribution population in situations where the sample size is small and population standard deviation is unknown. For 90% confidence interval, it can be calculated as

where A is value of T that contains 90% of the area of the t distribution for n-1 degree of freedom. We can calculate A through the t table.

[Math Review] Statistics Basic: Estimation的更多相关文章

  1. [Math Review] Statistics Basic: Sampling Distribution

    Inferential Statistics Generalizing from a sample to a population that involves determining how far ...

  2. [Math Review] Statistics Basics: Main Concepts in Hypothesis Testing

    Case Study The case study Physicians' Reactions sought to determine whether physicians spend less ti ...

  3. [Math Review] Linear Algebra for Singular Value Decomposition (SVD)

    Matrix and Determinant Let C be an M × N matrix with real-valued entries, i.e. C={cij}mxn Determinan ...

  4. 统计处理包Statsmodels: statistics in python

    http://blog.csdn.net/pipisorry/article/details/52227580 Statsmodels Statsmodels is a Python package ...

  5. FAQ: Automatic Statistics Collection (文档 ID 1233203.1)

    In this Document   Purpose   Questions and Answers   What kind of statistics do the Automated tasks ...

  6. Machine and Deep Learning with Python

    Machine and Deep Learning with Python Education Tutorials and courses Supervised learning superstiti ...

  7. How do I learn machine learning?

    https://www.quora.com/How-do-I-learn-machine-learning-1?redirected_qid=6578644   How Can I Learn X? ...

  8. 本人AI知识体系导航 - AI menu

    Relevant Readable Links Name Interesting topic Comment Edwin Chen 非参贝叶斯   徐亦达老板 Dirichlet Process 学习 ...

  9. [book]awesome-machine-learning books

    https://github.com/josephmisiti/awesome-machine-learning/blob/master/books.md Machine-Learning / Dat ...

随机推荐

  1. 《Cracking the Coding Interview》——第9章:递归和动态规划——题目9

    2014-03-20 04:08 题目:八皇后问题. 解法:DFS解决. 代码: // 9.9 Eight-Queen Problem, need I say more? #include <c ...

  2. 《Cracking the Coding Interview》——第8章:面向对象设计——题目10

    2014-04-24 00:05 题目:用拉链法设计一个哈希表. 解法:一个简单的哈希表,就看成一个数组就好了,每个元素是一个桶,用来放入元素.当有多个元素落入同一个桶的时候,就用链表把它们连起来.由 ...

  3. 一个初学者的辛酸路程-基于Django写BBS项目

    前言 基于Django的学习 详情 登录界面 找个模板 http://v3.bootcss.com/examples/signin/ 右键,检查源码     函数 def login(request) ...

  4. 孤荷凌寒自学python第六天 列表的嵌套与列表的主要方法

    孤荷凌寒自学python第六天 列表的嵌套与列表的主要方法 (完整学习过程屏幕记录视频地址在文末,手写笔记在文末) (同步的语音笔记朗读:https://www.ximalaya.com/keji/1 ...

  5. 【PTA】Tree Traversals Again

    题目如下: An inorder binary tree traversal can be implemented in a non-recursive way with a stack. For e ...

  6. Linux静态ip设置及一些网络设置

    网络服务配置文件 /etc/sysconfig/network 网络接口配置文件 /etc/sysconfig/network-scripts/ifcfg-INTERFACE_NAME 修改IP永久生 ...

  7. HDU 4665 Unshuffle DFS找一个可行解

    每层找一对相等的整数,分别放在两个不同的串中. 参考了学弟的解法,果断觉得自己老了…… #include <cstdio> #include <cstring> #includ ...

  8. log4j配置打印mybatis的sql到控制台(复制)

    log4j.rootLogger=DEBUG, stdout log4j.appender.stdout=org.apache.log4j.ConsoleAppender log4j.appender ...

  9. Servlet 返回Json数据格式

    其实就是把数据库中的数据查询出来拼接成一个Json数据 import dao.UserDao; import endy.User; import javax.servlet.ServletExcept ...

  10. atom-安装插件

    1. 安装git. 2. 安装node环境,其中集成了npm. 3. 启动git 键入命令: cd User/[yourname]/.atom/packages 进入packages目录. 4. 下载 ...