[Math Review] Statistics Basic: Estimation
Two Types of Estimation
- Point Estimate: the value of sample statistics
Point estimates of average height with multiple samples (Source: Zhihu)
- Confidence Intervals: intervals constructed using a method that contains the population parameter a specified proportion of the time.
95% confidence interval of average height with multiple samples (Source: Zhihu)
Confidence Interval for the Mean
Population Variance is known
Suppose that M is the mean of N samples X1, X2, ......, Xn, i.e.
According to Central Limit Theorem, the the sampling distribution of the mean M is
where μ and σ2 are the mean and variance of the population respectively. If repeated samples were taken and the 95% confidence interval computed for each sample, 95% of the intervals would contain the population mean. So the 95% confidence interval for M is the inverval that is symetric about the point estimate μ so that the area under normal distribution is 0.95.
That is,
Since we don't know the mean of population, we could use the sample mean instead.
Population Variance is Unknown
Dregree of Freedom
The degrees of freedom (df) of an estimate is the number of independent pieces of information on which the estimate is based. In general, the degrees of freedom for an estimate is equal to the number of values minus the number of parameters estimated en route to the estimate in question.
If the variance in a sample is used to estimate the variance in a population, we couldn't calculate the sample variace as
That's because we have two parameters to estimate (i.e., sample mean and sample variance). The degree of freedom should be N-1, so the previous formula underestimates the variance. Instead, we should use the following formula
where s2 is the estimate of the variance and M is the sample mean. The denominator of this formula is the degree of freedom.
Student's t-Distribution
Suppose that X is a random variable of normal distribution, i.e., X ~ N(μ, σ2)
is sample mean and
is sample deviation.
is a random variable of normal distribution.
is a random variable of student's t distribution.
The probability density function of T is
where is the degree of freedom,
is a gamma function.
The t distribution is very similar to the normal distribution when the estimate of variance is based on many degrees of freedom, but has relatively more scores in its tails when there are fewer degrees of freedom. Here are t distributions with 2, 4, and 10 degrees of freedom and the standard normal distribution. Notice that the normal distribution has relatively more scores in the center of the distribution and the t distribution has relatively more in the tails.
The t distribution is therefore leptokurtic. The t distribution approaches the normal distribution as the degrees of freedom increase.
Confidence Interval of t Distribution
Now consider the case in which you have a normal distribution but you do not know the standard deviation. You sample N values and compute the sample mean (M) and estimate the standard error of the mean (σM) with sM. What is the probability that M will be within 1.96 sM of the population mean (μ)? This is a difficult problem because there are two ways in which M could be more than 1.96 sM from μ: (1) M could, by chance, be either very high or very low and (2) sM could, by chance, be very low. Intuitively, it makes sense that the probability of being within 1.96 standard errors of the mean should be smaller than in the case when the standard deviation is known (and cannot be underestimated).
Luckily, however, we can prove that random variable T will be student's t distribution. So we can use t distribution to estimate the mean of a normal distribution population in situations where the sample size is small and population standard deviation is unknown. For 90% confidence interval, it can be calculated as
where A is value of T that contains 90% of the area of the t distribution for n-1 degree of freedom. We can calculate A through the t table.
[Math Review] Statistics Basic: Estimation的更多相关文章
- [Math Review] Statistics Basic: Sampling Distribution
Inferential Statistics Generalizing from a sample to a population that involves determining how far ...
- [Math Review] Statistics Basics: Main Concepts in Hypothesis Testing
Case Study The case study Physicians' Reactions sought to determine whether physicians spend less ti ...
- [Math Review] Linear Algebra for Singular Value Decomposition (SVD)
Matrix and Determinant Let C be an M × N matrix with real-valued entries, i.e. C={cij}mxn Determinan ...
- 统计处理包Statsmodels: statistics in python
http://blog.csdn.net/pipisorry/article/details/52227580 Statsmodels Statsmodels is a Python package ...
- FAQ: Automatic Statistics Collection (文档 ID 1233203.1)
In this Document Purpose Questions and Answers What kind of statistics do the Automated tasks ...
- Machine and Deep Learning with Python
Machine and Deep Learning with Python Education Tutorials and courses Supervised learning superstiti ...
- How do I learn machine learning?
https://www.quora.com/How-do-I-learn-machine-learning-1?redirected_qid=6578644 How Can I Learn X? ...
- 本人AI知识体系导航 - AI menu
Relevant Readable Links Name Interesting topic Comment Edwin Chen 非参贝叶斯 徐亦达老板 Dirichlet Process 学习 ...
- [book]awesome-machine-learning books
https://github.com/josephmisiti/awesome-machine-learning/blob/master/books.md Machine-Learning / Dat ...
随机推荐
- 剑指Offer - 九度1523 - 从上往下打印二叉树
剑指Offer - 九度1523 - 从上往下打印二叉树2013-12-01 00:35 题目描述: 从上往下打印出二叉树的每个节点,同层节点从左至右打印. 输入: 输入可能包含多个测试样例,输入以E ...
- 《Cracking the Coding Interview》——第12章:测试——题目4
2014-04-25 00:35 题目:没有专门的测试工具,你要如何对一个网页进行压力测试? 解法:拼手速,拼电脑数量呗.快捷键+复制粘贴网址,狂搞一番.话说回来,有脚本语言的情况下,直接写个脚本来模 ...
- CSS系列(7)CSS类选择器Class详解
这一篇文章,以笔记形式写. 1, CSS 类选择器详解 http://www.w3school.com.cn/css/css_selector_class.asp 知识点: (1) 使用类选择 ...
- 常用模块(datatime)
import datetime,time# dt = datetime.datetime.now() # 获取当前时间的时间对象# dt = datetime.date.fromtimestamp(t ...
- 你的第一个自动化测试:Appium 自动化测试
前言: 这是让你掌握 App 自动化的文章 一.前期准备 本文版权归作者和博客园共有,原创作者:http://www.cnblogs.com/BenLam,未经作者同意必须在文章页面给出原文连接. 1 ...
- 编译TypeScript(TypeScript转JavaScript)
1.配置tsconfig.json文件 tsconfig.json文件配置说明 { "compilerOptions": { //生成相关说明,TypeScript编译器如何编译. ...
- 聊聊、Java SPI
SPI,Service Provider Interface,服务提供者接口. Animal 接口 package com.rockcode.www.spi; public interface Ani ...
- 【Android】Android中期项目设计题目-界面设计小作业-提交截止时间2016.4.8
评选三份作品,请发关于app运行界面截图的博客.
- 201621123034 《Java程序设计》第11周学习总结
作业11-多线程 1. 本周学习总结 1.1 以你喜欢的方式(思维导图或其他)归纳总结多线程相关内容. 2. 书面作业 本次PTA作业题集多线程 1. 源代码阅读:多线程程序BounceThread ...
- [CF954G]Castle Defense
题目大意:有$n$个点,每个点最开始有$a_i$个弓箭手,在第$i$个位置的弓箭手可以给$[i-r,i+r]$区间加上$1$的防御,你还有$k$个弓箭手,要求你最大化最小防御值 题解:二分答案,从右向 ...