Difference: Population vs. Sample

By Zach Bobbitt, posted on November 27, 2020

In statistics, we often collect data in order to answer a research question.

For example, we might want to answer the following questions:

What is the median household income in Miami, Florida?
What is the mean weight of a certain population of turtles?
What percentage of residents in a certain county support a certain law?

In each scenario, we are interested in answering a question about a population, which consists of every individual element that we're interested in measuring.

However, rather than collecting data on every individual in a population, we typically collect data on a sample, which is a portion of the population.

Population: Every possible individual element that we are interested in measuring.

Sample: A portion of the population.

Here is how a population and a sample might compare in each of the three introductory examples.

Three Examples

What is the median household income in Miami, Florida?

The population might include all 500,000 households in the city, but we might only collect data on a sample of 2,000 households.

What is the mean weight of a certain population of turtles?

The population might include all 800 turtles, but we might only collect data on a sample of 30 turtles.

What percentage of residents in a certain county support a certain law?

The population might include all 50,000 residents of the county, but we might only collect data on a sample of 1,000 residents.
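The turtle example can be simulated to show how a population value and a sample value relate. The minimal Python sketch below uses hypothetical turtle weights generated with `random.gauss`. The mean of 12 kg and standard deviation of 2.5 kg are arbitrary assumptions, not real data. The population mean uses all 800 turtles, while the sample mean uses a simple random sample of only 30, so the two numbers will usually be close but not identical.

```python
import random
import statistics

# Hypothetical population: 800 simulated turtle weights in kg (assumed values, not real data).
rng = random.Random(42)  # fixed seed so the run is reproducible
population = [rng.gauss(12.0, 2.5) for _ in range(800)]

# Draw a simple random sample of 30 turtles, without replacement.
sample = rng.sample(population, k=30)

population_mean = statistics.mean(population)  # uses every turtle in the population
sample_mean = statistics.mean(sample)          # uses only the 30 sampled turtles

print(f"Population mean ({len(population)} turtles): {population_mean:.2f} kg")
print(f"Sample mean ({len(sample)} turtles): {sample_mean:.2f} kg")
```

Changing the seed passed to `random.Random` gives a different sample and therefore a slightly different sample mean. The population mean stays fixed.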

Why Use Samples?

There are several reasons that we typically collect data on samples instead of entire populations, including:

It is too time-consuming to collect data on an entire population. For example, if we want to know the median household income in Miami, Florida, it might take months or even years to gather income data from every household. By the time we finish, the population may have changed, or the research question may no longer be relevant.
It is too costly to collect data on an entire population. Collecting data on every individual in a population is often too expensive, which is why we collect data on a sample instead.
It is infeasible to collect data on an entire population. In many cases it is simply impossible to collect data on every individual in a population. For example, it may be extraordinarily difficult to track down and weigh every turtle in the population we're interested in.

By collecting data on samples, we're able to gather information about a given population much faster and cheaper.

And if our sample is representative of the population, then we can generalize the findings from a sample to the larger population with a high level of confidence.

The Importance of Representative Samples

When we collect a sample from a population, we ideally want the sample to be like a "mini version" of the population.

For example, suppose we want to understand the movie preferences of students in a certain school district that has a population of 5,000 total students. Since it would take too long to survey every individual student, we might instead take a sample of 100 students and ask them about their preferences.

If the overall student population is composed of 50% girls and 50% boys, our sample would not be representative if it included 90% boys and only 10% girls.

Or if the overall population is composed of equal parts freshmen, sophomores, juniors, and seniors, then our sample would not be representative if it only included freshmen.

A sample is representative of a population if the characteristics of the individuals in the sample closely match the characteristics of the individuals in the overall population.

When this occurs, we can generalize the findings from the sample to the overall population with confidence.
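One simple way to check whether a sample is a "mini version" of the population is to compare the share of each group in the sample with that group's share in the population. The Python sketch below assumes a hypothetical district of 5,000 students split evenly between girls and boys. `group_proportions` and `largest_gap` are illustrative helper names, not part of any library. Using the largest gap between shares is only one possible yardstick for "closely match".

```python
import random
from collections import Counter

def group_proportions(labels):
    """Return each group's share of the labels, e.g. {'girl': 0.5, 'boy': 0.5}."""
    counts = Counter(labels)
    total = len(labels)
    return {group: count / total for group, count in counts.items()}

def largest_gap(sample_labels, population_labels):
    """Largest absolute difference between a group's share in the sample and in the population."""
    pop = group_proportions(population_labels)
    smp = group_proportions(sample_labels)
    groups = set(pop) | set(smp)
    return max(abs(smp.get(g, 0.0) - pop.get(g, 0.0)) for g in groups)

# Hypothetical district of 5,000 students: 2,500 girls and 2,500 boys.
population = ["girl"] * 2500 + ["boy"] * 2500

# A non-representative sample of 100 students: 90 boys and only 10 girls.
biased_sample = ["boy"] * 90 + ["girl"] * 10

# A simple random sample of 100 students drawn from the whole district.
rng = random.Random(1)
random_sample = rng.sample(population, k=100)

print(group_proportions(population))      # {'girl': 0.5, 'boy': 0.5}
print(group_proportions(biased_sample))   # {'boy': 0.9, 'girl': 0.1}
print(f"biased gap: {largest_gap(biased_sample, population):.2f}")  # biased gap: 0.40
print(f"random gap: {largest_gap(random_sample, population):.2f}")
```

The random sample's gap depends on the seed but is typically much smaller than the 40 percentage points by which the 90/10 sample misses both groups.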

How to Obtain Samples

There are many different methods we can use to obtain samples from populations.

To maximize the chances that we obtain a representative sample, we can use one of the following three methods:

Simple random sampling: Randomly select individuals using a random number generator or some other means of random selection.
Stratified random sampling: Split a population into groups. Randomly select some members from each group to be in the sample.
Systematic random sampling: Put every member of a population into some order. Choose a random starting point and select every nth member to be in the sample.
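The three methods above can be sketched in a few lines of Python using the standard `random` module. The population of 5,000 students, their IDs, and their grade labels are hypothetical, and the function names are illustrative rather than taken from any library. The students are listed in ID order, with each grade occupying a block of 1,250 IDs. The stratified draw takes 25 students from each grade, which matches each grade's 25% share of the population.

```python
import random
from collections import Counter

GRADES = ["freshman", "sophomore", "junior", "senior"]

# Hypothetical population: 5,000 students as (student_id, grade) pairs, ordered by ID.
# IDs 0-1249 are freshmen, 1250-2499 sophomores, and so on.
students = [(i, GRADES[i // 1250]) for i in range(5000)]

def simple_random_sample(population, n, rng):
    """Select n members at random, without replacement."""
    return rng.sample(population, k=n)

def stratified_random_sample(population, group_of, n_per_group, rng):
    """Split the population into groups, then randomly select members from each group.

    group_of maps a member to its group label; n_per_group maps each label to how many to draw.
    """
    groups = {}
    for member in population:
        groups.setdefault(group_of(member), []).append(member)
    sample = []
    for label, members in groups.items():
        sample.extend(rng.sample(members, k=n_per_group[label]))
    return sample

def systematic_random_sample(ordered_population, step, rng):
    """Choose a random starting point in [0, step), then take every step-th member."""
    start = rng.randrange(step)
    return ordered_population[start::step]

rng = random.Random(7)
srs = simple_random_sample(students, 100, rng)
strat = stratified_random_sample(students, lambda s: s[1], {g: 25 for g in GRADES}, rng)
sys_sample = systematic_random_sample(students, 5000 // 100, rng)  # every 50th student

for name, sample in [("simple", srs), ("stratified", strat), ("systematic", sys_sample)]:
    print(f"{name:>10}: {len(sample)} students, by grade {dict(Counter(g for _, g in sample))}")
```

Because of how this hypothetical list is ordered, the stratified and systematic samples each contain exactly 25 students per grade. A simple random sample contains roughly 25 per grade, but not necessarily exactly 25. If the list order had a repeating pattern that lined up with the step (for example, grades cycling every few IDs), a systematic sample could miss whole groups, so the ordering matters.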

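In each of these methods, every individual in the population has an equal probability of being included in the sample. For stratified sampling, this holds when the number selected from each group is proportional to that group's size. This maximizes the chances that we obtain a sample that is a "mini version" of the population.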
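The equal-probability claim can be checked by simulation: repeat a sampling method many times and count how often each member ends up in the sample. The Python sketch below uses a hypothetical population of 20 members and samples of 5, so every inclusion probability should be close to n/N = 0.25. The population is also split into two hypothetical strata of 12 and 8 members. This shows that stratified sampling meets the target only when each group contributes in proportion to its size (3 and 2 members). Drawing 2 and 3 members instead gives probabilities of about 2/12 ≈ 0.17 and 3/8 = 0.375.

```python
import random
from collections import Counter

N, n = 20, 5                       # hypothetical population of 20 members; sample of 5
population = list(range(N))
group_a, group_b = population[:12], population[12:]  # two hypothetical strata of 12 and 8

def simple_random(rng):
    return rng.sample(population, k=n)

def systematic(rng):
    step = N // n                  # every 4th member
    return population[rng.randrange(step)::step]

def stratified(k_a, k_b):
    """Build a draw function that takes k_a members from group_a and k_b from group_b."""
    def draw(rng):
        return rng.sample(group_a, k=k_a) + rng.sample(group_b, k=k_b)
    return draw

def inclusion_probabilities(draw, trials, rng):
    """Estimate P(member i ends up in the sample) by repeating the draw many times."""
    counts = Counter()
    for _ in range(trials):
        counts.update(draw(rng))
    return [counts[i] / trials for i in population]

rng = random.Random(0)
methods = {
    "simple random": simple_random,
    "systematic": systematic,
    "stratified 3+2 (proportional)": stratified(3, 2),
    "stratified 2+3 (not proportional)": stratified(2, 3),
}
results = {name: inclusion_probabilities(draw, 20_000, rng) for name, draw in methods.items()}
for name, probs in results.items():
    print(f"{name:>34}: min={min(probs):.3f}  max={max(probs):.3f}  (n/N = {n / N:.2f})")
```

These are Monte Carlo estimates, so they vary slightly around the exact values. Increasing the number of trials narrows that variation.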
