Difference: Population vs. Sample

By Zach Bobbitt | Posted on November 27, 2020

Often in statistics we're interested in collecting data so that we can answer some research question.

For example, we might want to answer the following questions:

  1. What is the median household income in Miami, Florida?
  2. What is the mean weight of a certain population of turtles?
  3. What percentage of residents in a certain county support a certain law?

In each scenario, we are interested in answering some question about a population, which represents every possible individual element that we're interested in measuring.

However, rather than collecting data on every individual in a population, we typically collect data on a sample of the population, which represents a portion of the population.

Population: Every possible individual element that we are interested in measuring.

Sample: A portion of the population.

Here is how the population and the sample might look in each of the three introductory examples.

Three Examples

  1. What is the median household income in Miami, Florida?

    The entire population might include 500,000 households, but we might only collect data on a sample of 2,000 total households.
  2. What is the mean weight of a certain population of turtles?

    The entire population might include 800 turtles, but we might only collect data on a sample of 30 turtles.
  3. What percentage of residents in a certain county support a certain law?

    The entire population might include 50,000 residents, but we might only collect data on a sample of 1,000 residents.
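To make the distinction concrete, here is a minimal Python sketch of the turtle example. The weights are simulated from a normal distribution with a made-up mean of 12 kg and standard deviation of 2.5 kg, so only the sizes 800 and 30 come from the example above. The population mean is the quantity we would like to know; the sample mean is the estimate we can actually compute.

```python
import random
import statistics

# Hypothetical data: 800 simulated turtle weights in kg. The distribution is
# invented for illustration; only the sizes 800 and 30 come from the example.
rng = random.Random(42)
population = [rng.gauss(mu=12.0, sigma=2.5) for _ in range(800)]

# Draw a sample of 30 turtles without replacement.
sample = rng.sample(population, k=30)

population_mean = statistics.mean(population)  # the parameter we want to know
sample_mean = statistics.mean(sample)          # the estimate we can compute

print(f"Population size: {len(population)}, mean weight: {population_mean:.2f} kg")
print(f"Sample size:     {len(sample)}, mean weight: {sample_mean:.2f} kg")
```

Changing the seed changes which 30 turtles are drawn, and the sample mean moves around the population mean accordingly.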

Why Use Samples?

There are several reasons that we typically collect data on samples instead of entire populations, including:

  1. It is too time-consuming to collect data on an entire population. For example, if we want to know the median household income in Miami, Florida, it might take months or even years to gather income data from every household. By the time we finish collecting all of this data, the population may have changed or the research question may no longer be relevant.
  2. It is too costly to collect data on an entire population. Collecting data for every individual in a population is often too expensive, which is why we collect data on a sample instead.
  3. It is infeasible to collect data on an entire population. In many cases it's simply not possible to collect data for every individual in a population. For example, it may be extraordinarily difficult to track down and weigh every turtle in a certain population that we're interested in.

By collecting data on samples, we're able to gather information about a given population much more quickly and cheaply.

And if our sample is representative of the population, then we can generalize the findings from a sample to the larger population with a high level of confidence.
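As a rough illustration of generalizing from a sample, the sketch below simulates the county example: 50,000 residents, of whom an invented 58% support the law, and a simple random sample of 1,000. It estimates the proportion and attaches an approximate 95% margin of error using the standard normal-approximation formula 1.96 * sqrt(p_hat * (1 - p_hat) / n). The support rate and seed are assumptions chosen only so the estimate can be compared with a known "true" value.

```python
import math
import random

# Hypothetical county of 50,000 residents; the 58% support rate is invented
# so the sample estimate can be compared against a known "true" value.
rng = random.Random(7)
N = 50_000
true_support = 0.58
population = [rng.random() < true_support for _ in range(N)]  # True = supports

# Survey a simple random sample of 1,000 residents.
n = 1_000
sample = rng.sample(population, k=n)
p_hat = sum(sample) / n  # sample proportion (True counts as 1)

# Approximate 95% margin of error for a proportion (normal approximation).
margin = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)

print(f"Population proportion: {sum(population) / N:.3f}")
print(f"Sample estimate:       {p_hat:.3f} +/- {margin:.3f}")
```

Because the sample is only 2% of the population, a finite population correction would barely change the margin, so it is omitted here. The interval is only meaningful if the sample is representative, which is the subject of the next section.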

The Importance of Representative Samples

When we collect a sample from a population, we ideally want the sample to be like a "mini version" of our population.

For example, suppose we want to understand the movie preferences of students in a certain school district that has a population of 5,000 total students. Since it would take too long to survey every individual student, we might instead take a sample of 100 students and ask them about their preferences.

If the overall student population is composed of 50% girls and 50% boys, our sample would not be representative if it included 90% boys and only 10% girls.

Or if the overall population is composed of equal parts freshmen, sophomores, juniors, and seniors, then our sample would not be representative if it only included freshmen.

A sample is representative of a population if the characteristics of the individuals in the sample closely match the characteristics of the individuals in the overall population.

When this occurs, we can generalize the findings from the sample to the overall population with confidence.
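The sketch below uses a hypothetical roster of 5,000 students that is exactly half girls and half boys, and compares the share of girls in a simple random sample of 100 with a sample that is 90% boys. The 90/10 split is hard-coded to mirror the example above rather than produced by any particular flawed procedure.

```python
import random
from collections import Counter

# Hypothetical school district: 5,000 students, exactly half girls, half boys.
rng = random.Random(1)
students = ["girl"] * 2_500 + ["boy"] * 2_500


def share_of_girls(group):
    """Fraction of the group recorded as 'girl'."""
    return Counter(group)["girl"] / len(group)


# A simple random sample of 100 students tends to mirror the 50/50 split.
random_sample = rng.sample(students, k=100)

# A non-representative sample, hard-coded to match the 90% boys example.
biased_sample = ["boy"] * 90 + ["girl"] * 10

print(f"Population:    {share_of_girls(students):.0%} girls")
print(f"Random sample: {share_of_girls(random_sample):.0%} girls")
print(f"Biased sample: {share_of_girls(biased_sample):.0%} girls")
```

Any conclusion drawn from the biased sample (for instance, about which movies are most popular) would overweight boys' preferences relative to the actual district.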

How to Obtain Samples

There are many different methods we can use to obtain samples from populations.

To maximize the chances that we obtain a representative sample, we can use one of the following three methods:

  • Simple random sampling: Randomly select individuals through the use of a random number generator or some means of random selection.

  • Stratified random sampling: Split a population into groups. Randomly select some members from each group to be in the sample.

  • Systematic random sampling: Put every member of a population into some order. Choose a random starting point and select every nth member to be in the sample.
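Here is one way the three methods might be written in Python, as a sketch rather than a reference implementation. The function names are hypothetical, the stratified version assumes proportional allocation (each group contributes the same fraction of its members), and the roster of 5,000 students in four equal class years is made up to match the earlier example.

```python
import random
from collections import defaultdict


def simple_random_sample(population, n, rng):
    """Select n members without replacement; every member has chance n/N."""
    return rng.sample(population, k=n)


def stratified_random_sample(population, fraction, key, rng):
    """Split the population into groups with key(), then draw a simple random
    sample from each group. Each group contributes round(fraction * size)
    members (proportional allocation)."""
    groups = defaultdict(list)
    for member in population:
        groups[key(member)].append(member)
    sample = []
    for members in groups.values():
        sample.extend(rng.sample(members, k=round(fraction * len(members))))
    return sample


def systematic_random_sample(population, n, rng):
    """Take every k-th member of an ordered population, where k = N // n,
    starting from a random position within the first k members."""
    k = len(population) // n
    start = rng.randrange(k)
    return population[start::k][:n]


# Hypothetical roster: 5,000 students ordered by ID, with 1,250 per class year.
years = ["freshman", "sophomore", "junior", "senior"]
students = [(student_id, years[student_id // 1_250]) for student_id in range(5_000)]

rng = random.Random(2020)
srs = simple_random_sample(students, 100, rng)
strat = stratified_random_sample(students, 0.02, key=lambda s: s[1], rng=rng)
syst = systematic_random_sample(students, 100, rng)

for name, sample in [("simple", srs), ("stratified", strat), ("systematic", syst)]:
    counts = {year: sum(1 for _, y in sample if y == year) for year in years}
    print(f"{name:>10}: {counts}")
```

One design caveat: systematic sampling assumes the ordering has no cycle that lines up with the step size k. If the roster instead cycled through the four class years (freshman, sophomore, junior, senior, freshman, ...), a step of k = 50 would land on only two of the four years.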

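In each of these methods, chance rather than the researcher's judgment determines who ends up in the sample. With simple random sampling, systematic random sampling, and stratified random sampling that selects the same fraction from every group, every individual in the population has an equal probability of being included in the sample. This maximizes the chances that we obtain a sample that is a “mini version” of the population.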
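The equal-probability property can be checked by simulation. This sketch repeatedly draws samples of 5 from a hypothetical population of 20 and records how often each individual is chosen; under both simple random sampling and systematic random sampling, every individual should be included in roughly n/N = 25% of the trials. The population size, sample size, and number of trials are arbitrary choices for illustration.

```python
import random

# A tiny hypothetical population of 20 individuals, labeled 0..19.
N, n, trials = 20, 5, 20_000
population = list(range(N))
k = N // n  # step size for systematic sampling
rng = random.Random(0)

srs_hits = [0] * N  # times each individual appears in a simple random sample
sys_hits = [0] * N  # times each individual appears in a systematic sample
for _ in range(trials):
    # Simple random sampling: n individuals chosen without replacement.
    for member in rng.sample(population, k=n):
        srs_hits[member] += 1
    # Systematic random sampling: random start, then every k-th individual.
    for member in population[rng.randrange(k)::k]:
        sys_hits[member] += 1

srs_rates = [h / trials for h in srs_hits]
sys_rates = [h / trials for h in sys_hits]
print(f"Target inclusion probability n/N = {n / N:.2f}")
print(f"Simple random: min {min(srs_rates):.3f}, max {max(srs_rates):.3f}")
print(f"Systematic:    min {min(sys_rates):.3f}, max {max(sys_rates):.3f}")
```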
