https://www.statology.org/from-correlation-to-causation-deep-dive-into-data-interpretation/

From Correlation to Causation: Deep Dive into Data Interpretation

Correlation and causation are key concepts in data analysis.

However, correlation doesn't mean causation.

For example:

  • Ice cream sales and sunburns both increase in the summer.
  • These events happen together but do not cause each other.

    They occur together because of a third factor: hot weather.

In this article, we will learn more about correlation and causation. We will also understand how these terms are different from each other.

Summary

  • Correlation means two things HAPPEN TOGETHER.
  • Causation means ONE thing MAKES ANOTHER thing HAPPEN.
  • Correlation doesn't necessarily imply causation.

    Just because two things happen together doesn't mean that one causes the other.

    It is essential to understand this distinction for accurate data analysis.

Causation vs. Correlation:

"Correlation" does not mean one variable causes the other to change.

It only shows that they change together.

Influence of Outliers: Outliers can change the correlation a lot. A single extreme point can make the relationship look stronger or weaker than it really is, or even flip its sign. Robust methods, such as rank-based correlation, can help reduce the impact of outliers.
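To make the effect of an outlier concrete, here is a minimal sketch using only the Python standard library. The data values are invented for illustration, and Spearman's rank correlation is used as one example of a method that is less sensitive to outliers; the article does not name a specific method.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def ranks(values):
    """Rank values 1..n (assumes no ties, which holds for the data below)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman_r(xs, ys):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

# Made-up data with a clear upward trend.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 1.9, 3.2, 2.8, 4.1, 3.9, 5.2, 4.8]

# The same data plus one extreme point.
x_out = x + [9]
y_out = y + [-20.0]

r_clean = pearson_r(x, y)
r_outlier = pearson_r(x_out, y_out)
rho_clean = spearman_r(x, y)
rho_outlier = spearman_r(x_out, y_out)

print(f"Pearson  without / with outlier: {r_clean:.2f} / {r_outlier:.2f}")
print(f"Spearman without / with outlier: {rho_clean:.2f} / {rho_outlier:.2f}")
```

In this example a single added point turns a strongly positive Pearson coefficient negative, while the rank-based coefficient weakens but stays positive.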

Why Correlation Doesn't Imply Causation

Correlation does not always mean causation.

Causation usually implies correlation.

The Venn diagram shows the overlap between them.

Let’s explore why spurious correlations happen:

  • Third Variables (Confounding Factors):

    Sometimes two things appear to be connected only because a third factor has an impact on both.

    For example, beach umbrella sales and ice cream sales might increase together because of hot weather, not because buying an umbrella makes people want ice cream.
  • Reverse Causation:

    Correlation doesn't show which way the causation goes.

    Two things might be linked, but the first could cause the second, or the second could cause the first.

    For instance, exercise and weight loss are correlated. Exercise can lead to weight loss, but losing weight may also make people exercise more.
  • Random Chance (Coincidence):

    Sometimes things look connected just by luck.

    This can lead to wrong ideas about causation if we assume cause and effect based only on correlation.
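The confounding case can be sketched with a small simulation. Everything below is a hypothetical model, not real data: temperature drives both ice cream sales and sunburns, and the two never influence each other. The coefficients, noise levels, and sample size are arbitrary assumptions. The partial correlation at the end uses the standard first-order formula to remove the linear effect of temperature.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical model: temperature is the only driver of both quantities.
temperature = [random.uniform(10, 35) for _ in range(2000)]
ice_cream = [2.0 * t + random.gauss(0, 5) for t in temperature]
sunburns = [0.5 * t + random.gauss(0, 2) for t in temperature]

r_xy = pearson_r(ice_cream, sunburns)
r_xz = pearson_r(ice_cream, temperature)
r_yz = pearson_r(sunburns, temperature)

# First-order partial correlation: the association between ice cream and
# sunburns that remains after the linear effect of temperature is removed.
partial = (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

print(f"raw correlation:     {r_xy:.2f}")
print(f"partial correlation: {partial:.2f}")
```

The raw correlation comes out strongly positive, while the partial correlation is close to zero, which is what we expect when the third variable explains the whole association.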

Causation

Causation refers to a cause-and-effect relationship between variables.

It means that changes in one variable cause changes in another variable.

Key Principles of Causation

  1. Temporal Precedence:

    The cause MUST happen before the effect.

    This helps us understand which one comes first and shows us the direction of cause and effect.
  2. Covariation:

    Changes in the cause SHOULD always be followed by changes in the effect.

    Methods like long-term studies can help demonstrate this connection over time.
  3. Controlling for Confounding Factors:

    It's important to find other factors that COULD affect the relationship between variables.

    Methods like statistical controls and random assignment can reduce the impact of these factors.

Establishing Causation

To establish causation, researchers often use experimental designs.

They change the independent variable and observe changes in the dependent variable.

Key methods include:

  • Randomized Controlled Trials (RCTs):

    These experiments randomly assign participants to different groups.

    One variable is changed, and the other variables are kept constant.
  • Longitudinal Studies:

    These studies track variables over a long period of time.

    They examine how changes in one variable (the independent variable) relate to changes in another variable (the dependent variable).
  • Challenges and Considerations
    • Ethical Constraints: Some experiments that establish causation may be unethical to conduct.
    • Complexity: It is difficult to establish causation due to the presence of multiple variables and interactions among them.
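Why random assignment helps can be sketched with a simulation. The setup is entirely hypothetical: a "treatment" with no true effect, and an unobserved "health" variable that raises the outcome. In the observational scenario, healthier people are more likely to take the treatment; in the randomized scenario, a coin flip decides. The function name, effect sizes, and sample size are assumptions made for illustration.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

def mean(values):
    return sum(values) / len(values)

def difference_in_means(randomized, n=5000):
    """Return mean(outcome | treated) - mean(outcome | control)."""
    treated, control = [], []
    for _ in range(n):
        health = random.gauss(0, 1)  # hypothetical confounder
        if randomized:
            takes_treatment = random.random() < 0.5  # coin flip
        else:
            # Healthier people are more likely to choose the treatment.
            takes_treatment = health + random.gauss(0, 1) > 0
        # The treatment has NO true effect: the outcome depends on health only.
        outcome = 2.0 * health + random.gauss(0, 1)
        (treated if takes_treatment else control).append(outcome)
    return mean(treated) - mean(control)

obs_diff = difference_in_means(randomized=False)
rct_diff = difference_in_means(randomized=True)

print(f"observational difference: {obs_diff:.2f}")
print(f"randomized difference:    {rct_diff:.2f}")
```

The observational comparison shows a large gap between groups even though the treatment does nothing, because the groups differ in health. Under random assignment the groups are balanced on health on average, so the gap shrinks toward zero.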

Correlation

Correlation is a statistical measure used to assess the relationship between two variables.

It describes how changes in one variable are associated with changes in another.

Limitations of Correlation: "Linear Relationships Only"

  • The standard (Pearson) correlation coefficient measures only straight-line relationships.
  • It can miss curved or irregular relationships, even when they are strong.
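A short sketch of this limitation: below, y is completely determined by x, but the relationship is a parabola rather than a line. The data are constructed for illustration, and the coefficient is computed by hand with the standard Pearson formula.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# y depends entirely on x, but through a curve that is symmetric around 0.
x = list(range(-5, 6))          # -5, -4, ..., 5
y = [value ** 2 for value in x]

r_curve = pearson_r(x, y)
print(f"r for y = x^2: {r_curve:.2f}")
```

The coefficient is zero here because the rising and falling halves of the curve cancel out, so a value near 0 should be read as "no linear relationship", not "no relationship".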

Types of Correlation

There are three main types of correlation:

  1. Positive Correlation: When both variables move in the same direction (e.g., as one variable increases, the other also increases).
  2. Negative Correlation: When variables move in opposite directions (e.g., as one variable increases, the other decreases).
  3. No Correlation: When there is no discernible relationship between the variables.

Measuring Correlation

Correlation is MEASURED USING a correlation coefficient, typically denoted as r.

The value of r ranges from -1 to 1.

  • r = 1: Perfect positive correlation;

    as one variable increases, the other also increases proportionally.
  • r = −1: Perfect negative correlation;

    as one variable increases, the other decreases proportionally.
  • r = 0: No correlation;

    there is no linear relationship between the variables.

The relationship is stronger if r is close to +1 or -1.

A correlation coefficient near 0 indicates a weak or no linear relationship.
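As a sketch of how the coefficient is computed, the snippet below implements the standard Pearson formula, r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²), and applies it to three small made-up data sets. The data values are assumptions chosen so that the results are easy to check by hand.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

x = [1, 2, 3, 4, 5]

r_pos = pearson_r(x, [2 * v + 1 for v in x])   # exact increasing line
r_neg = pearson_r(x, [-3 * v for v in x])      # exact decreasing line
r_weak = pearson_r(x, [2, 5, 1, 4, 3])         # scrambled values

print(f"increasing line: r = {r_pos:.2f}")   # 1.00
print(f"decreasing line: r = {r_neg:.2f}")   # -1.00
print(f"scrambled data:  r = {r_weak:.2f}")  # 0.10
```

Note that the slope of the line does not matter: any exact increasing line gives r = 1 and any exact decreasing line gives r = −1.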

Interpreting Correlation

Strength of Relationship:

The correlation coefficient ranges from -1 to 1.

  • A correlation coefficient close to +1 indicates a strong positive relationship.

    It means as one variable increases, the other also increases.
  • A coefficient close to -1 indicates a strong negative relationship.

    When one variable increases, the other tends to decrease.
  • A coefficient close to 0 suggests a weak or no relationship between the variables.

Direction of Relationship:

The sign of the correlation coefficient (+ or -) indicates the direction of the relationship.

A positive r indicates a positive relationship (both variables move in the same direction).

A negative r indicates a negative relationship (the variables move in opposite directions).
