
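P-value: the probability, assuming H0 is true, of observing a result at least as extreme as the one actually observed.

The lower this probability, the more significant the result. When the p-value falls below the chosen significance level (alpha), the evidence is considered sufficient to reject H0 in favor of H1 (a significant result).

The logic resembles proof by contradiction: first assume H0, that A and B are unrelated.

If the observed result would be very improbable under that assumption, almost impossible, reject the null hypothesis H0.

H1 is then accepted (there is a significant relationship).

Significance cannot prove that anything is true; it can only reject the claim that the two are unrelated (that is, indicate a significant relationship).

Significance does not quantify the size of the difference.

Significance does not show that the difference is practically meaningful.

Significance does not explain why the difference exists.

Interpreting p < 0.05 (The Interpretation of the p-Value): if H0 is true, the probability of finding a value at least this extreme is less than 5%. This does not simply mean that H0 is false or that H1 is true.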
A value of p < 0.05 for the null hypothesis has to be interpreted as follows: If
the null hypothesis is true, the chance to find a test statistic as extreme as or more
extreme than the one observed is less than 5%. This is not the same as saying that
the null hypothesis is false, and even less so, that an alternative hypothesis is true!
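
The interpretation above can be checked with a small simulation. The sketch below is an illustration only, not part of the quoted text: it assumes normally distributed data with a known standard deviation of 1, a two-sided z test, and arbitrary values for the sample size, number of repetitions, and random seed. It repeatedly draws samples for which H0 (mean = 0) really is true and counts how often the p-value falls below 0.05.

```python
import math
import random

def two_sided_p_value(z):
    """Two-sided p-value for a standard normal test statistic z."""
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal CDF Phi.
    return math.erfc(abs(z) / math.sqrt(2))

def fraction_below_alpha(n_sim=2000, n=30, alpha=0.05, seed=0):
    """Simulate experiments in which H0 (mean = 0, sd = 1) is true.

    Returns the fraction of experiments whose p-value is below alpha.
    All parameter defaults are arbitrary choices for this illustration.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # z = (x_bar - 0) / (1 / sqrt(n)) with known sd = 1
        z = (sum(sample) / n) * math.sqrt(n)
        if two_sided_p_value(z) < alpha:
            hits += 1
    return hits / n_sim

print(fraction_below_alpha())
```

The printed fraction should land close to 0.05: even when H0 is true, roughly 5% of experiments produce p < 0.05, which is why a small p-value alone does not show that H0 is false.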

http://www.360doc.com/content/15/0704/22/22175932_482657194.shtml

P-Value Misconceptions

Pitfalls in the Interpretation of p-Values
In other words, p-values measure evidence for a hypothesis. Unfortunately, they are
often incorrectly viewed as an error probability for rejection of the hypothesis, or,
even worse, as the posterior probability (i.e., after the data have been collected) that
the hypothesis is true. As an example, take the case where the alternative hypothesis
is that the mean is just a fraction of one standard deviation larger than the mean
under the null hypothesis: in that case, a sample that produces a p-value of 0.05 may
just as likely be produced if the alternative hypothesis is true as if the null hypothesis
is true!
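The p-value measures the evidence regarding the H0 hypothesis. Unfortunately, it is often misunderstood, for example as the probability of being wrong when rejecting H0, or as the probability that H0 is true.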

When is a number so much bigger or smaller than another that it should raise some eyebrows? Tests of significance help make such determinations. This lesson explains the p-value in significance tests, how to calculate them, and how to evaluate the results.

Tests of Significance

Imagine that you want to be the new point guard of your basketball team, but before you try out for the position, you want to make sure you have, pun intended, a real shot at achieving your goal. You shoot 20 free throws and make 12 of them; that's a 60% accuracy rate. You want to know whether your accuracy rate, the observation, is about the same as or different from the team's accuracy rate, the population statistic, and whether the difference is enough to replace the old point guard.

You can do a test of significance to ascertain if your accuracy rate is significantly different from that of the team. A significance test measures whether some observed value is similar to the population statistic, or if the difference between them is large enough that it isn't likely to be by coincidence. When the difference between what is observed and what is expected surpasses some critical value, we say there is statistical significance.

 


P-Value Defined

A standard normal distribution curve describes how the possible values of a single random variable are distributed: the peak of the curve sits at the mean, where values are most likely to be observed, and the tails, where the area under the curve is smallest, hold the values least likely to be observed.

The p-value is the probability, assuming there is no real difference, of finding a value at least as extreme as the observed one among all possible results for the same variable. If the observed value is among those most likely to be found, then there is not a statistically significant difference. If, on the other hand, the observed value is among the unlikely values, then there is a statistically significant difference. The smaller the probability associated with the observed value, the more likely the result is to be significant.

Finding The P-Value

To find the p-value, or the probability associated with a specific observation, you must first calculate the z score, also known as the test statistic.

The formula for finding the test statistic depends on whether the data includes means or proportions. The formulas we'll discuss assume a:

  1. Single sample significance test
  2. Normal distribution
  3. Large sample size.

When dealing with means, the z score is a function of the observed value (x-bar), population mean (mu), standard deviation (s), and the number of observations (n).

When dealing with proportions, the z score is a function of the observed proportion (p-hat), the proportion in the population, which is also the probability of a successful outcome (p), the probability of failure (q = 1 - p), and the number of trials (n).
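
The formulas themselves do not appear in the text. Assuming the standard one-sample z statistics, which use exactly the symbols listed above, they can be written as:

```latex
% One-sample z statistic for a mean
z = \frac{\bar{x} - \mu}{s / \sqrt{n}}

% One-sample z statistic for a proportion, with q = 1 - p
z = \frac{\hat{p} - p}{\sqrt{p\,q / n}}
```

In both cases the numerator is the gap between what was observed and what was expected, and the denominator is the standard error, so z counts how many standard errors the observation lies from the expected value.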

After calculating the z score, you must look up the probability associated with that score on a Standard Normal Probabilities Table. This probability is the p-value: the probability, if there is no real difference, of finding a value at least as extreme as the observed one. The p-value is then compared to the chosen significance level (which this lesson also calls the critical value) to determine statistical significance.
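
The calculation described above can be sketched in a few lines of Python, with the table lookup replaced by the standard normal CDF computed from `math.erf`. The function names are made up for this illustration, and the team accuracy of 0.40 is an assumed number, since the lesson never states it. Note also that 20 free throws is a small sample, so the large-sample assumption listed earlier holds only loosely here.

```python
import math

def z_for_mean(x_bar, mu, s, n):
    """z score for a sample mean: (x_bar - mu) / (s / sqrt(n))."""
    return (x_bar - mu) / (s / math.sqrt(n))

def z_for_proportion(p_hat, p, n):
    """z score for a sample proportion: (p_hat - p) / sqrt(p * q / n)."""
    q = 1.0 - p
    return (p_hat - p) / math.sqrt(p * q / n)

def normal_cdf(z):
    """Standard normal CDF, standing in for the probability table."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

def p_value(z, two_sided=True):
    """Tail probability beyond |z|; doubled for a two-sided test."""
    tail = 1.0 - normal_cdf(abs(z))
    return 2.0 * tail if two_sided else tail

# 12 of 20 free throws comes from the text; the team rate 0.40 is hypothetical.
z = z_for_proportion(p_hat=12 / 20, p=0.40, n=20)
print(round(z, 3), round(p_value(z), 3))
```

Whether to use the one-sided or two-sided probability depends on the question being asked: "different from the team" is two-sided, while "better than the team" is one-sided.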

The Critical Value

The significance level, which this lesson also refers to as the critical value, is established as part of the study design and is denoted by the Greek letter alpha. (Strictly speaking, the critical value is the test-statistic cutoff that corresponds to alpha.) If we choose an alpha = 0.05, we are requiring an observed data point be so different from what is expected that it would not be observed more than 5% of the time. An alpha equaling 0.01 would be even more strict. In this case, a test statistic beyond the corresponding cutoff has less than a 1 in 100 probability of occurring by chance when there is no real difference.

The last step in a significance test is to compare the p-value to alpha to determine statistical significance. If the p-value is less than alpha, then we can reject the idea that the observed value was a result found by chance.
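
To make the link between alpha and a cutoff on the z scale concrete, the sketch below converts a significance level into the corresponding z value using `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The helper name `critical_z` is an assumption made for this example.

```python
from statistics import NormalDist

def critical_z(alpha, two_sided=True):
    """z cutoff beyond which a result is significant at level alpha."""
    # A two-sided test splits alpha evenly between the two tails.
    tail = alpha / 2 if two_sided else alpha
    return NormalDist().inv_cdf(1.0 - tail)

for alpha in (0.05, 0.01):
    print(alpha, round(critical_z(alpha), 3))
# 0.05 1.96
# 0.01 2.576
```

A stricter alpha pushes the cutoff further into the tail, so the observed z score must be more extreme before the result counts as significant.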

What Significance Tells Us

So, let's say your free throw accuracy of 0.6 turns out to have a z score associated with a probability of 0.03, and your alpha is set at 0.05. Because p < alpha, there is a statistically significant difference. We can reject the idea that there is no difference between your accuracy and the team's, and accept the alternative: your shooting accuracy is significantly different from that of the team.

If, on the other hand, the alpha is set at 0.01, then p > alpha and the result is not statistically significant. In this case, your coaches can say, 'Um, sorry. There simply isn't enough evidence to conclude you are way, way better.'
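
The two outcomes above come from the same comparison applied with different thresholds. A minimal sketch follows, using the probability of 0.03 from the example; the function name `decide` is made up for illustration.

```python
def decide(p_value, alpha):
    """Return True when the result is significant (H0 is rejected) at level alpha."""
    return p_value < alpha

p = 0.03  # probability from the free-throw example in the text
for alpha in (0.05, 0.01):
    verdict = "significant" if decide(p, alpha) else "not significant"
    print(f"alpha={alpha}: {verdict}")
# alpha=0.05: significant
# alpha=0.01: not significant
```

The data and the p-value are identical in both runs; only the threshold chosen in advance changes the verdict, which is why alpha should be fixed as part of the study design.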

Significance does not:

  • Prove anything is true; it can only disprove that there is no difference
  • Quantify the difference between your accuracy and the team's
  • Magnify how meaningful the difference is between your accuracy and the team's
  • Explain why there was any difference found between your accuracy and the team's
  • Ensure you will be made the new point guard

Lesson Summary

A significance test measures whether some observed value is similar to the population statistic or if the difference between the observed value and the population statistic is large enough that it isn't likely to be a coincidence.

The p-value is the probability, assuming there is no real difference, of finding a value at least as extreme as the observed one among all possible results for the same variable. To find the p-value, you must first calculate the z score, also known as the test statistic. After calculating the z score, look up the probability associated with that score on a Standard Normal Probabilities Table. The last step in a significance test is to compare the p-value to the significance level established in advance, called alpha, to determine statistical significance.

If the observed value is among those most likely to be found among all possible results, then there is not a statistically significant difference. If, on the other hand, the observed value is among the unlikely values, then there is a statistically significant difference.
