Three ways to detect outliers
Z-score
import numpy as np def outliers_z_score(ys):
threshold = 3 mean_y = np.mean(ys)
stdev_y = np.std(ys)
z_scores = [(y - mean_y) / stdev_y for y in ys]
return np.where(np.abs(z_scores) > threshold)
Modified Z-score
import numpy as np def outliers_modified_z_score(ys):
threshold = 3.5 median_y = np.median(ys)
median_absolute_deviation_y = np.median([np.abs(y - median_y) for y in ys])
modified_z_scores = [0.6745 * (y - median_y) / median_absolute_deviation_y
for y in ys]
return np.where(np.abs(modified_z_scores) > threshold)
IQR(interquartile range)
import numpy as np def outliers_iqr(ys):
quartile_1, quartile_3 = np.percentile(ys, [25, 75])
iqr = quartile_3 - quartile_1
lower_bound = quartile_1 - (iqr * 1.5)
upper_bound = quartile_3 + (iqr * 1.5)
return np.where((ys > upper_bound) | (ys < lower_bound))
Conclusion
It is important to reiterate that these methods should not be used mechanically.
They should be used to explore the data – they let you know which points might be worth a closer look.
What to do with this information depends heavily on the situation.
Sometimes it is appropriate to exclude outliers from a dataset to make a model trained on that dataset more predictive.
Sometimes, however,
the presence of outliers is a warning sign that the real-world process generating the data is more complicated than expected. As an astute commenter on CrossValidated put it:
“Sometimes outliers are bad data, and should be excluded, such as typos.
Sometimes they are Wayne Gretzky or Michael Jordan, and should be kept.” Domain knowledge and practical wisdom are the only ways to tell the difference.
摘自:http://colingorrie.github.io/outlier-detection.html
Three ways to detect outliers的更多相关文章
- MATLAB remove outliers.
Answer by Richard Willey on 9 Jan 2012 Hi Michael MATLAB doesn't provide a specific function to remo ...
- 异常值处理outlier
python信用评分卡(附代码,博主录制) https://study.163.com/course/introduction.htm?courseId=1005214003&utm_camp ...
- 深度学习图像配准 Image Registration: From SIFT to Deep Learning
Image Registration is a fundamental step in Computer Vision. In this article, we present OpenCV feat ...
- (转)利用libcurl和国内著名的两个物联网云端通讯的例程, ubuntu和openwrt下调试成功(四)
1. libcurl 的参考文档如下 CURLOPT_HEADERFUNCTION Pass a pointer to a function that matches the following pr ...
- 如何查找物理cpu,cpu核心和逻辑cpu的数量
环境 Red Hat Enterprise Linux 4 Red Hat Enterprise Linux 5 Red Hat Enterprise Linux 6 Red Hat Enterpri ...
- Gartner 2018 年WAF魔力象限报告:云WAF持续增长,Bot管理与API安全拥有未来
Gartner 2018 年WAF魔力象限报告:云WAF持续增长,Bot管理与API安全拥有未来 来源 https://www.freebuf.com/articles/paper/184903.ht ...
- A Gentle Guide to Machine Learning
A Gentle Guide to Machine Learning Machine Learning is a subfield within Artificial Intelligence tha ...
- [linux]linuxI/O测试的方法之dd
参考http://www.thomas-krenn.com/en/wiki/Linux_I/O_Performance_Tests_using_dd Measuring Write Performan ...
- WGCNA构建基因共表达网络详细教程
这篇文章更多的是对于混乱的中文资源的梳理,并补充了一些没有提到的重要参数,希望大家不会踩坑. 1. 简介 1.1 背景 WGCNA(weighted gene co-expression networ ...
随机推荐
- SSRS----关于图表参考线(平均线)的添加
在开发报表的时候,遇到了一个问题,客户需要在气泡图上添加水平和竖直两条平均线(结果参考如下图). 个人知识背景 一般添加参考线本身是有一个相关的设置的,但一般都是相对于Y值,即平行于X轴的.用类似的方 ...
- HBase Rowkey 设计指南
为什么Rowkey这么重要 RowKey 到底是什么 我们常说看一张 HBase 表设计的好不好,就看它的 RowKey 设计的好不好.可见 RowKey 在 HBase 中的地位.那么 RowKey ...
- APACHE SPARK 2.0 API IMPROVEMENTS: RDD, DATAFRAME, DATASET AND SQL
What’s New, What’s Changed and How to get Started. Are you ready for Apache Spark 2.0? If you are ju ...
- springboot aop + logback + 统一异常处理 打印日志
1.src/resources路径下新建logback.xml 控制台彩色日志打印 info日志和异常日志分不同文件存储 每天自动生成日志 结合myibatis方便日志打印(debug模式) < ...
- 【vue】vue全家桶
vue-router(http://router.vuejs.org) vuex(https://vuex.vuejs.org/zh/guide/) vue-resource(https://gith ...
- Eclipse中的快捷键
Ctrl+1:快捷修复(数字 1 不是字母 l) 将鼠标悬停到出错区域,按 Ctrl+1,出现快捷修复的菜单, 按上下方向键选择一种修复方式即可. 也可以将光标移动到出错区域,按 F2 + Enter ...
- android 图片上传图片 报Socket: Broken pipe
上传图片的时候报如下错误: 上传失败的原因是服务器限制了文件上传的大小.让服务端改一下配置文件就好了
- L1-8 矩阵A乘以B (15 分)
给定两个矩阵A和B,要求你计算它们的乘积矩阵AB.需要注意的是,只有规模匹配的矩阵才可以相乘.即若A有Ra行.Ca列,B有Rb行.Cb列,则只有Ca与Rb相等时,两 ...
- AOP从静态代理到动态代理 Emit实现
[前言] AOP为Aspect Oriented Programming的缩写,意思是面向切面编程的技术. 何为切面? 一个和业务没有任何耦合相关的代码段,诸如:调用日志,发送邮件,甚至路由分发.一切 ...
- 如何给框架添加API接口日志
前言 用的公司的框架,是MVC框架,看了下里面的日志基类,是操作日志,对增删改进行记录, 夸张的是一张业务的数据表 需要一张专门的日志表进行记录, 就是说你写个更新,添加的方法都必须写一遍操作日志,代 ...